The idea of an llms.txt file keeps cropping up: a file intended to give LLMs quick, efficient access to the most important information on a website. However, it is not an official standard and is currently not supported by major AI platforms such as ChatGPT, Perplexity and Claude.
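For context, the proposal envisages a Markdown file served at /llms.txt with a short site summary and curated links to key content. A minimal sketch of that proposed format, with placeholder names and URLs:

```markdown
# Example Company

> Short summary of what the site offers and who it is for.

## Products
- [Product overview](https://www.example.com/products.md): key features and pricing

## Docs
- [Getting started](https://www.example.com/docs/start.md): setup and first steps
```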
The robots.txt file, sitemaps and good internal linking remain the proven methods for guiding search engine crawlers and AI bots alike to the relevant content on your website in a targeted, efficient manner.
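A minimal robots.txt sketch shows how these building blocks fit together; the paths and the sitemap URL are placeholders, not recommendations for any specific site:

```
# Allow all well-behaved crawlers by default, keep irrelevant areas out
User-agent: *
Disallow: /intern/
Disallow: /search/

# The sitemap points crawlers, including AI bots that honour robots.txt, to the important URLs
Sitemap: https://www.example.com/sitemap.xml
```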
If you want to be visible in LLMs, you should also make sure that relevant AI crawlers are not accidentally blocked by firewalls or content delivery networks (CDNs). Some firewall and CDN providers, such as Cloudflare, block AI bots by default, so you may need to whitelist the relevant AI crawlers or actively allow them to crawl. Conversely, if you do not want certain LLMs to use your content for training, you can block those bots specifically, although this also reduces your visibility in AI-powered search engines.
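In robots.txt, such allow and block decisions are made per user-agent token. The sketch below separates training crawlers from search-oriented bots; the tokens shown (GPTBot, Google-Extended, OAI-SearchBot, PerplexityBot) reflect the providers' published documentation and should be verified there before you rely on them:

```
# Block crawlers that collect content for model training
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Allow bots that fetch content for AI-powered search and answers
User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /
```

Bear in mind that these rules only affect bots that respect robots.txt; enforcing them against crawlers that ignore it requires firewall or CDN rules.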
As long as there are no official new protocols or standards, the following applies: monitor developments, check log files as well as firewall and CDN settings, and use robots.txt, sitemaps and internal links strategically. Transparency and control remain the foundation.
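Checking log files can be as simple as counting hits from known AI user agents. A minimal sketch in Python, assuming a standard combined access-log format and a hypothetical log path (/var/log/nginx/access.log); the bot list is illustrative and should be kept up to date:

```python
import re
from collections import Counter

# User-agent substrings of well-known AI crawlers that show up in access logs
AI_BOTS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot", "CCBot"]

# In the combined log format, the user agent is the last quoted field on each line
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')

hits = Counter()
with open("/var/log/nginx/access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        match = UA_PATTERN.search(line)
        if not match:
            continue
        user_agent = match.group(1)
        for bot in AI_BOTS:
            if bot in user_agent:
                hits[bot] += 1

# Show which AI bots are crawling the site and how often
for bot, count in hits.most_common():
    print(f"{bot}: {count}")
```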