What does GEO mean in website analytics?

GEO usually means Generative Engine Optimisation: improving and measuring how visible your content is inside AI answer engines and AI-assisted search tools such as ChatGPT, Perplexity, Claude and Gemini.

Why does GA4 miss AI crawlers?

GA4 mostly relies on browser-side JavaScript. Many AI crawlers and fetchers request the HTML directly, do not run your analytics tag, do not accept cookies and do not behave like normal browser sessions. They can touch your site without appearing in GA4.

What should I track to measure AI visibility?

Track three streams separately: server logs for AI crawler and fetcher requests, GA4 for human referrals from AI tools, and Google Search Console for traditional Google search visibility. Mixing them into one number is misleading.

Which AI bot user agents should I look for?

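Start with ChatGPT-User, GPTBot, OAI-SearchBot, ClaudeBot, PerplexityBot, anthropic-ai, CCBot and Bytespider. Google-Extended and Applebot-Extended are also worth knowing, but they are robots.txt control tokens rather than request user agents, so they will not normally appear in your logs. Keep the list under review because bot names and behaviours change.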

What server logs can tell you about AI visitors

We have been running a small experiment on the Squared Lemons website.

Nothing grand. No claim that we have solved Generative Engine Optimisation. No dashboard pretending to know exactly how ChatGPT, Claude or Perplexity decide what to cite.

Just a practical question:

If AI systems are reading the site, where would the evidence actually show up?

The obvious answer is Google Analytics.

The obvious answer is wrong, or at least incomplete.

GA4 is useful, but it mostly sees a certain kind of visitor: someone using a browser that loads your page, runs JavaScript, accepts enough tracking to be counted, and is not blocked by privacy tools.

AI crawlers and AI fetchers often do not behave like that.

They may request the page directly. They may read the HTML. They may not run your analytics tag. They may not look like a normal website session at all.

So we started looking somewhere less fashionable and more revealing: the server logs.

Server logs are not glamorous. They are plain records of HTTP requests: timestamp, path, status code, user agent, IP address, bytes transferred.

But if you want to understand whether AI systems are touching your website, those plain records may be the closest thing you have to evidence.

GEO is not one metric

GEO usually means Generative Engine Optimisation: making your content easier for answer engines and AI-assisted search tools to find, understand and use.

It is a useful phrase, but it is also early, messy and easy to overstate.

There is no settled industry standard for measuring it. There is no single dashboard that says: “ChatGPT understands your website 17% better this week.”

Anyone pretending otherwise is selling confidence they do not have.

That is why we are treating this as discovery work: gather the signals, separate the evidence, watch the trend, and be honest about what each number can and cannot prove.

For now, the practical approach is to separate the evidence into three streams.

1. Traditional search visibility

This is what Google Search Console shows you:

queries
impressions
clicks
click-through rate
average position
indexed pages
crawl issues

This still matters. It is not going away.

If someone searches Google and clicks your site, Search Console and GA4 can usually tell part of that story.

But Search Console is mostly about Google search. It does not give you a clean view of whether Claude, Perplexity or ChatGPT have fetched your content.

2. Human referrals from AI tools

This is when someone is using ChatGPT, Perplexity, Claude, Gemini or Copilot, receives an answer, sees your site referenced, and clicks through.

GA4 may show this as referral traffic, for example:

chatgpt.com / referral
perplexity.ai / referral
claude.ai / referral
gemini.google.com / referral

This is useful. It tells you when an AI product has sent a human visitor.

But it does not tell you whether the AI system crawled your site before that happened.

3. AI crawler and fetcher activity

This is the hidden layer.

AI systems, search systems and answer engines may request pages directly from your server. They may check your sitemap. They may read your articles. They may inspect your robots.txt.

That activity is usually visible in server logs, not GA4.

The user agents to look for include names such as:

ChatGPT-User
GPTBot
OAI-SearchBot
ClaudeBot
PerplexityBot
Google-Extended
anthropic-ai
CCBot
Bytespider
Applebot-Extended

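Some are crawlers. Some are fetchers. Some are related to search or training. Some are triggered by a user asking a live question. Two of them, Google-Extended and Applebot-Extended, are not crawlers at all: they are robots.txt tokens that control whether Google and Apple may use your content for AI, and the underlying requests arrive as Googlebot or Applebot.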

They should not all be treated as the same thing.

But if they are touching your site, you probably want to know.

Why GA4 misses this

GA4 is client-side analytics.

In plain English: it mostly works because a browser loads a JavaScript tag and sends data back to Google.

Many bots do not do that.

They request the page directly. They may read the HTML. They may not execute JavaScript. They may not load your analytics scripts. They may not accept cookies. They may not behave like a human browser at all.

So GA4 can undercount or completely miss bot activity.

That does not mean GA4 is bad. It means it is measuring a different thing.

GA4 is good for user behaviour.

Server logs are better for request evidence.

Search Console is better for Google search visibility.

You need all three if you want a serious view of search and generative visibility.

What server logs can tell you

A typical HTTP access log can help answer questions like:

Did ClaudeBot visit the site?
Did ChatGPT fetch a specific article?
Did PerplexityBot crawl the sitemap?
Which pages are AI bots reading?
Are bots hitting old URLs and getting 404s?
Are they seeing redirects?
Are they being served 200 responses?
Are they wasting crawl attention on tag pages, image files or outdated URLs?
Did bot activity change after a site update?
Did AI referral traffic rise after a particular article was crawled?

That last question is where this becomes commercially useful.
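To make the questions above concrete, the sketch below answers one of them (are AI bots hitting old URLs and getting 404s?) from log lines that have already been parsed into dicts. The record shape (`path`, `status` and `user_agent` keys) and the `AI_BOT_PATTERNS` tuple are assumptions for illustration, not a fixed schema, and the plain substring matching is deliberately simple.

```python
from collections import Counter

# Hypothetical starting list of AI user-agent substrings; keep it in line with your own config.
AI_BOT_PATTERNS = ("ClaudeBot", "ChatGPT-User", "GPTBot", "OAI-SearchBot",
                   "PerplexityBot", "anthropic-ai", "CCBot", "Bytespider")


def is_ai_bot(user_agent):
    """True if the user-agent string contains any AI bot pattern (case-insensitive)."""
    ua = (user_agent or "").lower()
    return any(pattern.lower() in ua for pattern in AI_BOT_PATTERNS)


def ai_bot_404s(records):
    """Return (path, count) pairs for 404s served to AI bots, most frequent first.

    Each record is assumed to be a dict with 'path', 'status' and 'user_agent' keys.
    """
    counts = Counter(
        rec["path"]
        for rec in records
        if rec["status"] == 404 and is_ai_bot(rec["user_agent"])
    )
    return counts.most_common()


if __name__ == "__main__":
    sample = [
        {"path": "/old-page", "status": 404, "user_agent": "compatible; ClaudeBot/1.0"},
        {"path": "/old-page", "status": 404, "user_agent": "compatible; GPTBot/1.1"},
        {"path": "/articles/guide/", "status": 200, "user_agent": "compatible; GPTBot/1.1"},
        {"path": "/typo", "status": 404, "user_agent": "Mozilla/5.0 (Windows NT 10.0)"},
    ]
    print(ai_bot_404s(sample))  # [('/old-page', 2)]
```

The human 404 on `/typo` is ignored on purpose: the question is about crawler attention, not visitor errors.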

This is the experiment we are running now: not “can we make a number go up this week?”, but “can we see which parts of the site are being discovered, requested and revisited by search and AI systems?”

The point is not to collect logs for the sake of logs.

The point is to build a measurement loop.

For example:

Publish or update an article.
Check whether Googlebot, ClaudeBot, ChatGPT-User or PerplexityBot request it.
Check whether the page starts getting impressions in Search Console.
Check whether GA4 starts showing referrals from ChatGPT or Perplexity.
Compare this against enquiries, email clicks or other commercial actions.

That gives you a way to test whether site changes are actually improving discoverability.
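The second step of that loop can be checked mechanically. This is a minimal sketch, assuming log records are dicts with a timezone-aware `timestamp`, a `path` and a `user_agent`; `first_fetches` and `WATCHED_AGENTS` are hypothetical names. It reports when each watched agent first requested a page after it was published, which you can then line up against the dates in Search Console and GA4.

```python
from datetime import datetime, timezone

# Agents named in the loop above; a hypothetical list you would extend.
WATCHED_AGENTS = ("Googlebot", "ClaudeBot", "ChatGPT-User", "PerplexityBot")


def first_fetches(records, path, published_at):
    """Map each watched agent to the first time it requested `path` on or after `published_at`.

    Records are assumed to be dicts with 'timestamp' (timezone-aware datetime),
    'path' and 'user_agent'. Agents that never requested the page are absent.
    """
    first = {}
    for rec in records:
        if rec["path"] != path or rec["timestamp"] < published_at:
            continue
        ua = rec["user_agent"].lower()
        for agent in WATCHED_AGENTS:
            if agent.lower() in ua and (agent not in first or rec["timestamp"] < first[agent]):
                first[agent] = rec["timestamp"]
    return first


if __name__ == "__main__":
    published = datetime(2025, 3, 1, 9, 0, tzinfo=timezone.utc)
    log = [
        {"timestamp": datetime(2025, 3, 1, 11, 5, tzinfo=timezone.utc),
         "path": "/articles/new-guide/", "user_agent": "Googlebot/2.1"},
        {"timestamp": datetime(2025, 3, 3, 8, 40, tzinfo=timezone.utc),
         "path": "/articles/new-guide/", "user_agent": "compatible; ClaudeBot/1.0"},
    ]
    for agent, when in first_fetches(log, "/articles/new-guide/", published).items():
        print(agent, when.isoformat())
```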

What to log

At minimum, keep the following fields:

timestamp
host
path
HTTP method
status code
user agent
referrer, where available
deployment or release identifier, if your platform exposes it
response time, if available
bytes transferred
source IP, treated carefully for privacy
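If your platform writes the common Nginx/Apache "combined" format, most of these fields can be pulled out with one regular expression. The sketch below assumes that format exactly; host, response time and release identifiers are not part of it, so they would need a custom log format (for example by adding Nginx's `$host` and `$request_time` variables). The `parse_combined` name is illustrative.

```python
import re
from datetime import datetime

# Nginx/Apache "combined" format:
# $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"
COMBINED_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_combined(line):
    """Parse one combined-format log line into a dict, or return None if it does not match."""
    m = COMBINED_RE.match(line)
    if m is None:
        return None
    f = m.groupdict()
    return {
        "ip": f["ip"],  # personal data: see the privacy section before storing it
        "timestamp": datetime.strptime(f["time"], "%d/%b/%Y:%H:%M:%S %z"),
        "method": f["method"],
        "path": f["path"],
        "status": int(f["status"]),
        "bytes": 0 if f["bytes"] == "-" else int(f["bytes"]),
        "referrer": None if f["referrer"] == "-" else f["referrer"],  # "-" means no referrer
        "user_agent": f["user_agent"],
    }
```

Lines that do not match (malformed requests, other formats) come back as `None`, so they can be counted and skipped rather than crashing the run.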

For GEO and search visibility, the most useful fields are usually:

path
status
user agent
timestamp
host

If a bot requests /articles/your-best-ai-guide/ and receives a 200, that is useful.

If it requests /old-page and receives a 404, that is also useful, but for a different reason.

One tells you your content is discoverable.

The other tells you your site structure may be wasting crawler attention.
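The 200 versus 404 distinction generalises to a small status-bucketing function. The bucket names below are our own labels rather than a standard, and grouping 410 with 404 as "missing" is a judgement call.

```python
from collections import Counter


def crawl_outcome(status):
    """Bucket an HTTP status code by what it says about discoverability."""
    if 200 <= status < 300:
        return "served"        # content delivered: the page is discoverable
    if status == 304:
        return "not_modified"  # conditional request; the crawler already has a copy
    if 300 <= status < 400:
        return "redirected"    # worth checking the target is the canonical URL
    if status in (404, 410):
        return "missing"       # old or broken URL: crawler attention wasted
    if 400 <= status < 500:
        return "client_error"  # e.g. a 403 from a firewall or bot rule
    if status >= 500:
        return "server_error"
    return "other"


if __name__ == "__main__":
    statuses = [200, 200, 301, 404, 404, 410, 503]
    print(Counter(crawl_outcome(s) for s in statuses))
```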

What to track

For a small business site, I would not start with a complex dashboard.

I would start with a simple weekly or daily table.

AI crawler and fetcher activity

Track:

bot name
number of requests
pages requested
successful responses
errors
top pages
new pages discovered
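A rough sketch of that table, assuming each request has already been tagged with a `bot` name (for instance by matching the user agent against your pattern list). The `previously_seen` argument, a hypothetical per-bot set of paths from earlier periods, is how "new pages discovered" is worked out here; persisting it between runs is left to you.

```python
from collections import Counter, defaultdict


def summarise_by_bot(records, previously_seen=None):
    """Build one summary row per bot.

    records: iterable of dicts with 'bot', 'path' and 'status' (bot already classified).
    previously_seen: optional {bot: set of paths} from earlier periods, used to flag new pages.
    """
    previously_seen = previously_seen or {}
    paths = defaultdict(Counter)  # bot -> Counter of requested paths
    ok = Counter()                # bot -> 2xx responses
    errors = Counter()            # bot -> 4xx and 5xx responses
    for rec in records:
        bot = rec["bot"]
        paths[bot][rec["path"]] += 1
        if 200 <= rec["status"] < 300:
            ok[bot] += 1
        elif rec["status"] >= 400:
            errors[bot] += 1
    summary = {}
    for bot, counter in paths.items():
        seen = previously_seen.get(bot, set())
        summary[bot] = {
            "requests": sum(counter.values()),
            "pages": len(counter),
            "ok": ok[bot],
            "errors": errors[bot],
            "top_pages": [path for path, _ in counter.most_common(3)],
            "new_pages": sorted(set(counter) - seen),
        }
    return summary
```

Redirects count as neither successes nor errors here, which keeps the two columns honest; add a separate column if redirect volume matters on your site.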

Useful bot families:

Bot or agent	What it may indicate
`ClaudeBot`	Anthropic/Claude crawler activity
`ChatGPT-User`	ChatGPT fetching a page, often user-triggered
`GPTBot`	OpenAI crawler activity
`OAI-SearchBot`	OpenAI search/indexing activity
`PerplexityBot`	Perplexity crawler activity
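`Google-Extended`	robots.txt control token for Google AI use; Google fetches as Googlebot, so this name will not normally appear in logs
`CCBot`	Common Crawl activity
`Bytespider`	ByteDance crawler
`Applebot-Extended`	robots.txt control token for Apple AI use; Apple fetches as Applebot, so this name will not normally appear in logs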

Search crawler activity

Track:

Googlebot
bingbot

This helps you understand whether traditional search engines are seeing the same pages as AI systems.

Social crawler activity

Track:

facebookexternalhit
meta-externalagent (Meta also describes this crawler as gathering content for AI training and indexing)
LinkedIn and other preview bots if relevant

This is not GEO in the strict sense, but it helps explain traffic spikes from social sharing.

GA4 AI referrals

Track referral sessions from:

ChatGPT
Perplexity
Claude
Gemini
Copilot
Poe
You.com
Phind

This tells you when AI tools are sending humans to the site.
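GA4 reports the source as a hostname, so turning "chatgpt.com / referral" into a tool name is a lookup. The hostname list below is an assumption based on the tools named above (plus the older `chat.openai.com`); check the actual source values in your own GA4 property, as they can change. `rows` is assumed to be (source / medium, sessions) pairs exported from the Session source / medium report.

```python
from collections import Counter

# Assumed hostnames for the AI tools listed above; verify against your own GA4 data.
AI_REFERRERS = {
    "chatgpt.com": "ChatGPT",
    "chat.openai.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
    "claude.ai": "Claude",
    "gemini.google.com": "Gemini",
    "copilot.microsoft.com": "Copilot",
    "poe.com": "Poe",
    "you.com": "You.com",
    "phind.com": "Phind",
}


def ai_tool_for(source_medium):
    """Map a GA4 'source / medium' string to an AI tool name, or None if it is not one."""
    source = source_medium.partition("/")[0].strip().lower()
    if source.startswith("www."):
        source = source[4:]
    for host, tool in AI_REFERRERS.items():
        if source == host or source.endswith("." + host):
            return tool
    return None


def ai_referral_sessions(rows):
    """Sum sessions per AI tool from (source / medium, sessions) rows."""
    totals = Counter()
    for source_medium, sessions in rows:
        tool = ai_tool_for(source_medium)
        if tool:
            totals[tool] += sessions
    return dict(totals)
```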

Search Console metrics

Track:

clicks
impressions
CTR
average position
top queries
top pages

This tells you whether Google search visibility is improving.

The mistake: treating bot traffic as visits

An AI crawler request is not the same as a human visit.

This matters.

If ClaudeBot requests your sitemap, that does not mean someone has read your article.

If ChatGPT-User fetches an article, that is stronger evidence that a user asked something and ChatGPT needed your page. But even then, it is not the same as a normal website session.

So do not inflate your traffic numbers with bots.

Instead, classify the evidence properly:

Evidence	What it proves	What it does not prove
Server log shows `ClaudeBot`	Claude crawler requested the site	A human read the content
Server log shows `ChatGPT-User`	ChatGPT fetched a page	The user clicked through
GA4 shows `chatgpt.com / referral`	A human session arrived from ChatGPT	ChatGPT crawled the site
Search Console shows impressions	Google showed the page in search	AI answers use the page

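This distinction keeps the reporting honest.

One further caveat: a user-agent string is self-declared, so a log line saying GPTBot proves only that the client called itself GPTBot. Where it matters, check the source IP against the IP ranges or reverse-DNS checks the operator publishes, where they exist.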

It also makes the trends more useful.

Privacy and compliance

Server logs can contain personal data.

In the UK and EU, IP addresses and user agents can count as personal data, especially when combined with timestamps and paths.

That does not mean you cannot use logs. It means you need to be sensible.

Practical steps:

keep logs only as long as needed
aggregate where possible
avoid joining logs to named customer records unless there is a clear reason
restrict access
document the purpose: security, diagnostics, search visibility and site optimisation
avoid publishing raw IP addresses in reports
consider truncating or hashing IP addresses if you are storing long-term analysis data

For most SME marketing analysis, you do not need raw IPs in the final report.

You need counts, bot names, paths, status codes and trends.
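If you do keep IP-level data for longer-term analysis, truncation is one simple way to follow the advice above. This sketch uses Python's standard `ipaddress` module to zero the host portion of an address, keeping a /24 for IPv4 and a /48 for IPv6; both prefix lengths are assumptions you can tune. Truncation reduces identifiability but does not by itself make the data anonymous.

```python
import ipaddress


def truncate_ip(ip, v4_prefix=24, v6_prefix=48):
    """Zero the host bits of an IP address so it identifies a network block, not a device."""
    addr = ipaddress.ip_address(ip)  # raises ValueError for malformed input
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)
```

Truncation is preferred here over plain hashing because unsalted hashes are weaker than they look for IPv4: there are only about four billion addresses, so every hash can be reversed by trying them all.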

How to implement this without overbuilding it

You do not need an enterprise observability platform to start.

A practical setup looks like this.

Step 1: Get access to server or platform logs

Depending on the hosting platform, this may be:

Nginx or Apache access logs
CDN logs
Railway HTTP logs
Cloudflare logs
Vercel logs
Netlify logs

The important point is that they must include user agents.

Step 2: Define the bot patterns

Create a small list of user-agent patterns to monitor:

ClaudeBot
ChatGPT-User
OAI-SearchBot
GPTBot
PerplexityBot
Google-Extended
anthropic-ai
CCBot
Bytespider
Applebot-Extended
Googlebot
bingbot
facebookexternalhit
meta-externalagent

Expect this list to change.

The AI crawler ecosystem is moving quickly.
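A minimal sketch of turning that list into a single case-insensitive matcher. `BOT_PATTERNS` mirrors the list above minus the two robots.txt tokens, and `classify_user_agent` is a hypothetical helper; keeping the patterns in one place (or in a config tab) means updating the list does not require touching the rest of the pipeline.

```python
import re

# Step 2 pattern list. Google-Extended and Applebot-Extended are omitted because they
# are robots.txt control tokens and do not appear as request user agents.
BOT_PATTERNS = [
    "ClaudeBot", "ChatGPT-User", "OAI-SearchBot", "GPTBot", "PerplexityBot",
    "anthropic-ai", "CCBot", "Bytespider", "Googlebot", "bingbot",
    "facebookexternalhit", "meta-externalagent",
]

_CANONICAL = {p.lower(): p for p in BOT_PATTERNS}  # lower-case match -> configured name
_BOT_RE = re.compile("|".join(re.escape(p) for p in BOT_PATTERNS), re.IGNORECASE)


def classify_user_agent(user_agent):
    """Return the configured bot name found in the user-agent string, or None."""
    match = _BOT_RE.search(user_agent or "")
    return _CANONICAL[match.group(0).lower()] if match else None
```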

Step 3: Store a daily summary

A simple Google Sheet is enough at the start.

Suggested tabs:

Daily Summary
AI Bot Requests
Crawler Summary by Bot
GA4 AI Referrals
GSC Search Baseline
Bot Pattern Config

This lets you see trends without needing a full data warehouse.
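For the Daily Summary tab, a CSV with one row per date and bot is enough, and Google Sheets can import it directly. This sketch assumes records already carry a timezone-aware `timestamp`, a `bot` name and a `status`; dates are taken in the timestamp's own timezone, so normalise to UTC first if your logs mix zones. `daily_summary_csv` is a hypothetical name.

```python
import csv
import io
from collections import Counter
from datetime import datetime, timezone


def daily_summary_csv(records):
    """Aggregate classified requests into date x bot rows and return them as CSV text."""
    requests = Counter()
    ok = Counter()
    for rec in records:
        key = (rec["timestamp"].date().isoformat(), rec["bot"])
        requests[key] += 1
        if 200 <= rec["status"] < 300:
            ok[key] += 1
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["date", "bot", "requests", "ok_responses"])
    for day, bot in sorted(requests):
        writer.writerow([day, bot, requests[(day, bot)], ok[(day, bot)]])
    return out.getvalue()


if __name__ == "__main__":
    sample = [
        {"timestamp": datetime(2025, 1, 10, 9, tzinfo=timezone.utc), "bot": "ClaudeBot", "status": 200},
        {"timestamp": datetime(2025, 1, 10, 17, tzinfo=timezone.utc), "bot": "ClaudeBot", "status": 404},
    ]
    print(daily_summary_csv(sample))
```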

Step 4: Compare against site changes

This is where the value is.

Do not just ask:

Did we get more traffic?

Ask:

After we changed the site, did crawlers and AI systems start reading the pages we care about?

For example:

add robots.txt
add llms.txt
clean up sitemap/canonical issues
improve article summaries
add structured data
rewrite service pages to answer direct buyer questions

Then watch:

did AI bots request the new files?
did they reach the priority pages?
did Search Console impressions change?
did AI referral sessions change?
did enquiries change?

That is a useful feedback loop.
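The before-and-after check can start as a simple count in equal windows either side of a change. This is a sketch under assumptions: records carry `timestamp`, `bot` (None for non-bot traffic) and `path`, and the 14-day default window is arbitrary. A rise after a change is a signal worth investigating, not proof that the change caused it, since crawlers revisit on their own schedules.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def before_after(records, priority_paths, change_at, window=timedelta(days=14)):
    """Count bot requests to priority pages in equal windows either side of a site change.

    Records are assumed to be dicts with 'timestamp' (timezone-aware datetime),
    'bot' (None for non-bot traffic) and 'path'. Returns {bot: (before, after)}.
    """
    before, after = Counter(), Counter()
    for rec in records:
        if rec["bot"] is None or rec["path"] not in priority_paths:
            continue
        ts = rec["timestamp"]
        if change_at - window <= ts < change_at:
            before[rec["bot"]] += 1
        elif change_at <= ts < change_at + window:
            after[rec["bot"]] += 1
    return {bot: (before[bot], after[bot]) for bot in sorted(set(before) | set(after))}


if __name__ == "__main__":
    change = datetime(2025, 4, 1, tzinfo=timezone.utc)
    log = [{"timestamp": change + timedelta(days=2), "bot": "GPTBot", "path": "/llms.txt"}]
    print(before_after(log, {"/llms.txt", "/services/"}, change))
```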

What good looks like

At this stage, good does not mean lots of bot traffic.

Lots of bot traffic can be waste.

Good looks like:

important pages requested successfully
fewer crawler hits to broken or legacy URLs
sitemap and canonical URLs being used
article pages being fetched by relevant AI/search agents
GA4 showing occasional AI referral sessions
Search Console impressions rising for relevant queries
enquiries or useful conversations increasing over time

The early aim is not certainty. It is direction.

If we change the site and the right pages become easier for search engines and AI systems to find, that should start to show up somewhere: in the logs, in Search Console, in AI referrals, or eventually in better conversations with prospects.

For a small business, that is enough to begin with.

You are not trying to reverse-engineer every AI model.

You are trying to understand whether your content is becoming more discoverable in the places people now ask questions.

The practical conclusion

This is not a finished playbook. It is an early measurement loop.

If you only look at GA4, you will miss part of the picture.

If you only look at server logs, you will confuse bot requests with human interest.

If you only look at Search Console, you will understand Google but miss the wider shift toward answer engines.

The practical answer is to combine all three.

Use:

server logs for AI crawler and fetch evidence
GA4 for human referral sessions from AI tools
Google Search Console for classic search visibility

Then track the trend over time.

That gives you a way to test whether GEO work is making a real difference, rather than just adding another marketing acronym to the pile.
