10 min read / Jonathan Gill

What server logs can tell you about AI visitors

A practical experiment in measuring whether AI systems are fetching your website, why GA4 misses most of it, and how to combine server logs with Search Console and analytics.


We have been running a small experiment on the Squared Lemons website.

Nothing grand. No claim that we have solved Generative Engine Optimisation. No dashboard pretending to know exactly how ChatGPT, Claude or Perplexity decide what to cite.

Just a practical question:

If AI systems are reading the site, where would the evidence actually show up?

The obvious answer is Google Analytics.

The obvious answer is wrong, or at least incomplete.

GA4 is useful, but it mostly sees a certain kind of visitor: someone using a browser that loads your page, runs JavaScript, accepts enough tracking to be counted, and is not blocked by privacy tools.

AI crawlers and AI fetchers often do not behave like that.

They may request the page directly. They may read the HTML. They may not run your analytics tag. They may not look like a normal website session at all.

So we started looking somewhere less fashionable and more revealing: the server logs.

Server logs are not glamorous. They are plain records of HTTP requests: timestamp, path, status code, user agent, IP address, bytes transferred.

But if you want to understand whether AI systems are touching your website, those plain records may be the closest thing you have to evidence.
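To make those fields concrete, here is a minimal sketch that parses one line in the common "combined" access log format used by Nginx and Apache. The sample line is made up (documentation IP range, invented path, simplified user agent), and your platform's log format may differ, so treat the pattern as a starting point rather than a universal parser.

```python
import re
from typing import Optional

# Pattern for the "combined" access log format: IP, timestamp, request line,
# status, bytes, referrer and user agent.
COMBINED_LOG = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_line(line: str) -> Optional[dict]:
    """Return the fields of one log line, or None if the line does not match."""
    match = COMBINED_LOG.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    return record


# Invented example line, not taken from a real log.
sample = (
    '203.0.113.7 - - [12/May/2025:09:14:03 +0000] '
    '"GET /articles/example-guide/ HTTP/1.1" 200 18234 '
    '"-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"'
)

record = parse_line(sample)
print(record["path"], record["status"], record["user_agent"])
```

Once each line is a dictionary like this, every later question (which bot, which page, which status) becomes a simple filter or count.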

GEO is not one metric

GEO usually means Generative Engine Optimisation: making your content easier for answer engines and AI-assisted search tools to find, understand and use.

It is a useful phrase, but it is also early, messy and easy to overstate.

There is no settled industry standard for measuring it. There is no single dashboard that says: “ChatGPT understands your website 17% better this week.”

Anyone pretending otherwise is selling confidence they do not have.

That is why we are treating this as discovery work: gather the signals, separate the evidence, watch the trend, and be honest about what each number can and cannot prove.

For now, the practical approach is to separate the evidence into three streams.

1. Traditional search visibility

This is what Google Search Console shows you:

  • queries
  • impressions
  • clicks
  • click-through rate
  • average position
  • indexed pages
  • crawl issues

This still matters. It is not going away.

If someone searches Google and clicks your site, Search Console and GA4 can usually tell part of that story.

But Search Console is mostly about Google search. It does not give you a clean view of whether Claude, Perplexity or ChatGPT have fetched your content.

2. Human referrals from AI tools

This is when someone is using ChatGPT, Perplexity, Claude, Gemini or Copilot, receives an answer, sees your site referenced, and clicks through.

GA4 may show this as referral traffic, for example:

  • chatgpt.com / referral
  • perplexity.ai / referral
  • claude.ai / referral
  • gemini.google.com / referral

This is useful. It tells you when an AI product has sent a human visitor.

But it does not tell you whether the AI system crawled your site before that happened.

3. AI crawler and fetcher activity

This is the hidden layer.

AI systems, search systems and answer engines may request pages directly from your server. They may check your sitemap. They may read your articles. They may inspect your robots.txt.

That activity is usually visible in server logs, not GA4.

The user agents to look for include names such as:

  • ChatGPT-User
  • GPTBot
  • OAI-SearchBot
  • ClaudeBot
  • PerplexityBot
  • Google-Extended
  • anthropic-ai
  • CCBot
  • Bytespider
  • Applebot-Extended

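Some are crawlers. Some are fetchers. Some are related to search or training. Some are triggered by a user asking a live question. And some, such as Google-Extended and Applebot-Extended, are robots.txt control tokens rather than crawlers in their own right, so you should not expect to see them as user agents in your logs.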

They should not all be treated as the same thing.

But if they are touching your site, you probably want to know.

Why GA4 misses this

GA4 is client-side analytics.

In plain English: it mostly works because a browser loads a JavaScript tag and sends data back to Google.

Many bots do not do that.

They request the page directly. They may read the HTML. They may not execute JavaScript. They may not load your analytics scripts. They may not accept cookies. They may not behave like a human browser at all.

So GA4 can undercount or completely miss bot activity.

That does not mean GA4 is bad. It means it is measuring a different thing.

GA4 is good for user behaviour.

Server logs are better for request evidence.

Search Console is better for Google search visibility.

You need all three if you want a serious view of search and generative visibility.

What server logs can tell you

A typical HTTP access log can help answer questions like:

  • Did ClaudeBot visit the site?
  • Did ChatGPT fetch a specific article?
  • Did PerplexityBot crawl the sitemap?
  • Which pages are AI bots reading?
  • Are bots hitting old URLs and getting 404s?
  • Are they seeing redirects?
  • Are they being served 200 responses?
  • Are they wasting crawl attention on tag pages, image files or outdated URLs?
  • Did bot activity change after a site update?
  • Did AI referral traffic rise after a particular article was crawled?

That last question is where this becomes commercially useful.

This is the experiment we are running now: not “can we make a number go up this week?”, but “can we see which parts of the site are being discovered, requested and revisited by search and AI systems?”

The point is not to collect logs for the sake of logs.

The point is to build a measurement loop.

For example:

  1. Publish or update an article.
  2. Check whether Googlebot, ClaudeBot, ChatGPT-User or PerplexityBot request it.
  3. Check whether the page starts getting impressions in Search Console.
  4. Check whether GA4 starts showing referrals from ChatGPT or Perplexity.
  5. Compare this against enquiries, email clicks or other commercial actions.

That gives you a way to test whether site changes are actually improving discoverability.
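Step 2 of that loop can be sketched in a few lines. The example below assumes log lines have already been parsed into dictionaries with `timestamp`, `path` and `user_agent` keys (hypothetical field names), and reports the first time each bot requested an article after it was published. The records are invented for illustration.

```python
from datetime import datetime

# Bots named in step 2 of the loop.
BOTS = ["Googlebot", "ClaudeBot", "ChatGPT-User", "PerplexityBot"]


def first_requests(records, path, published_at):
    """Map each bot to the time of its first request for `path` after publication."""
    first_seen = {}
    for rec in sorted(records, key=lambda r: r["timestamp"]):
        if rec["path"] != path or rec["timestamp"] < published_at:
            continue
        user_agent = rec["user_agent"].lower()
        for bot in BOTS:
            if bot.lower() in user_agent and bot not in first_seen:
                first_seen[bot] = rec["timestamp"]
    return first_seen


# Invented records: one request before publication, two after, one for another page.
published = datetime(2025, 5, 12, 9, 0)
records = [
    {"timestamp": datetime(2025, 5, 12, 8, 0), "path": "/articles/example-guide/",
     "user_agent": "Mozilla/5.0 (compatible; Googlebot/2.1)"},
    {"timestamp": datetime(2025, 5, 12, 11, 30), "path": "/articles/example-guide/",
     "user_agent": "Mozilla/5.0 (compatible; ClaudeBot/1.0)"},
    {"timestamp": datetime(2025, 5, 13, 7, 15), "path": "/articles/example-guide/",
     "user_agent": "Mozilla/5.0 (compatible; Googlebot/2.1)"},
    {"timestamp": datetime(2025, 5, 13, 8, 0), "path": "/about/",
     "user_agent": "Mozilla/5.0 (compatible; PerplexityBot/1.0)"},
]

seen = first_requests(records, "/articles/example-guide/", published)
for bot in BOTS:
    print(bot, seen.get(bot, "not seen yet"))
```

A "not seen yet" result is not a failure. It is simply the baseline you compare against next week.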

What to log

At minimum, keep the following fields:

  • timestamp
  • host
  • path
  • HTTP method
  • status code
  • user agent
  • referrer, where available
  • deployment or release identifier, if your platform exposes it
  • response time, if available
  • bytes transferred
  • source IP, treated carefully for privacy

For GEO and search visibility, the most useful fields are usually:

  • path
  • status
  • user agent
  • timestamp
  • host

If a bot requests /articles/your-best-ai-guide/ and receives a 200, that is useful.

If it requests /old-page and receives a 404, that is also useful, but for a different reason.

One tells you your content is discoverable.

The other tells you your site structure may be wasting crawler attention.
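One way to surface the second case is to count bot requests that were not served directly. This is a rough sketch under a few assumptions: records are dictionaries with `path`, `status` and `user_agent` keys (hypothetical names), and anything outside the 2xx range, apart from 304 Not Modified, is worth a look. The data is invented.

```python
from collections import Counter


def wasted_requests(records, bot_tokens):
    """Count bot requests per (path, status) that ended in a redirect or an error."""
    counts = Counter()
    for rec in records:
        user_agent = rec["user_agent"].lower()
        if not any(token.lower() in user_agent for token in bot_tokens):
            continue  # not one of the bots we are watching
        status = rec["status"]
        if 200 <= status < 300 or status == 304:
            continue  # served normally, or unchanged since the last fetch
        counts[(rec["path"], status)] += 1
    return counts.most_common()


# Invented records for illustration.
records = [
    {"path": "/articles/example-guide/", "status": 200, "user_agent": "ClaudeBot/1.0"},
    {"path": "/old-page", "status": 404, "user_agent": "ClaudeBot/1.0"},
    {"path": "/old-page", "status": 404, "user_agent": "GPTBot/1.0"},
    {"path": "/tag/misc/", "status": 301, "user_agent": "GPTBot/1.0"},
    {"path": "/old-page", "status": 404, "user_agent": "Mozilla/5.0 Chrome/120"},
]

report = wasted_requests(records, ["ClaudeBot", "GPTBot"])
for (path, status), count in report:
    print(status, path, count)
```

Paths that keep appearing at the top of this report are candidates for a redirect, a sitemap fix or a robots.txt rule.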

What to track

For a small business site, I would not start with a complex dashboard.

I would start with a simple weekly or daily table.

AI crawler and fetcher activity

Track:

  • bot name
  • number of requests
  • pages requested
  • successful responses
  • errors
  • top pages
  • new pages discovered
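The table above can be produced with a short script. This sketch assumes parsed records with `path`, `status` and `user_agent` keys, and a set of paths already seen in earlier periods (`known_paths`, a hypothetical input you would keep from the previous run). The data is invented.

```python
from collections import Counter


def summarise_by_bot(records, bot_tokens, known_paths, top_n=3):
    """Build one summary row per bot from parsed log records."""
    summary = {}
    for token in bot_tokens:
        hits = [r for r in records if token.lower() in r["user_agent"].lower()]
        pages = Counter(r["path"] for r in hits)
        ok_paths = {r["path"] for r in hits if 200 <= r["status"] < 300}
        summary[token] = {
            "requests": len(hits),
            "pages_requested": len(pages),
            "successful": sum(1 for r in hits if 200 <= r["status"] < 300),
            "errors": sum(1 for r in hits if r["status"] >= 400),
            "top_pages": [path for path, _ in pages.most_common(top_n)],
            # Pages served successfully that were not seen in earlier periods.
            "new_pages": sorted(ok_paths - known_paths),
        }
    return summary


# Invented records for illustration.
records = [
    {"path": "/articles/a/", "status": 200, "user_agent": "ClaudeBot/1.0"},
    {"path": "/articles/a/", "status": 200, "user_agent": "ClaudeBot/1.0"},
    {"path": "/old-page", "status": 404, "user_agent": "ClaudeBot/1.0"},
    {"path": "/articles/b/", "status": 200, "user_agent": "GPTBot/1.0"},
    {"path": "/articles/a/", "status": 200, "user_agent": "Mozilla/5.0 Chrome/120"},
]

summary = summarise_by_bot(records, ["ClaudeBot", "GPTBot"], known_paths={"/articles/a/"})
for bot, row in summary.items():
    print(bot, row)
```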

Useful bot families:

| Bot or agent | What it may indicate |
| --- | --- |
| ClaudeBot | Anthropic/Claude crawler activity |
| ChatGPT-User | ChatGPT fetching a page, often user-triggered |
| GPTBot | OpenAI crawler activity |
| OAI-SearchBot | OpenAI search/indexing activity |
| PerplexityBot | Perplexity crawler activity |
| Google-Extended | Google AI-related crawler control token |
| CCBot | Common Crawl activity |
| Bytespider | ByteDance crawler |
| Applebot-Extended | Apple AI-related crawler token |

Search crawler activity

Track:

  • Googlebot
  • bingbot

This helps you understand whether traditional search engines are seeing the same pages as AI systems.

Social crawler activity

Track:

  • facebookexternalhit
  • meta-externalagent
  • LinkedIn and other preview bots if relevant

This is not GEO in the strict sense, but it helps explain traffic spikes from social sharing.

GA4 AI referrals

Track referral sessions from:

  • ChatGPT
  • Perplexity
  • Claude
  • Gemini
  • Copilot
  • Poe
  • You.com
  • Phind

This tells you when AI tools are sending humans to the site.
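If you export referral sources from GA4, a small lookup can group them by AI tool. The first four hostnames below come from the referral examples earlier in this article. The rest are assumptions about which domains those products send traffic from, so check them against your own referral report before relying on them.

```python
from urllib.parse import urlparse

# Hostname -> tool name. Entries marked "assumed" are not confirmed by this article.
AI_REFERRERS = {
    "chatgpt.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
    "claude.ai": "Claude",
    "gemini.google.com": "Gemini",
    "copilot.microsoft.com": "Copilot",  # assumed
    "poe.com": "Poe",                    # assumed
    "you.com": "You.com",                # assumed
    "phind.com": "Phind",                # assumed
}


def ai_tool_for(referrer):
    """Return the AI tool name for a referrer URL or bare hostname, or None."""
    if "//" not in referrer:
        referrer = "//" + referrer  # lets urlparse treat a bare hostname as a host
    host = urlparse(referrer).hostname or ""
    for domain, tool in AI_REFERRERS.items():
        # Match the domain itself or any subdomain, but not lookalikes.
        if host == domain or host.endswith("." + domain):
            return tool
    return None


print(ai_tool_for("https://chatgpt.com/"))
print(ai_tool_for("www.perplexity.ai"))
print(ai_tool_for("https://www.google.com/search"))
```

Matching on the full hostname, rather than a substring, avoids counting unrelated domains that merely contain a similar name.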

Search Console metrics

Track:

  • clicks
  • impressions
  • CTR
  • average position
  • top queries
  • top pages

This tells you whether Google search visibility is improving.

The mistake: treating bot traffic as visits

An AI crawler request is not the same as a human visit.

This matters.

If ClaudeBot requests your sitemap, that does not mean someone has read your article.

If ChatGPT-User fetches an article, that is stronger evidence that a user asked something and ChatGPT needed your page. But even then, it is not the same as a normal website session.

So do not inflate your traffic numbers with bots.

Instead, classify the evidence properly:

| Evidence | What it proves | What it does not prove |
| --- | --- | --- |
| Server log shows ClaudeBot | Claude crawler requested the site | A human read the content |
| Server log shows ChatGPT-User | ChatGPT fetched a page | The user clicked through |
| GA4 shows chatgpt.com / referral | A human session arrived from ChatGPT | ChatGPT crawled the site |
| Search Console shows impressions | Google showed the page in search | The page is used by AI answers |

This distinction keeps the reporting honest.

It also makes the trends more useful.

Privacy and compliance

Server logs can contain personal data.

In the UK and EU, IP addresses and user agents can count as personal data, especially when combined with timestamps and paths.

That does not mean you cannot use logs. It means you need to be sensible.

Practical steps:

  • keep logs only as long as needed
  • aggregate where possible
  • avoid joining logs to named customer records unless there is a clear reason
  • restrict access
  • document the purpose: security, diagnostics, search visibility and site optimisation
  • avoid publishing raw IP addresses in reports
  • consider truncating or hashing IP addresses if you are storing long-term analysis data

For most SME marketing analysis, you do not need raw IPs in the final report.

You need counts, bot names, paths, status codes and trends.
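For the truncating and hashing step, a minimal sketch using only the Python standard library might look like this. The prefix lengths (/24 for IPv4, /48 for IPv6) and the 16-character token length are illustrative choices, not a compliance standard, and `secret_key` is a hypothetical value you would store separately from the data. Note that a keyed hash is pseudonymisation rather than anonymisation, so the result may still need to be treated as personal data.

```python
import hashlib
import hmac
import ipaddress


def truncate_ip(ip: str) -> str:
    """Zero the host part of an address: keep /24 for IPv4 and /48 for IPv6."""
    address = ipaddress.ip_address(ip)
    prefix = 24 if address.version == 4 else 48
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


def pseudonymise_ip(ip: str, secret_key: bytes) -> str:
    """Replace an IP with a keyed hash so repeat visits still group together."""
    digest = hmac.new(secret_key, ip.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


# Addresses from documentation ranges, used purely as examples.
print(truncate_ip("203.0.113.77"))
print(truncate_ip("2001:db8:abcd:12::1"))
print(pseudonymise_ip("203.0.113.77", b"example-key-change-me"))
```

Truncation is usually enough for trend reports. The keyed hash is only useful if you need to tell repeat requesters apart without keeping the address itself.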

How to implement this without overbuilding it

You do not need an enterprise observability platform to start.

A practical setup looks like this.

Step 1: Get access to server or platform logs

Depending on the hosting platform, this may be:

  • Nginx or Apache access logs
  • CDN logs
  • Railway HTTP logs
  • Cloudflare logs
  • Vercel logs
  • Netlify logs

The important point is that they must include user agents.

Step 2: Define the bot patterns

Create a small list of user-agent patterns to monitor:

ClaudeBot
ChatGPT-User
OAI-SearchBot
GPTBot
PerplexityBot
Google-Extended
anthropic-ai
CCBot
Bytespider
Applebot-Extended
Googlebot
bingbot
facebookexternalhit
meta-externalagent

Expect this list to change.

The AI crawler ecosystem is moving quickly.
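Because the list will change, it helps to keep it as data and build the matcher from it. This is a minimal sketch that compiles the patterns above into one case-insensitive regular expression. The function name `match_bot` and the example user agent strings are invented for illustration.

```python
import re

# The pattern list from this step, kept as plain data so it is easy to update.
BOT_PATTERNS = [
    "ClaudeBot", "ChatGPT-User", "OAI-SearchBot", "GPTBot", "PerplexityBot",
    "Google-Extended", "anthropic-ai", "CCBot", "Bytespider", "Applebot-Extended",
    "Googlebot", "bingbot", "facebookexternalhit", "meta-externalagent",
]

# Longest tokens first, in case a token added later shares a prefix with another.
_BOT_RE = re.compile(
    "|".join(re.escape(p) for p in sorted(BOT_PATTERNS, key=len, reverse=True)),
    re.IGNORECASE,
)
_CANONICAL = {p.lower(): p for p in BOT_PATTERNS}


def match_bot(user_agent):
    """Return the canonical bot name found in a user agent string, or None."""
    found = _BOT_RE.search(user_agent)
    if found is None:
        return None
    return _CANONICAL[found.group(0).lower()]


print(match_bot("Mozilla/5.0 (compatible; GPTBot/1.0)"))
print(match_bot("mozilla/5.0 (compatible; claudebot/1.0)"))
print(match_bot("Mozilla/5.0 (Windows NT 10.0) Chrome/120"))
```

User agent strings can be spoofed, so a match is evidence of a claimed identity, not proof. Where a bot operator publishes IP ranges, checking against them gives a stronger signal.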

Step 3: Store a daily summary

A simple Google Sheet is enough at the start.

Suggested tabs:

  • Daily Summary
  • AI Bot Requests
  • Crawler Summary by Bot
  • GA4 AI Referrals
  • GSC Search Baseline
  • Bot Pattern Config

This lets you see trends without needing a full data warehouse.
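To feed a tab such as Daily Summary, you can write the counts as CSV and import the file into the sheet. The sketch below assumes parsed records with a `date` string in ISO format and a `user_agent` key (hypothetical field names). The column names are a suggestion, and the data is invented.

```python
import csv
import io
from collections import Counter


def daily_summary_csv(records, bot_tokens):
    """Return CSV text with one row per (date, bot) and its request count."""
    counts = Counter()
    for rec in records:
        user_agent = rec["user_agent"].lower()
        for token in bot_tokens:
            if token.lower() in user_agent:
                counts[(rec["date"], token)] += 1
                break  # count each request once, under the first matching bot
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["date", "bot", "requests"])
    for (day, bot), total in sorted(counts.items()):
        writer.writerow([day, bot, total])
    return buffer.getvalue()


# Invented records for illustration.
records = [
    {"date": "2025-05-12", "user_agent": "ClaudeBot/1.0"},
    {"date": "2025-05-12", "user_agent": "ClaudeBot/1.0"},
    {"date": "2025-05-12", "user_agent": "GPTBot/1.0"},
    {"date": "2025-05-13", "user_agent": "ClaudeBot/1.0"},
    {"date": "2025-05-13", "user_agent": "Mozilla/5.0 Chrome/120"},
]

print(daily_summary_csv(records, ["ClaudeBot", "GPTBot"]))
```

Storing only dates, bot names and counts also keeps raw IP addresses out of the spreadsheet.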

Step 4: Compare against site changes

This is where the value is.

Do not just ask:

Did we get more traffic?

Ask:

After we changed the site, did crawlers and AI systems start reading the pages we care about?

For example:

  • add robots.txt
  • add llms.txt
  • clean up sitemap/canonical issues
  • improve article summaries
  • add structured data
  • rewrite service pages to answer direct buyer questions

Then watch:

  • did AI bots request the new files?
  • did they reach the priority pages?
  • did Search Console impressions change?
  • did AI referral sessions change?
  • did enquiries change?

That is a useful feedback loop.
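The second "watch" question, whether bots reached the priority pages, can be checked with a simple before-and-after count. This sketch assumes the records have already been filtered down to bot requests and carry `date` and `path` keys (hypothetical names). The pages and dates are invented.

```python
from collections import Counter
from datetime import date


def compare_periods(bot_records, priority_paths, change_date):
    """Count bot requests to each priority page before and after a site change."""
    before, after = Counter(), Counter()
    for rec in bot_records:
        if rec["path"] not in priority_paths:
            continue
        bucket = before if rec["date"] < change_date else after
        bucket[rec["path"]] += 1
    return {
        path: {"before": before[path], "after": after[path]}
        for path in priority_paths
    }


# Invented bot requests around a site change on 12 May 2025.
bot_records = [
    {"date": date(2025, 5, 8), "path": "/services/"},
    {"date": date(2025, 5, 14), "path": "/services/"},
    {"date": date(2025, 5, 15), "path": "/services/"},
    {"date": date(2025, 5, 15), "path": "/articles/example-guide/"},
    {"date": date(2025, 5, 16), "path": "/tag/misc/"},
]

result = compare_periods(
    bot_records,
    priority_paths=["/services/", "/articles/example-guide/"],
    change_date=date(2025, 5, 12),
)
for path, counts in result.items():
    print(path, counts)
```

For a fair comparison, use periods of equal length on either side of the change, and read small differences as a hint rather than a result.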

What good looks like

At this stage, good does not mean lots of bot traffic.

Lots of bot traffic can be waste.

Good looks like:

  • important pages requested successfully
  • fewer crawler hits to broken or legacy URLs
  • sitemap and canonical URLs being used
  • article pages being fetched by relevant AI/search agents
  • GA4 showing occasional AI referral sessions
  • Search Console impressions rising for relevant queries
  • enquiries or useful conversations increasing over time

The early aim is not certainty. It is direction.

If we change the site and the right pages become easier for search engines and AI systems to find, that should start to show up somewhere: in the logs, in Search Console, in AI referrals, or eventually in better conversations with prospects.

For a small business, that is enough to begin with.

You are not trying to reverse-engineer every AI model.

You are trying to understand whether your content is becoming more discoverable in the places people now ask questions.

The practical conclusion

This is not a finished playbook. It is an early measurement loop.

If you only look at GA4, you will miss part of the picture.

If you only look at server logs, you risk confusing bot requests with human interest.

If you only look at Search Console, you will understand Google but miss the wider shift toward answer engines.

The practical answer is to combine all three.

Use:

  • server logs for AI crawler and fetch evidence
  • GA4 for human referral sessions from AI tools
  • Google Search Console for classic search visibility

Then track the trend over time.

That gives you a way to test whether GEO work is making a real difference, rather than just adding another marketing acronym to the pile.

Frequently asked questions

01. What does GEO mean in website analytics?

GEO usually means Generative Engine Optimisation: improving and measuring how visible your content is inside AI answer engines and AI-assisted search tools such as ChatGPT, Perplexity, Claude and Gemini.

02. Why does GA4 miss AI crawlers?

GA4 mostly relies on browser-side JavaScript. Many AI crawlers and fetchers request the HTML directly, do not run your analytics tag, do not accept cookies and do not behave like normal browser sessions. They can touch your site without appearing in GA4.

03. What should I track to measure AI visibility?

Track three streams separately: server logs for AI crawler and fetcher requests, GA4 for human referrals from AI tools, and Google Search Console for traditional Google search visibility. Mixing them into one number is misleading.

04. Which AI bot user agents should I look for?

Start with ChatGPT-User, GPTBot, OAI-SearchBot, ClaudeBot, PerplexityBot, Google-Extended, anthropic-ai, CCBot, Bytespider and Applebot-Extended. Keep the list under review because bot names and behaviours change.