What server logs can tell you about AI visitors
A practical experiment in measuring whether AI systems are fetching your website, why GA4 misses most of it, and how to combine server logs with Search Console and analytics.
We have been running a small experiment on the Squared Lemons website.
Nothing grand. No claim that we have solved Generative Engine Optimisation. No dashboard pretending to know exactly how ChatGPT, Claude or Perplexity decide what to cite.
Just a practical question:
If AI systems are reading the site, where would the evidence actually show up?
The obvious answer is Google Analytics.
The obvious answer is wrong, or at least incomplete.
GA4 is useful, but it mostly sees a certain kind of visitor: someone using a browser that loads your page, runs JavaScript, accepts enough tracking to be counted, and is not blocked by privacy tools.
AI crawlers and AI fetchers often do not behave like that.
They may request the page directly. They may read the HTML. They may not run your analytics tag. They may not look like a normal website session at all.
So we started looking somewhere less fashionable and more revealing: the server logs.
Server logs are not glamorous. They are plain records of HTTP requests: timestamp, path, status code, user agent, IP address, bytes transferred.
But if you want to understand whether AI systems are touching your website, those plain records may be the closest thing you have to evidence.
GEO is not one metric
GEO usually means Generative Engine Optimisation: making your content easier for answer engines and AI-assisted search tools to find, understand and use.
It is a useful phrase, but it is also early, messy and easy to overstate.
There is no settled industry standard for measuring it. There is no single dashboard that says: “ChatGPT understands your website 17% better this week.”
Anyone pretending otherwise is selling confidence they do not have.
That is why we are treating this as discovery work: gather the signals, separate the evidence, watch the trend, and be honest about what each number can and cannot prove.
For now, the practical approach is to separate the evidence into three streams.
1. Traditional search visibility
This is what Google Search Console shows you:
- queries
- impressions
- clicks
- click-through rate
- average position
- indexed pages
- crawl issues
This still matters. It is not going away.
If someone searches Google and clicks your site, Search Console and GA4 can usually tell part of that story.
But Search Console is mostly about Google search. It does not give you a clean view of whether Claude, Perplexity or ChatGPT have fetched your content.
2. Human referrals from AI tools
This is when someone is using ChatGPT, Perplexity, Claude, Gemini or Copilot, receives an answer, sees your site referenced, and clicks through.
GA4 may show this as referral traffic, for example:
chatgpt.com / referral
perplexity.ai / referral
claude.ai / referral
gemini.google.com / referral
This is useful. It tells you when an AI product has sent a human visitor.
But it does not tell you whether the AI system crawled your site before that happened.
3. AI crawler and fetcher activity
This is the hidden layer.
AI systems, search systems and answer engines may request pages directly from your server. They may check your sitemap. They may read your articles. They may inspect your robots.txt.
That activity is usually visible in server logs, not GA4.
The user agents to look for include names such as:
ChatGPT-User
GPTBot
OAI-SearchBot
ClaudeBot
PerplexityBot
Google-Extended
anthropic-ai
CCBot
Bytespider
Applebot-Extended
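Some are crawlers. Some are fetchers. Some are related to search or training. Some are triggered by a user asking a live question. Two of them, Google-Extended and Applebot-Extended, are robots.txt control tokens rather than separate crawlers, so they will not normally appear as user agents in your logs.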
They should not all be treated as the same thing.
But if they are touching your site, you probably want to know.
Why GA4 misses this
GA4 is client-side analytics.
In plain English: it mostly works because a browser loads a JavaScript tag and sends data back to Google.
Many bots do not do that.
They request the page directly. They may read the HTML. They may not execute JavaScript. They may not load your analytics scripts. They may not accept cookies. They may not behave like a human browser at all.
So GA4 can undercount or completely miss bot activity.
That does not mean GA4 is bad. It means it is measuring a different thing.
GA4 is good for user behaviour.
Server logs are better for request evidence.
Search Console is better for Google search visibility.
You need all three if you want a serious view of search and generative visibility.
What server logs can tell you
A typical HTTP access log can help answer questions like:
- Did ClaudeBot visit the site?
- Did ChatGPT fetch a specific article?
- Did PerplexityBot crawl the sitemap?
- Which pages are AI bots reading?
- Are bots hitting old URLs and getting 404s?
- Are they seeing redirects?
- Are they being served 200 responses?
- Are they wasting crawl attention on tag pages, image files or outdated URLs?
- Did bot activity change after a site update?
- Did AI referral traffic rise after a particular article was crawled?
That last question is where this becomes commercially useful.
This is the experiment we are running now: not “can we make a number go up this week?”, but “can we see which parts of the site are being discovered, requested and revisited by search and AI systems?”
The point is not to collect logs for the sake of logs.
The point is to build a measurement loop.
For example:
- Publish or update an article.
- Check whether Googlebot, ClaudeBot, ChatGPT-User or PerplexityBot request it.
- Check whether the page starts getting impressions in Search Console.
- Check whether GA4 starts showing referrals from ChatGPT or Perplexity.
- Compare this against enquiries, email clicks or other commercial actions.
That gives you a way to test whether site changes are actually improving discoverability.
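The first two steps of that loop can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: log lines have already been parsed into (timestamp, path, user agent) tuples, and the sample rows, the simplified user-agent strings, the `/articles/example/` path and the `first_bot_requests` helper are all hypothetical.

```python
from datetime import datetime

# Hypothetical parsed log rows: (timestamp, path, user agent).
# The user-agent strings are simplified for illustration.
ROWS = [
    (datetime(2025, 1, 5, 10, 0), "/articles/example/", "Mozilla/5.0 (compatible; ClaudeBot/1.0)"),
    (datetime(2025, 1, 6, 9, 0), "/articles/example/", "Mozilla/5.0 (compatible; Googlebot/2.1)"),
    (datetime(2025, 1, 6, 14, 30), "/articles/example/", "Mozilla/5.0 (compatible; ClaudeBot/1.0)"),
    (datetime(2025, 1, 7, 8, 15), "/articles/other/", "Mozilla/5.0 (compatible; PerplexityBot/1.0)"),
]

BOTS = ["Googlebot", "ClaudeBot", "ChatGPT-User", "PerplexityBot"]


def first_bot_requests(rows, path, published_at, bots=BOTS):
    """Return {bot: first request time} for one path, ignoring requests before publication."""
    first_seen = {}
    for timestamp, row_path, user_agent in sorted(rows):
        if row_path != path or timestamp < published_at:
            continue
        for bot in bots:
            if bot.lower() in user_agent.lower() and bot not in first_seen:
                first_seen[bot] = timestamp
    return first_seen


published = datetime(2025, 1, 6, 8, 0)
seen = first_bot_requests(ROWS, "/articles/example/", published)
for bot, when in seen.items():
    print(f"{bot}: first request {when - published} after publication")
```

A bot that never appears in the result has not requested the page since it was published, which is itself worth recording.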
What to log
At minimum, keep the following fields:
- timestamp
- host
- path
- HTTP method
- status code
- user agent
- referrer, where available
- deployment or release identifier, if your platform exposes it
- response time, if available
- bytes transferred
- source IP, treated carefully for privacy
For GEO and search visibility, the most useful fields are usually:
- path
- status
- user agent
- timestamp
- host
If a bot requests /articles/your-best-ai-guide/ and receives a 200, that is useful.
If it requests /old-page and receives a 404, that is also useful, but for a different reason.
One tells you your content is discoverable.
The other tells you your site structure may be wasting crawler attention.
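To make those fields usable, each log line has to be split into its parts. The sketch below assumes the widely used "combined" access log format from Nginx and Apache; your platform may use a different layout (for example JSON, or a format that also includes the host), so treat the pattern as a starting point. The sample line is invented, and `parse_line` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Assumes the "combined" access log format. Adjust the pattern if your
# platform writes a different layout.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    return fields


# Invented sample line using a documentation-only IP address.
SAMPLE = (
    '203.0.113.7 - - [06/Jan/2025:14:30:00 +0000] '
    '"GET /old-page HTTP/1.1" 404 153 "-" '
    '"Mozilla/5.0 (compatible; ClaudeBot/1.0)"'
)

entry = parse_line(SAMPLE)
print(entry["path"], entry["status"], entry["user_agent"])
```

Lines that do not match return `None` rather than raising, so a malformed entry does not stop a whole day's analysis.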
What to track
For a small business site, I would not start with a complex dashboard.
I would start with a simple weekly or daily table.
AI crawler and fetcher activity
Track:
- bot name
- number of requests
- pages requested
- successful responses
- errors
- top pages
- new pages discovered
Useful bot families:
| Bot or agent | What it may indicate |
|---|---|
| ClaudeBot | Anthropic/Claude crawler activity |
| ChatGPT-User | ChatGPT fetching a page, often user-triggered |
| GPTBot | OpenAI crawler activity |
| OAI-SearchBot | OpenAI search/indexing activity |
| PerplexityBot | Perplexity crawler activity |
| Google-Extended | Google AI-related crawler control token |
| CCBot | Common Crawl activity |
| Bytespider | ByteDance crawler |
| Applebot-Extended | Apple AI-related crawler token |
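The tracking list above maps onto a small aggregation. This is an illustrative sketch, assuming requests have already been parsed into (path, status, user agent) tuples; the sample data and the `bot_family` and `summarise` helpers are hypothetical. The two control tokens from the table are left out of the matching list because they are not expected to appear as user agents.

```python
from collections import Counter, defaultdict

# Hypothetical parsed requests: (path, status, user agent).
REQUESTS = [
    ("/articles/a/", 200, "Mozilla/5.0 (compatible; ClaudeBot/1.0)"),
    ("/articles/a/", 200, "Mozilla/5.0 (compatible; ClaudeBot/1.0)"),
    ("/old-page", 404, "Mozilla/5.0 (compatible; ClaudeBot/1.0)"),
    ("/articles/b/", 200, "Mozilla/5.0 (compatible; GPTBot/1.2)"),
    ("/", 200, "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"),  # ordinary browser
]

BOT_FAMILIES = [
    "ClaudeBot", "ChatGPT-User", "GPTBot", "OAI-SearchBot",
    "PerplexityBot", "CCBot", "Bytespider",
]


def bot_family(user_agent):
    """Return the first bot family named in the user agent, or None."""
    lowered = user_agent.lower()
    for name in BOT_FAMILIES:
        if name.lower() in lowered:
            return name
    return None


def summarise(requests):
    """Count requests, successes, errors and pages per bot family."""
    summary = defaultdict(
        lambda: {"requests": 0, "successful": 0, "errors": 0, "pages": Counter()}
    )
    for path, status, user_agent in requests:
        family = bot_family(user_agent)
        if family is None:
            continue  # not a bot we are tracking
        row = summary[family]
        row["requests"] += 1
        if 200 <= status < 300:
            row["successful"] += 1
        elif status >= 400:
            row["errors"] += 1
        row["pages"][path] += 1
    return dict(summary)


summary = summarise(REQUESTS)
for family, row in summary.items():
    print(family, row["requests"], row["successful"], row["errors"], row["pages"].most_common(1))
```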
Search crawler activity
Track:
Googlebot
bingbot
This helps you understand whether traditional search engines are seeing the same pages as AI systems.
Social crawler activity
Track:
- facebookexternalhit
- meta-externalagent
- LinkedIn and other preview bots if relevant
This is not GEO in the strict sense, but it helps explain traffic spikes from social sharing.
GA4 AI referrals
Track referral sessions from:
- ChatGPT
- Perplexity
- Claude
- Gemini
- Copilot
- Poe
- You.com
- Phind
This tells you when AI tools are sending humans to the site.
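If you export referrer URLs (from an analytics export or from the referrer field in your logs), grouping them by AI tool is a small lookup. A minimal sketch: the first four hostnames come from the referral examples earlier in this article, while the remaining hostnames are assumptions that should be checked against your own reports. The `ai_source` helper is hypothetical and expects full URLs, not bare hostnames.

```python
from urllib.parse import urlparse

AI_REFERRERS = {
    "chatgpt.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
    "claude.ai": "Claude",
    "gemini.google.com": "Gemini",
    # Assumed hostnames: verify these against your own referral data.
    "copilot.microsoft.com": "Copilot",
    "poe.com": "Poe",
    "you.com": "You.com",
    "phind.com": "Phind",
}


def ai_source(referrer_url):
    """Return the AI tool name for a referrer URL, or None if it is not in the list."""
    host = (urlparse(referrer_url).hostname or "").lower()
    for domain, label in AI_REFERRERS.items():
        # Match the domain itself or any subdomain such as www.
        if host == domain or host.endswith("." + domain):
            return label
    return None


for url in ["https://chatgpt.com/", "https://www.perplexity.ai/search", "https://www.google.com/"]:
    print(url, "->", ai_source(url))
```

Matching on the full hostname, rather than a substring, avoids counting unrelated domains that merely contain one of these names.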
Search Console metrics
Track:
- clicks
- impressions
- CTR
- average position
- top queries
- top pages
This tells you whether Google search visibility is improving.
The mistake: treating bot traffic as visits
An AI crawler request is not the same as a human visit.
This matters.
If ClaudeBot requests your sitemap, that does not mean someone has read your article.
If ChatGPT-User fetches an article, that is stronger evidence that a user asked something and ChatGPT needed your page. But even then, it is not the same as a normal website session.
So do not inflate your traffic numbers with bots.
Instead, classify the evidence properly:
| Evidence | What it proves | What it does not prove |
|---|---|---|
| Server log shows ClaudeBot | Claude crawler requested the site | A human read the content |
| Server log shows ChatGPT-User | ChatGPT fetched a page | The user clicked through |
| GA4 shows chatgpt.com / referral | A human session arrived from ChatGPT | ChatGPT crawled the site |
| Search Console shows impressions | Google showed the page in search | The page is used by AI answers |
This distinction keeps the reporting honest.
It also makes the trends more useful.
Privacy and compliance
Server logs can contain personal data.
In the UK and EU, IP addresses and user agents can count as personal data, especially when combined with timestamps and paths.
That does not mean you cannot use logs. It means you need to be sensible.
Practical steps:
- keep logs only as long as needed
- aggregate where possible
- avoid joining logs to named customer records unless there is a clear reason
- restrict access
- document the purpose: security, diagnostics, search visibility and site optimisation
- avoid publishing raw IP addresses in reports
- consider truncating or hashing IP addresses if you are storing long-term analysis data
For most SME marketing analysis, you do not need raw IPs in the final report.
You need counts, bot names, paths, status codes and trends.
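Truncating or hashing an IP address takes only a few lines. The sketch below is one possible approach, not legal advice: the /24 and /48 prefix lengths are assumptions chosen for illustration, the key is a placeholder, and `truncate_ip` and `pseudonymise_ip` are hypothetical helper names. Keyed hashing is generally treated as pseudonymisation rather than anonymisation, so the output still deserves care.

```python
import hashlib
import hmac
import ipaddress


def truncate_ip(ip_text):
    """Zero the host part of an address: /24 for IPv4, /48 for IPv6 (assumed prefixes)."""
    ip = ipaddress.ip_address(ip_text)
    prefix = 24 if ip.version == 4 else 48
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


def pseudonymise_ip(ip_text, secret_key):
    """Return a short keyed hash so repeat visitors can be grouped without storing the IP."""
    digest = hmac.new(secret_key, ip_text.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


# Placeholder key for illustration only. Keep a real key out of source control.
KEY = b"replace-with-a-secret-key"

print(truncate_ip("203.0.113.77"))
print(truncate_ip("2001:db8:abcd:12::1"))
print(pseudonymise_ip("203.0.113.77", KEY))
```

Rotating the key on a schedule limits how long the hashed values can be linked together.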
How to implement this without overbuilding it
You do not need an enterprise observability platform to start.
A practical setup looks like this.
Step 1: Get access to server or platform logs
Depending on the hosting platform, this may be:
- Nginx or Apache access logs
- CDN logs
- Railway HTTP logs
- Cloudflare logs
- Vercel logs
- Netlify logs
The important point is that they must include user agents.
Step 2: Define the bot patterns
Create a small list of user-agent patterns to monitor:
ClaudeBot
ChatGPT-User
OAI-SearchBot
GPTBot
PerplexityBot
Google-Extended
anthropic-ai
CCBot
Bytespider
Applebot-Extended
Googlebot
bingbot
facebookexternalhit
meta-externalagent
Expect this list to change.
The AI crawler ecosystem is moving quickly.
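Because the list will change, it helps to keep it in one place and build the matcher from it. A minimal sketch, assuming case-insensitive substring matching is good enough for a first pass; `match_bot` is a hypothetical helper and the user-agent strings in the example are simplified.

```python
import re

# The pattern list from this step. Google-Extended and Applebot-Extended are
# robots.txt tokens, so they are unlikely to match real log lines.
BOT_PATTERNS = [
    "ClaudeBot", "ChatGPT-User", "OAI-SearchBot", "GPTBot", "PerplexityBot",
    "Google-Extended", "anthropic-ai", "CCBot", "Bytespider", "Applebot-Extended",
    "Googlebot", "bingbot", "facebookexternalhit", "meta-externalagent",
]

# One case-insensitive regex built from the list, with each name escaped.
BOT_REGEX = re.compile("|".join(re.escape(p) for p in BOT_PATTERNS), re.IGNORECASE)


def match_bot(user_agent):
    """Return the configured spelling of the first bot pattern found, or None."""
    found = BOT_REGEX.search(user_agent)
    if found is None:
        return None
    lowered = found.group(0).lower()
    return next(p for p in BOT_PATTERNS if p.lower() == lowered)


for ua in [
    "Mozilla/5.0 (compatible; GPTBot/1.2)",
    "mozilla/5.0 (compatible; bingbot/2.0)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
]:
    print(ua, "->", match_bot(ua))
```

A match only shows what the client claimed to be. User-agent strings can be spoofed, so treat a match as a signal rather than proof of identity.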
Step 3: Store a daily summary
A simple Google Sheet is enough at the start.
Suggested tabs:
- Daily Summary
- AI Bot Requests
- Crawler Summary by Bot
- GA4 AI Referrals
- GSC Search Baseline
- Bot Pattern Config
This lets you see trends without needing a full data warehouse.
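A daily summary can be produced as CSV and imported into a spreadsheet tab. This is a rough sketch, assuming requests have already been classified into (day, bot name, status) tuples; the sample events, the column names and the `daily_summary` and `to_csv` helpers are hypothetical.

```python
import csv
import io
from datetime import date

# Hypothetical classified requests: (day, bot name, HTTP status).
EVENTS = [
    (date(2025, 1, 6), "ClaudeBot", 200),
    (date(2025, 1, 6), "ClaudeBot", 404),
    (date(2025, 1, 6), "GPTBot", 200),
    (date(2025, 1, 7), "ClaudeBot", 200),
]

FIELDS = ["date", "bot", "requests", "successful", "errors"]


def daily_summary(events):
    """Collapse individual requests into one row per day and bot."""
    rows = {}
    for day, bot, status in events:
        row = rows.setdefault(
            (day, bot),
            {"date": day.isoformat(), "bot": bot, "requests": 0, "successful": 0, "errors": 0},
        )
        row["requests"] += 1
        if 200 <= status < 300:
            row["successful"] += 1
        elif status >= 400:
            row["errors"] += 1
    return [rows[key] for key in sorted(rows)]


def to_csv(rows):
    """Render the summary as CSV text that a spreadsheet can import."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


summary_rows = daily_summary(EVENTS)
print(to_csv(summary_rows))
```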
Step 4: Compare against site changes
This is where the value is.
Do not just ask:
Did we get more traffic?
Ask:
After we changed the site, did crawlers and AI systems start reading the pages we care about?
For example:
- add robots.txt
- add llms.txt
- clean up sitemap/canonical issues
- improve article summaries
- add structured data
- rewrite service pages to answer direct buyer questions
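Before looking for bots in the logs, it is worth confirming what your robots.txt actually permits, since a blocked agent should not be expected to fetch the page. The sketch below uses Python's standard-library `urllib.robotparser`; the robots.txt content and the example.com URLs are invented, and `allowed` is a hypothetical helper. Individual crawlers may interpret rules slightly differently from this parser, so treat the result as a sanity check.

```python
from urllib.robotparser import RobotFileParser

# Invented robots.txt content for illustration.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/

User-agent: *
Allow: /
"""


def allowed(robots_text, agent, url):
    """Return True if this robots.txt permits the agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(agent, url)


for agent, url in [
    ("GPTBot", "https://example.com/private/notes"),
    ("GPTBot", "https://example.com/articles/guide/"),
    ("ClaudeBot", "https://example.com/private/notes"),
]:
    print(agent, url, allowed(ROBOTS_TXT, agent, url))
```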
Then watch:
- did AI bots request the new files?
- did they reach the priority pages?
- did Search Console impressions change?
- did AI referral sessions change?
- did enquiries change?
That is a useful feedback loop.
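One simple way to read that loop is to compare average daily activity before and after a change. A minimal sketch, assuming you already have a daily count of AI bot requests to your priority pages; the numbers, the change date and the `compare_periods` helper are hypothetical.

```python
from datetime import date
from statistics import mean

# Hypothetical daily counts of AI bot requests to priority pages.
DAILY_BOT_REQUESTS = {
    date(2025, 1, 1): 2,
    date(2025, 1, 2): 4,
    date(2025, 1, 3): 3,
    date(2025, 1, 4): 6,
    date(2025, 1, 5): 8,
}


def compare_periods(daily_counts, change_date):
    """Return the mean daily count before and from the change date onwards."""
    before = [count for day, count in daily_counts.items() if day < change_date]
    after = [count for day, count in daily_counts.items() if day >= change_date]
    return {
        "before": mean(before) if before else None,
        "after": mean(after) if after else None,
    }


result = compare_periods(DAILY_BOT_REQUESTS, date(2025, 1, 4))
print(result)
```

A difference between the two averages is a prompt to look closer, not proof that the change caused it. Crawl schedules vary for reasons that have nothing to do with your site.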
What good looks like
At this stage, good does not mean lots of bot traffic.
Lots of bot traffic can be waste.
Good looks like:
- important pages requested successfully
- fewer crawler hits to broken or legacy URLs
- sitemap and canonical URLs being used
- article pages being fetched by relevant AI/search agents
- GA4 showing occasional AI referral sessions
- Search Console impressions rising for relevant queries
- enquiries or useful conversations increasing over time
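The first two items on that list can be turned into a simple ratio. This is an illustrative sketch: treating 404 and 410 as "broken" and 301, 302, 307 and 308 as "redirected" is an assumption, the sample requests are invented, and `crawl_quality` is a hypothetical helper.

```python
# Assumed groupings of status codes for this sketch.
BROKEN = {404, 410}
REDIRECTS = {301, 302, 307, 308}

# Invented bot requests: (path, status).
BOT_REQUESTS = [
    ("/articles/a/", 200),
    ("/articles/b/", 200),
    ("/services/", 200),
    ("/old-services", 301),
    ("/old-page", 404),
]


def crawl_quality(requests):
    """Return the share of bot requests that succeeded, were redirected, or hit broken URLs."""
    total = len(requests)
    if total == 0:
        return {"successful": 0.0, "redirected": 0.0, "broken": 0.0}
    successful = sum(1 for _, status in requests if 200 <= status < 300)
    redirected = sum(1 for _, status in requests if status in REDIRECTS)
    broken = sum(1 for _, status in requests if status in BROKEN)
    return {
        "successful": successful / total,
        "redirected": redirected / total,
        "broken": broken / total,
    }


quality = crawl_quality(BOT_REQUESTS)
print(quality)
```

Watching the broken and redirected shares fall over time is a more useful signal than the raw request count.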
The early aim is not certainty. It is direction.
If we change the site and the right pages become easier for search engines and AI systems to find, that should start to show up somewhere: in the logs, in Search Console, in AI referrals, or eventually in better conversations with prospects.
For a small business, that is enough to begin with.
You are not trying to reverse-engineer every AI model.
You are trying to understand whether your content is becoming more discoverable in the places people now ask questions.
The practical conclusion
This is not a finished playbook. It is an early measurement loop.
If you only look at GA4, you will miss part of the picture.
If you only look at server logs, you will confuse bot requests with human interest.
If you only look at Search Console, you will understand Google but miss the wider shift toward answer engines.
The practical answer is to combine all three.
Use:
- server logs for AI crawler and fetch evidence
- GA4 for human referral sessions from AI tools
- Google Search Console for classic search visibility
Then track the trend over time.
That gives you a way to test whether GEO work is making a real difference, rather than just adding another marketing acronym to the pile.
Sources
- Profound: Beyond JavaScript, AI Crawlers
- SEO Hero: How to use server logs to identify AI crawler behaviour
- Passion Digital: Tracking LLM bots using log file analysis
- Screaming Frog: Monitor AI bots in Log File Analyser
- Search Engine Land: Server access logs for SEO
- Botify: Tracking AI bots with log file analysis
- GetCito: How to detect AI crawlers on your website
Frequently asked questions
What does GEO mean in website analytics?
GEO usually means Generative Engine Optimisation: improving and measuring how visible your content is inside AI answer engines and AI-assisted search tools such as ChatGPT, Perplexity, Claude and Gemini.
Why does GA4 miss AI crawlers?
GA4 mostly relies on browser-side JavaScript. Many AI crawlers and fetchers request the HTML directly, do not run your analytics tag, do not accept cookies and do not behave like normal browser sessions. They can touch your site without appearing in GA4.
What should I track to measure AI visibility?
Track three streams separately: server logs for AI crawler and fetcher requests, GA4 for human referrals from AI tools, and Google Search Console for traditional Google search visibility. Mixing them into one number is misleading.
Which AI bot user agents should I look for?
Start with ChatGPT-User, GPTBot, OAI-SearchBot, ClaudeBot, PerplexityBot, Google-Extended, anthropic-ai, CCBot, Bytespider and Applebot-Extended. Keep the list under review because bot names and behaviours change.