Duane Forrester ran the numbers on his new site, CitationIQ.com, and the results should make every marketing leader pause before their next board deck. Out of 33 reported AI assistant visits over two weeks, only 6 were real. The other 27? Spoofed requests masquerading as ChatGPT-User, PerplexityBot, and similar agents. That's an 81.8% fake rate.
The Googlebot number was worse. Of 799 Googlebot-labeled requests, just 107 came from verified Google IP ranges. The remaining 692 were impostors. Roughly 87% fake.
The scale of the noise problem
This isn't one site's quirk. Cloudflare confirmed on June 3, 2026, that bots now generate 57.5% of HTML web traffic, surpassing human traffic for the first time. HUMAN's 2026 State of AI Traffic Report found AI-driven traffic grew 187% in 2025, with automated traffic growing 8x faster than human traffic year over year. About 20% of all site visits last year were scraping attempts, nearly double the 2022 rate.
The composition matters more than the volume. Training crawlers account for 67.5% of AI-driven traffic. Only 9.3% of AI crawler requests are for search. So when your GA4 dashboard shows a spike in "AI assistant" referrals, the odds favor model training or outright scraping over an actual buyer discovering your content through an AI interface.
And it gets murkier. AI agent and agentic browser traffic grew 7,851% year over year in 2025. Google's Chrome Auto Browse (built on Gemini 3.1) launched on desktop in January 2026 and hits Android in late June. These autonomous browsers click links, fill forms, complete multi-step tasks. They look like human sessions. Simple bot filters won't catch them.
Why your funnel metrics are probably contaminated
If one in five visits is a scraping attempt and 57.5% of traffic is non-human, the downstream effects compound fast. A/B tests run on polluted traffic produce unreliable lift calculations. Landing page conversion rates get suppressed by sessions that were never going to convert. Attribution models assign credit to channels that delivered bots, not buyers.
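To put a rough number on that suppression, here is a minimal Python sketch. It assumes bot sessions never convert and that the non-human share applies evenly to the page being measured; both are simplifications. The 2% observed rate is a made-up input, not a figure from Forrester's data, and the function name is hypothetical.

```python
def human_conversion_rate(observed_rate, bot_share):
    """Estimate the conversion rate among human sessions only.

    Assumes bots never convert, so every conversion came from the
    (1 - bot_share) fraction of sessions that were human.
    """
    if not 0 <= bot_share < 1:
        raise ValueError("bot_share must be in [0, 1)")
    return observed_rate / (1 - bot_share)


observed = 0.02    # hypothetical dashboard conversion rate
bot_share = 0.575  # non-human share of HTML traffic cited above

print(f"Observed: {observed:.2%}")
print(f"Human-only estimate: {human_conversion_rate(observed, bot_share):.2%}")
```

Under those assumptions, a page reporting 2% would be converting humans at about 4.7%. The real gap depends on how much of the bot traffic your analytics tool actually records as sessions.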
Forrester's investigation exposed something specific about the spoofed requests: they weren't browsing content. They were targeting sensitive files like .env.production and secrets.yaml. Credential scanning dressed up as ChatGPT-User. That traffic doesn't just inflate your session count. It actively degrades the signal-to-noise ratio in every metric downstream.
Fake account creation attempts climbed 259% from 2023 to 2024, then another 89% in 2025. That pollution flows straight into your MQL counts, lifecycle stage reporting, and RevOps handoffs. When the CMO reports pipeline sourced by channel and 20% of top-of-funnel volume is bots, the whole model drifts.
What verification actually looks like
Forrester's method was straightforward: compare the bot's self-reported identity in request headers (the User-Agent string) against published IP ranges from each operator. OpenAI, Anthropic, Google, Perplexity, and Common Crawl all publish their bot IP lists. A request counts as legitimate only when its source IP falls inside the published ranges of the operator it claims to be. He wrote a Python script to automate the check.
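A minimal sketch of that kind of check, not Forrester's actual script. The CIDR blocks below are placeholders taken from reserved documentation ranges, and the names PUBLISHED_RANGES, claimed_bot, and classify are invented for illustration. A real check would load each operator's current published list instead.

```python
import ipaddress

# Placeholder ranges (reserved documentation blocks), NOT real bot ranges.
# Assumption: in practice these come from each operator's published list.
PUBLISHED_RANGES = {
    "GPTBot": ["192.0.2.0/24"],
    "ClaudeBot": ["198.51.100.0/24"],
    "Googlebot": ["203.0.113.0/24"],
}


def claimed_bot(user_agent):
    """Return the first known bot name found in the User-Agent, or None."""
    ua = user_agent.lower()
    for name in PUBLISHED_RANGES:
        if name.lower() in ua:
            return name
    return None


def classify(ip, user_agent):
    """Label one request as 'verified', 'spoofed', or 'unclaimed'."""
    name = claimed_bot(user_agent)
    if name is None:
        return "unclaimed"  # the request does not claim a known bot identity
    addr = ipaddress.ip_address(ip)
    for cidr in PUBLISHED_RANGES[name]:
        if addr in ipaddress.ip_network(cidr):
            return "verified"  # claimed name and source IP agree
    return "spoofed"  # claims the name but comes from outside its ranges


print(classify("192.0.2.10", "Mozilla/5.0 (compatible; GPTBot/1.0)"))
print(classify("203.0.113.5", "Mozilla/5.0 (compatible; GPTBot/1.0)"))
```

The first call is labeled verified and the second spoofed, because the second request claims GPTBot from an address outside the placeholder GPTBot range. A request that fails may still be legitimate if the operator uses ranges it has not published, so "spoofed" here really means "failed the published-range check".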
The results by crawler tell a story. ClaudeBot had 166 confirmed crawls (the most active legitimate crawler on the site). Googlebot verified at 107. GPTBot landed at 46. CCBot? Zero verified out of 20 requests. All impostors. Perplexity sat in an ambiguous zone: 24 of 36 requests failed the IP check, though some may operate from unlisted ranges.
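The shares behind those counts are simple division. The sketch below recomputes them from the figures quoted in this article; the helper name failed_share is hypothetical. ClaudeBot and GPTBot are left out because only their verified counts are given, not their totals.

```python
def failed_share(failed, total):
    """Share of requests that failed the IP check."""
    if total <= 0:
        raise ValueError("total must be positive")
    return failed / total


# (failed, total) pairs taken from the counts quoted in this article.
counts = {
    "AI assistants (combined)": (27, 33),
    "Googlebot": (692, 799),
    "CCBot": (20, 20),
    "Perplexity": (24, 36),
}

for name, (failed, total) in counts.items():
    print(f"{name}: {failed_share(failed, total):.1%} failed the IP check")
```

For Perplexity the figure is a failed-check share rather than a confirmed spoof rate, since some of those requests may come from unlisted ranges.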
Google's May 14, 2026 announcement of a GA4 "AI assistants" channel helps. It removes the need for custom regex workarounds to segment AI-originated traffic. But a channel label in GA4 doesn't solve identity verification. The platform tells you a visit claimed to come from an AI assistant. It can't tell you whether that claim is true.
The measurement framework that actually protects decisions
Raw traffic volume is dead as a meaningful KPI. Here's what to measure instead:
- Engagement signals per source: session duration, scroll depth, and bounce rate segmented by AI vs. direct vs. organic. If AI assistant sessions average 3 seconds with 95% bounce, that's not demand.
- Conversion correlation: does AI-sourced traffic produce any downstream pipeline? Track through to SQL, not just MQL.
- Log-level verification: run Forrester's IP check against your own access logs. Publish your spoof rate internally before anyone builds strategy on the numbers.
- Experiment hygiene: exclude unverified bot traffic from A/B test populations. If you can't filter it, at least measure the contamination rate so you can discount results appropriately.
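The experiment-hygiene step can be sketched in a few lines of Python. Everything here is illustrative: the Session fields, the function names, and the eight sample sessions are invented, and the suspect_bot flag is assumed to come from a log-level identity check run beforehand.

```python
from dataclasses import dataclass


@dataclass
class Session:
    variant: str       # "A" or "B"
    converted: bool
    suspect_bot: bool  # True if the session failed the identity check


def contamination_rate(sessions):
    """Share of sessions flagged as suspect bots."""
    if not sessions:
        return 0.0
    return sum(s.suspect_bot for s in sessions) / len(sessions)


def verified_only(sessions):
    """Drop flagged sessions from the experiment population."""
    return [s for s in sessions if not s.suspect_bot]


def conversion_rate(sessions, variant):
    """Conversion rate for one variant; 0.0 if the variant has no sessions."""
    group = [s for s in sessions if s.variant == variant]
    if not group:
        return 0.0
    return sum(s.converted for s in group) / len(group)


# Hypothetical sample: two flagged sessions landed in A, one in B.
sessions = [
    Session("A", True, False), Session("A", False, False),
    Session("A", False, True), Session("A", False, True),
    Session("B", True, False), Session("B", True, False),
    Session("B", False, False), Session("B", False, True),
]

print(f"Contamination: {contamination_rate(sessions):.1%}")
for v in ("A", "B"):
    raw = conversion_rate(sessions, v)
    clean = conversion_rate(verified_only(sessions), v)
    print(f"Variant {v}: raw {raw:.1%}, filtered {clean:.1%}")
```

In this toy sample the flagged sessions fall unevenly across variants, so the raw comparison overstates B's advantage over A. That uneven split is the reason to report the contamination rate alongside any lift figure.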
Third-party traffic estimators add another layer of risk. Similarweb overestimates organic traffic by about 1%. Semrush and Ahrefs can underestimate by 30 to 42%. Benchmarking AI traffic externally with those tools is directional at best.
Where this leaves the planning conversation
The instinct will be to treat AI traffic as a growth signal. More AI visits, more brand exposure in AI ecosystems, more opportunity. And some of that is true. Legitimate crawlers indexing your content for AI search results (the 9.3%) represent real visibility.
But training crawlers account for 67.5% of AI-driven traffic, and that is collection, not search. The majority of what looks like AI assistant discovery is neither assistant nor discovery. Forrester's 81.8% spoof rate on a clean new site suggests the baseline fraud level is high before you even get to the attribution question.
The board deck that reports AI assistant traffic as a growing acquisition channel without log-level verification is, statistically, reporting on fiction. And the Googlebot number (87% fake) has been this bad for nearly two decades. The difference now is that the fiction has multiplied across a dozen new bot identities, each one easier to spoof than the last.