An Ahrefs test across 1,885 pages found essentially zero effect of JSON-LD on AI citations. A controlled study saw a positive signal in ChatGPT and nothing in Gemini, AI Overviews, AI Mode, or Grok. Meanwhile, agencies keep selling schema markup as a GEO lever. The GEO industry has a measurement problem.

The evidence bar here is remarkably low.

What the duck experiment proved (and didn't)

Mark Williams-Cook, Director at Candour, ran a test that should give any marketing ops team pause. He built a fake company page about ducks, embedded deliberately invalid JSON-LD schema, then queried multiple LLMs for the company's address. Every model returned the address. The schema was broken. The models didn't care.

Williams-Cook's conclusion: LLMs aren't parsing schema as structured data. They're treating it as text on the page, same as any other content. "Tests showing LLMs surface schema-like info don't prove the model relied on schema," he argues. Schema is still worth implementing, but calling it a "magical" LLM trigger overstates the evidence.
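Here is a toy illustration of the distinction Williams-Cook draws. A structured parser rejects malformed JSON-LD outright, but anything that reads the page as text still sees the address. This is a minimal Python sketch under stated assumptions, not a model of how any LLM pipeline actually works. The page, the company name, and the choice of a trailing comma as the "invalid" part are all hypothetical, because the experiment's exact markup isn't described here.

```python
import json
from html.parser import HTMLParser

# Hypothetical page loosely modeled on the duck experiment. We ASSUME "invalid"
# means a JSON syntax error (the trailing comma below); the original test may
# have broken the markup in a different way.
PAGE = """<html><body>
<h1>Example Duck Co.</h1>
<p>We sell ducks.</p>
<script type="application/ld+json">
{"@type": "Organization", "name": "Example Duck Co.",
 "address": "1 Pond Lane, Mallardville",}
</script>
</body></html>"""


class PageReader(HTMLParser):
    """Collects all text on the page plus the raw body of each JSON-LD block."""

    def __init__(self):
        super().__init__()
        self.text_chunks = []    # everything a text-only reader would see
        self.jsonld_blocks = []  # raw JSON-LD strings, one per <script> tag
        self._buf = None         # non-None while inside a JSON-LD <script>

    def handle_starttag(self, tag, attrs):
        if tag == "script" and ("type", "application/ld+json") in attrs:
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "script" and self._buf is not None:
            self.jsonld_blocks.append("".join(self._buf))
            self._buf = None

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)
        self.text_chunks.append(data)  # script contents count as plain text here


def parse_structured(blocks):
    """Structured reading: parsed objects, or None if any block is malformed."""
    try:
        return [json.loads(block) for block in blocks]
    except json.JSONDecodeError:
        return None


reader = PageReader()
reader.feed(PAGE)
reader.close()
page_text = " ".join(reader.text_chunks)

print(parse_structured(reader.jsonld_blocks))  # None: the trailing comma breaks JSON
print("1 Pond Lane" in page_text)              # True: the address survives as text
```

A structured consumer gets nothing from this page. A text consumer gets the address either way. That asymmetry is consistent with the duck result, though it doesn't prove which path any given model takes.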

Two camps have formed around how schema might interact with LLMs. The first says schema lives in training data. Problem: most training pipelines strip HTML boilerplate, including JSON-LD, during data cleaning. Even if fragments survive, tokenization destroys the structured relationships. The second camp claims LLMs read schema at query time when fetching pages. Williams-Cook's experiment undercuts this directly. Invalid schema didn't block extraction. Valid schema didn't improve it.

The platform variance nobody talks about

Here's where the measurement story gets uncomfortable. A Seer Interactive test on a travel site reported a 12% increase in ChatGPT bot hits on pages with schema versus a decline on control pages. That sounds promising until you look at the scope. Bot hits aren't citations. Crawling isn't ranking. And the effect didn't replicate across other AI surfaces.

A separate controlled study found a positive ChatGPT-only effect for local business schema. Gemini? Nothing. AI Overviews? Nothing. AI Mode? Nothing. Grok? Nothing. One platform out of five showed a signal, and that signal measured bot interaction, not user-facing citations or conversions.

For marketing ops teams building attribution models, this is the kind of data that gets misread fast. A single positive signal in one environment becomes "schema improves AI visibility" in a vendor deck. The 1,885-page Ahrefs test showing no effect across ChatGPT, AI Mode, and AI Overviews gets buried.

Schema as hygiene, not as strategy

None of this means schema is worthless. It isn't. Schema.org markup remains useful for traditional SEO, entity disambiguation, and Knowledge Graph eligibility. If your brand shares a name with a fruit, a city, or another company, clean structured data helps Google's existing systems resolve the ambiguity. The cost to implement is low. The maintenance burden is minimal.

But treating schema as a primary GEO investment? That's a different claim, and it requires different evidence. Evidence the industry doesn't have yet.

Williams-Cook puts it bluntly: schema's GEO benefits are being overstated. Some GEO vendors and agencies claim schema helps LLMs interpret entities and improve citation potential, especially when paired with strong content. The problem is that "paired with strong content" does a lot of heavy lifting in that sentence. Strong content alone might explain the entire effect.

What to actually test

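If your team wants to figure out whether schema moves the needle for AI visibility, the experiment design matters more than the schema itself. Three signals need separate tracking: AI crawler activity (bot hits), citations in AI answers, and downstream referral traffic or conversions.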

These signals can move in opposite directions. More bot hits with zero citation lift is a crawl budget story, not a GEO win. Run the test across platforms. ChatGPT, Gemini, and AI Overviews behave differently, and a result that only appears in one environment shouldn't drive cross-platform strategy.
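Keeping the signals separate is mostly bookkeeping. Below is a minimal Python sketch, assuming the three signals are AI crawler hits, citations, and referral conversions, counted per platform for test and control pages before and after the change. The field names, the 5-point threshold, and the sample numbers are hypothetical. The sketch compares each signal's change on test pages against the same change on control pages, and it only reports a win when citations move.

```python
import math
from dataclasses import dataclass


@dataclass
class SignalCounts:
    """Counts for one page group on one platform over one window (hypothetical fields)."""
    bot_hits: int
    citations: int
    conversions: int


SIGNALS = ("bot_hits", "citations", "conversions")


def pct_change(before: int, after: int) -> float:
    """Percent change; NaN when there is no baseline to compare against."""
    if before == 0:
        return math.nan
    return (after - before) / before * 100


def lift_vs_control(test_before, test_after, ctrl_before, ctrl_after):
    """Per-signal change on test pages minus change on control pages, in percentage points."""
    return {
        s: pct_change(getattr(test_before, s), getattr(test_after, s))
        - pct_change(getattr(ctrl_before, s), getattr(ctrl_after, s))
        for s in SIGNALS
    }


def classify(lift, threshold=5.0):
    """Label the outcome; the 5-point threshold is an arbitrary placeholder."""
    if lift["citations"] >= threshold:
        return "citation lift"
    if lift["bot_hits"] >= threshold:
        return "crawl story, not a GEO win"
    return "no signal"


# Hypothetical numbers for one platform: crawling rises, citations stay flat.
lift = lift_vs_control(
    SignalCounts(1000, 40, 5), SignalCounts(1150, 40, 5),  # test: before, after
    SignalCounts(1000, 40, 5), SignalCounts(980, 40, 5),   # control: before, after
)
print(classify(lift))  # crawl story, not a GEO win
```

Run it separately for each platform rather than pooling, so a ChatGPT-only effect shows up as exactly that.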

The hypothesis (make it falsifiable): if we add comprehensive schema to 50 product pages and hold 50 matched pages as controls, then AI citation frequency will increase by at least 10% within 60 days, because LLMs use structured data to identify authoritative entity information.
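One way to check that hypothesis for a single platform is a permutation test on per-page citation counts. Below is a minimal Python sketch, assuming you already have citation counts per page over the 60-day window (collecting them is out of scope here). The permutation test, the function names, and the sample counts are illustrative choices, not a prescribed method. Because the pages are matched, a paired test on page pairs would be a reasonable alternative.

```python
import math
import random


def avg(xs):
    return sum(xs) / len(xs)


def relative_lift(test, control):
    """Relative difference in mean citations per page (0.10 means +10%)."""
    base = avg(control)
    if base == 0:
        return math.inf if avg(test) > 0 else 0.0
    return avg(test) / base - 1.0


def permutation_p_value(test, control, n_iter=10_000, seed=0):
    """One-sided permutation test on the difference in mean citations per page."""
    rng = random.Random(seed)
    observed = avg(test) - avg(control)
    pooled = list(test) + list(control)
    n = len(test)
    at_least_as_extreme = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)  # randomly relabel pages as test or control
        if avg(pooled[:n]) - avg(pooled[n:]) >= observed:
            at_least_as_extreme += 1
    # The +1 terms count the observed labeling itself, so p is never exactly 0.
    return (at_least_as_extreme + 1) / (n_iter + 1)


def hypothesis_supported(test, control, min_lift=0.10, alpha=0.05):
    """True if the lift clears the 10% bar AND is unlikely under random labeling."""
    return (relative_lift(test, control) >= min_lift
            and permutation_p_value(test, control) < alpha)


# Hypothetical 60-day citation counts per page on one platform, 50 pages each.
test_pages = [3] * 25 + [4] * 25  # mean 3.5
control_pages = [3] * 50          # mean 3.0
print(round(relative_lift(test_pages, control_pages), 3))  # 0.167
print(hypothesis_supported(test_pages, control_pages))     # True
```

The 10% bar here applies to the point estimate. A stricter reading would require the lower end of a confidence interval on the lift to clear 10%.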

Success = statistically significant citation lift across at least two AI platforms. Guardrails = no degradation in organic CTR on test pages. Stop-loss = if crawl costs increase with no citation movement after 45 days, pause and reassess.
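The success, guardrail, and stop-loss rules are easier to enforce when they are written down as code before the test starts. Below is a minimal Python sketch. The `PlatformResult` structure, the zero CTR tolerance, and reading "no citation movement" as "no platform significant at the chosen alpha" are assumptions, not part of the plan above. One caution: checking significance at every interim look inflates false positives unless the threshold is adjusted for repeated looks.

```python
from dataclasses import dataclass


@dataclass
class PlatformResult:
    platform: str
    citation_lift: float  # relative lift vs. control, e.g. 0.12 for +12%
    p_value: float        # from whatever significance test you run per platform


def evaluate(results, ctr_change, crawl_cost_change, day,
             alpha=0.05, min_platforms=2, ctr_tolerance=0.0):
    """Apply the plan's success, guardrail, and stop-loss rules to interim results.

    ctr_change: change in organic CTR on test pages (negative means degradation).
    crawl_cost_change: change in crawl cost on test pages (positive means higher).
    """
    # Guardrail: organic CTR on test pages must not degrade beyond the tolerance.
    if ctr_change < -ctr_tolerance:
        return "guardrail breached: pause the test"
    # Success: significant positive citation lift on at least two platforms.
    winners = [r.platform for r in results if r.citation_lift > 0 and r.p_value < alpha]
    if len(winners) >= min_platforms:
        return "success: " + ", ".join(winners)
    # Stop-loss: from day 45 on, higher crawl costs with no significant citation
    # movement on any platform means pause and reassess.
    citations_moved = any(r.p_value < alpha for r in results)
    if day >= 45 and crawl_cost_change > 0 and not citations_moved:
        return "stop-loss: pause and reassess"
    if day >= 60:
        return "window closed: hypothesis not supported"
    return "keep running"


# Hypothetical interim results on day 46: one platform moved, the others didn't.
interim = [
    PlatformResult("ChatGPT", 0.15, 0.01),
    PlatformResult("Gemini", 0.02, 0.60),
    PlatformResult("AI Overviews", 0.00, 0.90),
]
print(evaluate(interim, ctr_change=0.0, crawl_cost_change=0.2, day=46))  # keep running
```

With these interim numbers, a ChatGPT-only lift is not enough to declare success, which is the cross-platform point made earlier.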

The trade-off

Schema implementation costs almost nothing. The real cost is opportunity cost: teams spending weeks on structured data optimization when that time could go toward content depth, entity coverage, or building the kind of authoritative pages that LLMs actually tend to cite. Don't confuse cheap to implement with free to prioritize.

The GEO space will mature. Schema's role in LLM citation may become clearer as models evolve and retrieval-augmented generation pipelines get more sophisticated. But right now, the evidence is thin, platform-dependent, and frequently misrepresented. Treating schema as a hygiene layer makes sense. Treating it as a GEO strategy requires proof that doesn't exist yet.

The duck page had broken schema, and every model still returned its address. That fact alone should recalibrate expectations.