SE Ranking analyzed nearly 300,000 domains and found zero correlation between having an llms.txt file and getting cited by AI models. Google's John Mueller just explained why that result was inevitable.
On a recent episode of Search Off the Record, Mueller laid out the core problem with llms.txt as a discovery mechanism. His framing wasn't about spam or gaming. It was about a structural limitation that no amount of honest implementation can solve.
The Self-Reported Signal Trap
Mueller's argument is blunt. Every site's llms.txt says the same thing: pick me.
"It's basically you're telling these systems, like, I have the best website ever. And here are all of the pages that everyone must go to. And you must buy all of my products or whatever you put in there. So in LLM system, it basically, by design, can't trust what is here as a way of differentiating between different websites."
The phrase "by design" is doing heavy lifting. Mueller didn't clarify whether he meant architectural constraint (LLM retrieval pipelines structurally can't weight self-reported files) or signal degradation (self-reported claims lose value when everyone makes them). Both readings land in the same place for ops teams: llms.txt doesn't help an AI system choose your site over a competitor's. Your most accurate file and your competitor's most inflated one look the same to a model that has no external way to validate either.
This is the meta keywords problem, recycled. Every site stuffed them. Search engines stopped using them. The parallel isn't perfect, but the underlying dynamic is the same: a signal that every site can assert about itself stops differentiating any of them.
The Data Backs Mueller Up
Two independent datasets confirm the dead end. SE Ranking's analysis of roughly 300,000 domains found no link between llms.txt adoption and citation frequency in AI-generated answers. Removing the llms.txt variable actually improved their model's accuracy. Adoption sits at about 10% across those domains, which works out to roughly 30,000 sites with the file, so this isn't a case of insufficient sample size.
Ahrefs looked at it from the other direction. Among approximately 38,000 domains with valid llms.txt files, 97% received zero requests for the file in May 2026. Nobody's fetching these files. Not crawlers, not agents, not retrieval systems. The infrastructure to consume llms.txt at scale doesn't appear to exist in production.
That 97% number should kill any urgency around treating implementation as a priority initiative. If the file isn't being read, optimizing its contents is work with no downstream effect.
Where llms.txt Might Actually Matter
Mueller didn't reject the file entirely. He carved out one use case: on-site navigation for agents that have already arrived.
"If someone is already on your website, maybe some kind of automated system is helpful."
His example was an agent trying to buy a photograph from a specific site. The agent needs to figure out how to complete a purchase. In that scenario, llms.txt functions like a store directory for someone who already walked through the door. The distinction matters: discovery (which site to visit) versus navigation (what to do once there). Mueller sees a narrow role for the second. Zero for the first.
That's a reasonable framing for ops teams evaluating the effort. If your site has complex transaction flows or deep content architectures that agents might need to traverse, a lightweight llms.txt could reduce friction. But it won't get you discovered. And standards for agentic navigation haven't settled; Mueller mentioned WebMCP alongside other formats still under discussion, estimating six months to a year before anything stabilizes.
What to Prioritize Instead
The consistent signal from both Mueller and the data: HTML pages, internal linking, schema markup, crawlable structure, and authoritative content remain the durable levers. These are the inputs AI retrieval systems demonstrably use today across different products and pipelines. Different AI systems use different crawlers, different indexes, different retrieval heuristics. The common denominator is well-structured, parseable, authoritative web content.
For marketing ops teams weighing where to spend cycles, the framework is straightforward. Treat llms.txt as a low-effort hygiene item if you want. Five minutes, a text file, done. But don't pull resources from information architecture, internal linking audits, or schema implementation to do it. Those investments transfer across channels and retrieval methods. llms.txt, right now, transfers to nothing.
The hypothesis worth testing isn't "will llms.txt get us cited more." The data already answered that. The better experiment: instrument your server logs to track agent requests to specific content types, then optimize the pages agents actually fetch. That's where the signal is. A file nobody reads can't tell you anything.
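One way to start that experiment is a rough pass over existing access logs. The Python sketch below is a minimal illustration, not a finished tool. It assumes logs in the common "combined" format used by default in many Apache and Nginx setups, treats the first path segment as a stand-in for content type, and matches a short, illustrative list of user-agent substrings. The marker list, the helper names, and the log path in the usage note are all assumptions; check each vendor's current crawler documentation before relying on any of them.

```python
import re
from collections import Counter

# Combined log format: ip, identity, user, [time], "request", status, bytes,
# "referer", "user-agent". Adjust the pattern if your server logs differently.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Illustrative user-agent substrings only (an assumption, not a complete list).
AGENT_MARKERS = ("GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot")


def section_of(path):
    """Use the first path segment as a rough proxy for content type."""
    path = path.split("?", 1)[0]
    parts = [p for p in path.split("/") if p]
    return parts[0] if parts else "(root)"


def count_agent_requests(lines, markers=AGENT_MARKERS):
    """Count requests per (agent marker, site section) across log lines."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        agent = match.group("agent").lower()
        marker = next((m for m in markers if m.lower() in agent), None)
        if marker is None:
            continue  # not one of the agents we are tracking
        counts[(marker, section_of(match.group("path")))] += 1
    return counts
```

To run it against a real log, something like `with open("access.log") as f: print(count_agent_requests(f).most_common(20))` would list the busiest agent and section pairs, where `access.log` is a placeholder path. Requests for /llms.txt show up under the `llms.txt` section, so the same pass also shows whether anything is fetching that file at all. User-agent strings can be spoofed, so treat the counts as directional rather than exact.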