Nobody Knows How Visible Your Brand Is in AI Search. That's Actually Fine.
Let me save you a very expensive meeting with your CFO: every AI visibility number on your dashboard right now is, at best, an educated guess. Not because your tools are broken. Not because your team picked the wrong vendor. Because the entire category of measurement is built on probabilistic estimates, not actual data.
And here's the thing: once you stop fighting that reality, you can actually start making smarter decisions.
The Uncomfortable Math Behind the Metrics
Here's what's happening under the hood of every AI visibility platform you're evaluating. They run a set of prompts against ChatGPT, Perplexity, Gemini, or whatever LLM is in scope. They record whether your brand got mentioned. They aggregate that into a score. Then they try to estimate how many real humans are asking similar questions.
That last part? That's where it gets creative.
Brainlabs breaks down four main approaches these platforms use to estimate prompt volume: panel-based surveys, clickstream inference, keyword-to-prompt modeling, and direct API sampling. Each has trade-offs. Panel data suffers from small sample sizes in B2B verticals. Clickstream is directionally useful but fuzzy at the topic level. Keyword-to-prompt modeling assumes people search the same way on LLMs as they do on Google, which anyone who's actually used both knows is laughably wrong.
The most honest approach, direct API sampling, makes no claim about real-world volume at all. It just tells you what happened when the platform asked a specific question. Transparent? Yes. Comprehensive? Not remotely.
None of this is a criticism of the vendors. I use several of them. It's a structural reality of measuring something that doesn't want to be measured. LLMs don't share prompt data. They don't give the same answer twice. Add personalization and chat history, and no two users see identical results.
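To make the "educated guess" point concrete, here is a minimal sketch of the sampling loop described above: take the responses a platform recorded, check each for a brand mention, and report the rate together with a Wilson score interval. The function names and the sample responses are hypothetical, and the simple word-boundary match is an assumption. Real platforms presumably do something more sophisticated.

```python
import math
import re


def mentions_brand(response: str, brand: str) -> bool:
    """Case-insensitive whole-word match (a simplifying assumption)."""
    pattern = rf"\b{re.escape(brand)}\b"
    return re.search(pattern, response, re.IGNORECASE) is not None


def wilson_interval(hits: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion; z=1.96 is roughly 95%."""
    if n == 0:
        return (0.0, 1.0)
    p = hits / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (max(0.0, center - half), min(1.0, center + half))


def visibility(responses: list[str], brand: str) -> tuple[float, tuple[float, float]]:
    """Return (mention rate, uncertainty interval) for recorded responses."""
    hits = sum(mentions_brand(r, brand) for r in responses)
    n = len(responses)
    rate = hits / n if n else 0.0
    return rate, wilson_interval(hits, n)


# Hypothetical recorded responses for one prompt asked four times.
sample = [
    "Acme and Globex are popular choices.",
    "Try Globex.",
    "acme is a solid option.",
    "There is no clear leader.",
]
rate, (low, high) = visibility(sample, "Acme")
print(f"visibility {rate:.0%}, plausible range {low:.0%} to {high:.0%}")
```

With only four samples the interval spans most of the scale, which is the whole point: the headline number is real, but the range around it is wide until you sample a lot more.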
Why Your Board Keeps Asking the Wrong Questions
Most AI visibility measurement fails not because the tools are bad, but because the questions are wrong.
Graph Digital's AI Visibility Report for 2026 found that 82% of B2B manufacturing and industrial brands are invisible during early-stage AI buyer discovery. That's the moment when a buyer describes a problem without naming a vendor. Most organizations aren't measuring this gap because they don't have a framework for what to measure.
The report argues that boards really only ask three questions: Do we have a problem? How big is it? Are we making progress? Everything else is noise. And most dashboards are full of noise: visibility scores that fluctuate without explanation, share-of-voice numbers that don't translate to pipeline, heat maps that look authoritative until someone asks a direct question.
I've sat in those rooms. The CMO presents a slide with a 47% AI visibility score. The CEO asks, "Is that good?" Silence. "What was it last quarter?" Different silence. "How does this connect to revenue?" The kind of silence that makes you wish you'd called in sick.
The Metrics That Actually Survive Scrutiny
So what should you be tracking? Peec AI's framework offers a useful starting point: visibility percentage (does your brand appear at all?), position (where in the response?), and sentiment (what's being said?). But even these need context.
Tracking individual prompts will always be unreliable because LLMs are non-deterministic by nature. Ask the same question twice and you'll get two different answers. But when you group prompts into categories (by topic, by funnel stage, by customer segment), patterns emerge. You stop chasing precision and start identifying trends.
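Grouping is easy to sketch. The example below assumes each recorded run has been reduced to a (period, category, mentioned) tuple; the category labels, periods, and function names are all hypothetical, not taken from any vendor's schema.

```python
from collections import defaultdict

# One record per prompt run: (period, category, brand_was_mentioned).
Run = tuple[str, str, bool]


def rates_by_category(runs: list[Run]) -> dict[tuple[str, str], float]:
    """Mention rate for every (category, period) pair."""
    counts: dict[tuple[str, str], list[int]] = defaultdict(lambda: [0, 0])
    for period, category, mentioned in runs:
        bucket = counts[(category, period)]
        bucket[0] += int(mentioned)  # hits
        bucket[1] += 1               # total runs
    return {key: hits / total for key, (hits, total) in counts.items()}


def trend(rates: dict[tuple[str, str], float], category: str,
          earlier: str, later: str) -> float:
    """Change in mention rate for one category between two periods."""
    return rates[(category, later)] - rates[(category, earlier)]


# Hypothetical data: two quarters, two prompt categories.
runs = [
    ("2025-Q1", "problem-aware", True),
    ("2025-Q1", "problem-aware", False),
    ("2025-Q2", "problem-aware", True),
    ("2025-Q2", "problem-aware", True),
    ("2025-Q1", "vendor-comparison", False),
]
rates = rates_by_category(runs)
print(trend(rates, "problem-aware", "2025-Q1", "2025-Q2"))
```

Any single row in that list is noise. The per-category delta between periods is the thing worth putting on a slide.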
Franco's analysis suggests focusing on average share of voice and citation sources rather than simple brand mentions. A mention without a citation is a rumor. A citation is a breadcrumb that leads somewhere. The distinction matters when you're trying to figure out which content investments are actually paying off.
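For illustration, here is one way those two measures could be computed. The definitions are my assumptions, not a standard: share of voice is taken per response as your brand's mentions divided by all tracked-brand mentions, then averaged, and citation sources are tallied by domain. The function names and sample data are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def average_share_of_voice(mention_counts: list[dict[str, int]], brand: str) -> float:
    """Average, over responses that mention any tracked brand, of
    brand mentions divided by all tracked-brand mentions."""
    shares = []
    for counts in mention_counts:
        total = sum(counts.values())
        if total:  # skip responses that mention no tracked brand
            shares.append(counts.get(brand, 0) / total)
    return sum(shares) / len(shares) if shares else 0.0


def citation_domains(cited_urls: list[str]) -> Counter:
    """Count cited sources by domain, ignoring a leading 'www.'."""
    return Counter(
        urlparse(url).netloc.lower().removeprefix("www.") for url in cited_urls
    )


# Hypothetical per-response mention counts and cited URLs.
counts = [{"Acme": 1, "Globex": 1}, {"Globex": 2}, {}]
urls = [
    "https://www.example.com/a",
    "https://example.com/b",
    "https://docs.example.org/x",
]
print(average_share_of_voice(counts, "Acme"))
print(citation_domains(urls).most_common())
```

The domain tally is the breadcrumb trail: it shows which sources the model leaned on, which is more actionable than the raw mention count.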
HubSpot's approach rolls multiple signals into a composite score: platform coverage, mention frequency, citation rate, sentiment, consistency, and share of voice. The argument for a single number is that it gives marketing leaders something to report without drowning the room in methodology debates. The argument against is that it obscures what's actually driving the number up or down.
My take? You need both. A headline metric for the board, and a decomposed view for the team that's actually doing the work.
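A minimal sketch of "both" might look like this: a weighted average for the headline, returned alongside each signal's contribution so the team can see what moved. The six signal names come from the list above, but the weights, the 0-to-1 normalization, and the function name are my assumptions. This is not HubSpot's actual formula.

```python
SIGNALS = (
    "platform_coverage",
    "mention_frequency",
    "citation_rate",
    "sentiment",
    "consistency",
    "share_of_voice",
)


def composite_score(signals: dict[str, float],
                    weights: dict[str, float]) -> tuple[float, dict[str, float]]:
    """Return (headline score, per-signal contributions).

    Assumes every signal is already normalized to the 0..1 range.
    """
    missing = [name for name in SIGNALS if name not in signals or name not in weights]
    if missing:
        raise ValueError(f"missing signals or weights: {missing}")
    total_weight = sum(weights[name] for name in SIGNALS)
    contributions = {
        name: weights[name] * signals[name] / total_weight for name in SIGNALS
    }
    return sum(contributions.values()), contributions


# Hypothetical inputs with equal weights.
signals = {
    "platform_coverage": 0.75,
    "mention_frequency": 0.5,
    "citation_rate": 0.25,
    "sentiment": 0.5,
    "consistency": 0.5,
    "share_of_voice": 0.5,
}
weights = {name: 1.0 for name in SIGNALS}
headline, parts = composite_score(signals, weights)
print(f"headline: {headline:.2f}")
for name, value in sorted(parts.items(), key=lambda kv: kv[1]):
    print(f"  {name}: {value:.3f}")
```

The board gets the headline. The team gets the sorted breakdown, where the weakest contributor sits at the top of the list.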
The Attribution Problem Nobody Wants to Talk About
Here's the elephant that keeps wandering into my strategy sessions: when someone discovers your brand through ChatGPT, there's often no click to track. They Google you next, or type your URL directly. Google Analytics credits "organic" or "direct" traffic. The AI touchpoint disappears.
This isn't a minor gap. It's a fundamental break in the attribution chain that most marketing stacks were built around. We spent fifteen years optimizing for last-click attribution, and now the most influential touchpoint in the buyer journey might be completely invisible to our measurement systems.
The old framework doesn't transfer 1:1. And pretending it does is how you end up with a beautiful dashboard that tells you nothing useful.
What Actually Works Right Now
Accept the uncertainty. I know that's not the answer anyone wants, but it's the honest one. AI visibility measurement is where SEO was in 2008: directionally useful, methodologically messy, and improving fast. The platforms will get better. The standards will emerge. But we're not there yet.
In the meantime, focus on what you can control. Build content that answers the questions your buyers are actually asking, not the keywords your SEO tool says have volume. Earn citations from sources that LLMs trust. Monitor trends over time rather than obsessing over weekly fluctuations.
And when your CFO asks for the ROI on AI visibility, be honest: we're measuring a new channel with imperfect tools, the same way we measured social media in 2010 and content marketing in 2014. The brands that figured out measurement early in those channels won. The brands that waited for perfect data are still waiting.
The data is wrong. Use it anyway. Just use it wisely.