If your State of Marketing report needs speed but can’t tolerate bogus numbers in an exec deck, treat Claude Code like a junior analyst: fast, useful, and occasionally confidently wrong. That constraint is the whole game.

Emily Kramer put it bluntly on LinkedIn: “Claude Code filled in 10,000 cells of data for my State of Marketing Reports. But, some of it was made up.” The promise (automation) collided with the risk (phantom data). And the cleanup cost was real: “I had to go back and do the research way more methodically. It was a bit maddening, TBH.”

Here’s the move that matters: don’t try to prompt your way out of hallucinations after the fact. Change the spec so the model is allowed to say “unknown,” and force evidence before anything gets counted, charted, or repeated.

That’s not ideology. It’s operational hygiene. Coupler.io’s write-up on Claude Code for marketing makes the same core point from the practitioner side: LLMs can hallucinate during data analysis, so outputs should be verified against source files before they’re used externally. Their recommended operating principle is essentially “trust but verify”—especially before anything goes to a client.

Why this matters now: the budget story is boring, the scrutiny isn’t

State-of-Marketing reports exist for one reason: to justify decisions. And even when budgets look “fine,” the bar goes up. HubSpot’s 2023 State of Marketing report showed 47% of marketers expected budgets to increase, 45% expected budgets to stay stable, and 7% expected budgets to decrease. Stable-to-up budgets don’t remove scrutiny; they usually increase it, because leadership expects clearer allocation logic.

At the same time, the channel mix conversation is noisy. HubSpot’s 2023 trends summary called out short-form video (32%) and influencer marketing (reported in the 30–33% range) as top ROI areas, with social DMs and SEO at 29% in that same “ROI leader” cluster. Those are useful directional signals. They’re also easy to misuse when a model quietly invents a denominator, rounds a missing value, or “helpfully” fills a gap.

And the stakes aren’t abstract. One wrong benchmark can push spend into the wrong motion, distort attribution narratives, and create a month of churn between Marketing, Sales, and RevOps over what “worked.”

The one tactic: build an “unknown-first” research table with evidence gates

Kramer’s highest-leverage tip is the simplest: “Tell Claude Code to flag missing data, not fill in phantom data instead.” She noticed the failure mode the way operators usually do—pattern smell. “Way too many round numbers.” Not proof, but a signal.

But the deeper version showed up in the comments. Slava Baranskyi (FCIM) argued the fix isn’t just prompt wording; it’s structural: “hallucinations happen when the spec has no grammar for ‘I don’t know.’ Add an explicit unknown value to every output schema, plus an ‘evidence required’ column the model has to fill or it can’t return the row.” He added that after “~9 months running this stack daily,” that rule cut fabricated cells “by an order of magnitude.”
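In practice, that "grammar" can be as small as a row schema plus a gate. Here's a minimal sketch in Python terms (the field names are illustrative, not pulled from Baranskyi's actual stack):

```python
from dataclasses import dataclass
from typing import Union

UNKNOWN = "unknown"  # the schema's explicit "I don't know" value

@dataclass
class ResearchCell:
    company: str
    metric: str
    value: Union[float, str]     # a number, or the literal string "unknown"
    source_url: str = ""
    evidence_snippet: str = ""   # copied verbatim from the source

def accept_row(cell: ResearchCell) -> bool:
    """Evidence gate: a row can't come back unless its value is 'unknown'
    or it carries both a source and a verbatim evidence snippet."""
    if cell.value == UNKNOWN:
        return True
    return bool(cell.source_url.strip()) and bool(cell.evidence_snippet.strip())
```

The gate is the point: a guessed number with no receipt never becomes a row.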

There’s another practical layer from Filip Matekovic: add a confidence column next to every field, and route anything under 0.8 to manual review before it touches HubSpot or a chart. That’s not “AI magic.” It’s triage.
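As a sketch of that triage step (the 0.8 cutoff is Matekovic's suggestion; the row shape here is assumed, not his):

```python
REVIEW_THRESHOLD = 0.8  # Matekovic's suggested cutoff; tune per team

def triage(rows: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split rows into auto-accept vs. manual review. Anything below the
    confidence threshold goes to a human before it touches HubSpot or a chart."""
    auto, review = [], []
    for row in rows:
        bucket = auto if float(row.get("confidence", 0.0)) >= REVIEW_THRESHOLD else review
        bucket.append(row)
    return auto, review
```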

Put those together and the tactic becomes a workflow: a research table where every cell has permission to be unknown, and every non-unknown value has to carry evidence.

Kramer’s other tips slot neatly into the same system. “Work in batches and double-check outliers.” Start with a few companies, confirm the method, then scale. Outliers are where scrape errors hide. Also: “Ask Claude Code to build & update a spreadsheet as your source of truth. Don’t rely on Claude’s memory or your Claude session.” The spreadsheet isn’t busywork; it’s auditability.

And the graphing warning is the one most teams learn the hard way: “Finalize your data before you build graphs.” She rebuilt charts multiple times after finding errors. The point isn’t that Figma or MCPs are bad. The point is that visualization amplifies errors—once a chart exists, people believe it.

Run it this week (Verto-style): setup, launch, readout, next test

Hypothesis (make it falsifiable): If we force Claude Code to output unknown plus an evidence link for every non-unknown value, then fabricated cells in our State of Marketing research table will drop, because the model can no longer “complete the assignment” by guessing.

Setup (90 minutes): Create a spreadsheet with columns: Company, Metric, Value, Unit, Timeframe, Source URL/File, Evidence snippet (copy/paste), Confidence (0–1), Notes. Add a hard rule: Value must be either a number or the literal string unknown.
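If you'd rather enforce that hard rule than remember it, a minimal check over the exported sheet could look like this (the filename is a placeholder; the column header matches the list above):

```python
import csv

def value_ok(raw: str) -> bool:
    """The hard rule: Value is a number or the literal string 'unknown'."""
    if raw.strip().lower() == "unknown":
        return True
    try:
        float(raw.replace(",", ""))
        return True
    except ValueError:
        return False

# "research.csv" is a placeholder for your exported source-of-truth sheet.
with open("research.csv", newline="") as f:
    bad_rows = [row for row in csv.DictReader(f) if not value_ok(row["Value"])]

print(f"{len(bad_rows)} rows break the number-or-unknown rule")
```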

Audience / scope: Start with 5–10 companies (or 1 report section) before scaling. Keep it small on purpose.

Owners: Marketing Ops owns the schema and QA. Demand gen owns interpretation (what it means for pipeline, qualified pipeline, and creative strategy). RevOps sanity-checks anything that will get repeated in forecasting or board materials.

Tools: Claude Code for extraction/synthesis; spreadsheet as system of record. Use your existing doc store for source files. Don’t add more tools unless they improve traceability.

Launch (same day): In Claude Code, explicitly instruct: return unknown when missing; do not infer; populate Evidence snippet and Source for every non-unknown; include Confidence. Then run one metric across a handful of companies to validate the method before expanding.
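One way to phrase that instruction, paraphrased from the rules above rather than Kramer's exact prompt:

```
Fill the research spreadsheet for each company and metric.
Rules:
- If a value is not in the provided sources, enter the literal string "unknown".
  Do not infer, estimate, or round to fill gaps.
- Every non-unknown Value must include a Source URL/File and a copied
  Evidence snippet from that source.
- Add a Confidence score between 0 and 1 for every row.
- Flag missing data; never fill in phantom data.
```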

Readout (30 minutes): Spot-check: (1) all outliers, (2) all rounded numbers, (3) all low-confidence rows (<0.8 if using Matekovic’s threshold). Verify against source files. No exceptions.
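A rough way to pull that spot-check list automatically, assuming rows export as dicts with metric, value, and confidence keys (the outlier test is a crude z-score, not a statement about your data):

```python
from statistics import mean, stdev

def flag_for_review(rows: list[dict], metric: str, threshold: float = 0.8) -> list[dict]:
    """Pull the spot-check list for one metric: low-confidence rows,
    suspiciously round values, and crude z-score outliers."""
    numeric = [(r, float(r["value"])) for r in rows
               if r["metric"] == metric and r["value"] != "unknown"]
    values = [v for _, v in numeric]
    mu, sigma = (mean(values), stdev(values)) if len(values) > 2 else (0.0, 0.0)

    flagged = []
    for row, v in numeric:
        too_round = v != 0 and v == round(v, -2)          # multiples of 100: the "round numbers" smell
        low_conf = float(row.get("confidence", 0)) < threshold
        outlier = sigma > 0 and abs(v - mu) > 2 * sigma   # far from this metric's typical range
        if too_round or low_conf or outlier:
            flagged.append(row)
    return flagged
```

Whatever the script flags still gets verified against source files by a person; the code only decides where the human hours go first.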

Success = % of rows with valid evidence attached and verified in spot-checking. Guardrails = time-to-first-draft stays under one workday for the pilot; unknown rate is visible (not “fixed”). Stop-loss = if more than a small handful of spot-checked rows (define it internally) fail verification, pause scaling and tighten the schema/instructions before adding volume.

Trade-off (say it out loud): This will reduce volume before it improves quality. Expect more unknown early. That’s not failure; it’s the model telling the truth.

When this is wrong (and what to do instead)

This approach is overkill when the output is purely exploratory—brainstorming themes, drafting narrative structure, or summarizing what a report says in plain English. It’s also heavy if the decision doesn’t matter, or if the numbers won’t travel beyond the working doc.

But the moment a number is headed toward an exec slide, a client deliverable, or a chart that will influence budget allocation, “trust but verify” stops being a slogan and turns into process. Coupler.io’s warning exists for a reason: hallucinations during analysis are a known limitation, and the fix is verification against source files.

Kramer’s post started with 10,000 cells and ended with a method. Not because the model got smarter, but because the workflow did. The speed is real. The discipline has to be, too.