When token costs blow past forecast, most teams reach for caps. The actual problem is upstream — and it has a name.

Uber burned through its entire 2026 AI budget in four months. ServiceNow exhausted its full-year Anthropic coding allocation early. Microsoft started throttling engineers' use of Claude Code. The reflex in every case: cap spending.

That reflex is wrong. Not because the bills aren't real, but because capping spend without understanding what drove it is the demand-gen equivalent of pausing every campaign when cost per lead (CPL) spikes instead of diagnosing which audiences decayed. The problem isn't volume. The problem is context.

What "context debt" actually means for your bill

Forrester calls it "context debt" — a recurring platform tax that accumulates when AI systems lack the structured background they need to do work efficiently. Every time a model has to reconstruct meaning from scratch (because the right data isn't machine-readable, or it's siloed across CRM, product telemetry, and billing), you pay in extra tokens. You pay again on the next call. And the next.

Anthropic's own research puts numbers on this: a single agentic workflow consumes roughly four times the tokens of a standard chat interaction. Multi-agent systems? Fifteen times. Those multipliers aren't buying you fifteen times the output. They're buying the model time to reassemble context your infrastructure should have provided.
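To make those multipliers concrete, here is a rough back-of-the-envelope sketch. Only the 4x and 15x figures come from the research cited above; the baseline token count and the per-token price are made-up placeholders, so treat the dollar amounts as illustrative rather than as anyone's real rate card.

```python
# Illustrative arithmetic only. CHAT_TOKENS and PRICE_PER_MILLION are
# hypothetical placeholders, not vendor figures.
CHAT_TOKENS = 2_000          # assumed tokens for one standard chat interaction
PRICE_PER_MILLION = 3.00     # assumed blended price, USD per million tokens

# Multipliers relative to a standard chat interaction, as cited in the text.
MULTIPLIERS = {"chat": 1, "single_agent": 4, "multi_agent": 15}


def cost_per_task(mode: str) -> float:
    """Estimate the USD cost of one task for the given interaction mode."""
    tokens = CHAT_TOKENS * MULTIPLIERS[mode]
    return tokens * PRICE_PER_MILLION / 1_000_000


for mode in MULTIPLIERS:
    print(f"{mode:>12}: ${cost_per_task(mode):.4f} per task")
```

Under these assumptions the per-task cost scales linearly with the multiplier, so the question worth asking is whether the output scales with it too.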

Users see the prompt they typed. The model call also includes prior interactions, tool metadata, system instructions, and retrieved documents, all of them billable. The gap between what the user thinks they asked and what the system actually processed is where context debt lives.
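One way to picture that gap is to break a single call's input into the components listed above. This is a minimal sketch: the `CallTokens` class and every token count in it are hypothetical, chosen only to show how small the visible prompt can be relative to what gets billed.

```python
from dataclasses import dataclass


@dataclass
class CallTokens:
    """Hypothetical input-token breakdown for one model call."""
    user_prompt: int            # what the user actually typed
    system_instructions: int
    prior_turns: int            # earlier interactions replayed into the call
    tool_metadata: int
    retrieved_documents: int

    def total_input(self) -> int:
        """All input tokens in the call, visible or not."""
        return (self.user_prompt + self.system_instructions
                + self.prior_turns + self.tool_metadata
                + self.retrieved_documents)

    def visible_share(self) -> float:
        """Fraction of input tokens that the user typed."""
        total = self.total_input()
        return self.user_prompt / total if total else 0.0


# Made-up numbers for illustration.
call = CallTokens(user_prompt=50, system_instructions=800,
                  prior_turns=3_000, tool_metadata=1_200,
                  retrieved_documents=4_950)
print(call.total_input(), f"{call.visible_share():.1%}")
```

In this invented example the typed prompt is half a percent of the input. Real ratios will vary by system, which is exactly why they are worth measuring.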

Four layers of context that drive (or wreck) AI billing

A useful breakdown for ops teams maps the problem into four layers:

Context fragmentation across these layers is what produces billing errors, customer disputes, and the kind of internal fire drills that make finance teams allergic to usage-based pricing.

Why this is a GTM problem, not just a FinOps problem

Here's where it gets uncomfortable for marketing and RevOps. As AI shifts pricing from seats to usage (or outcomes, per Bain's recommendation to test outcome-based models tied to tasks completed), the customer's ability to predict their bill drops. Bill shock creates churn. Churn creates pipeline pressure. Pipeline pressure gets blamed on demand gen.

The actual fix is upstream. If your context layer is clean — CRM aligned with product telemetry, entitlements mapped to usage, thresholds triggering proactive comms before overages hit — then customers can see what drove their bill. Transparency becomes a retention lever instead of a support ticket category.
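The "thresholds triggering proactive comms" piece can be sketched in a few lines. This assumes you already have a reliable usage reading and an entitlement figure per customer; the function name and the 50/80/100 percent thresholds are hypothetical choices, not a standard.

```python
def crossed_thresholds(previous_usage: float, current_usage: float,
                       entitlement: float,
                       thresholds=(0.5, 0.8, 1.0)) -> list:
    """Return the thresholds newly crossed between two usage readings.

    Comparing against the previous reading means each threshold triggers
    one notification rather than one per polling cycle.
    """
    if entitlement <= 0:
        raise ValueError("entitlement must be positive")
    prev_ratio = previous_usage / entitlement
    curr_ratio = current_usage / entitlement
    return [t for t in thresholds if prev_ratio < t <= curr_ratio]


# Hypothetical customer with a 1M-token monthly entitlement.
print(crossed_thresholds(400_000, 850_000, 1_000_000))
print(crossed_thresholds(850_000, 900_000, 1_000_000))
```

The design choice that matters is alerting on the crossing, not on the level: a customer sitting at 85 percent should hear about it once, before the overage, not every hour.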

Attribution matters here, too. Enterprises need to know which tokens reflected actual user intent and which were avoidable overhead from poor context design. Without that breakdown, you can't tell whether you're spending on intelligence or on the runtime tax of rebuilding meaning every call. Limiting spend without this visibility doesn't improve unit economics; it just throttles experimentation.
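A rough sketch of that attribution, rolled up by workflow. It assumes each logged call has already been split into intent tokens and overhead tokens, which is the hard part and is not shown here; the field names, workflow names, and numbers are all hypothetical.

```python
from collections import defaultdict

# Hypothetical call log. How tokens get labeled as "intent" versus
# "overhead" is an assumption left to your own instrumentation.
calls = [
    {"workflow": "support_triage", "intent_tokens": 400, "overhead_tokens": 3_600},
    {"workflow": "support_triage", "intent_tokens": 600, "overhead_tokens": 5_400},
    {"workflow": "invoice_summary", "intent_tokens": 1_500, "overhead_tokens": 500},
]


def overhead_by_workflow(records) -> dict:
    """Return each workflow's overhead share of tokens, highest first."""
    totals = defaultdict(lambda: {"intent": 0, "overhead": 0})
    for r in records:
        totals[r["workflow"]]["intent"] += r["intent_tokens"]
        totals[r["workflow"]]["overhead"] += r["overhead_tokens"]
    shares = {}
    for name, t in totals.items():
        total = t["intent"] + t["overhead"]
        shares[name] = t["overhead"] / total if total else 0.0
    return dict(sorted(shares.items(), key=lambda kv: kv[1], reverse=True))


for name, share in overhead_by_workflow(calls).items():
    print(f"{name}: {share:.0%} overhead")
```

A ranked list like this points at where context design is costing the most, which is a more useful target than a blanket cap.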

Build bill vs. run bill: separate them or stay confused

One framing that helps: split AI costs into a build bill (capex for developing agentic systems, essentially R&D) and a run bill (opex for operating them, including hidden context overhead). The build bill should be judged like any experiment — hypothesis, timeline, expected signal. The run bill is where context debt compounds and where ops teams have the most leverage to reduce waste without killing output.
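As a minimal sketch of that split, assume each cost line item is tagged as build or run, with run items optionally flagged as context overhead. The tags, descriptions, and dollar figures below are invented for illustration.

```python
# Hypothetical line items; the "bill" and "context_overhead" tags are
# assumptions about how your own cost data might be labeled.
line_items = [
    {"desc": "agent prototype eval runs", "bill": "build", "usd": 12_000.0},
    {"desc": "production inference, user prompts", "bill": "run", "usd": 30_000.0},
    {"desc": "production inference, context overhead", "bill": "run",
     "usd": 45_000.0, "context_overhead": True},
]


def split_bills(items) -> dict:
    """Total the build bill and run bill, and the overhead inside the run bill."""
    summary = {"build": 0.0, "run": 0.0, "run_context_overhead": 0.0}
    for item in items:
        summary[item["bill"]] += item["usd"]
        if item["bill"] == "run" and item.get("context_overhead", False):
            summary["run_context_overhead"] += item["usd"]
    return summary


summary = split_bills(line_items)
overhead_share = summary["run_context_overhead"] / summary["run"]
print(summary, f"{overhead_share:.0%} of run bill is context overhead")
```

Keeping the overhead figure as a subset of the run bill, rather than a third bucket, makes it easy to track as a share that should fall over time.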

Current AI pricing still reflects discounted compute and venture subsidies. As Anthropic and OpenAI move toward public markets, those discounts are likely to shrink. The context tax you're paying now gets more expensive, not less.

The operational discipline that doesn't exist yet

Some analysts are calling the emerging practice "ContextOps" — parallel to FinOps, but focused on the fidelity of the context feeding your AI systems rather than just the cost of compute. It's not a one-time cleanup. Business processes change, data models drift, entitlements get renegotiated. Context degrades continuously, which means maintaining it is an ongoing operational motion.

A decade ago, cloud bill shock taught companies that visibility alone doesn't create value. The same lesson applies now, with higher stakes and faster feedback loops. The organizations that treat context as infrastructure (designed, instrumented, maintained) will spend less per unit of AI output. The ones that treat it as someone else's problem will keep burning budgets in four months and wondering what went wrong.

Uber's budget didn't disappear because engineers used too much AI. It disappeared because every call carried the weight of context the system should have already known.