OpenAI's Ads Manager Beta just shipped the controls B2B advertisers have been waiting for. Here's what changed, what's still missing, and how to design a clean first experiment.

Until last week, running an always-on campaign in OpenAI's Ads Manager Beta meant committing to a lifetime budget with no daily pacing lever. That constraint alone kept most demand gen teams from treating ChatGPT ads as anything more than a curiosity. The May 2025 update changes the math: daily budgets, U.S. geo-targeting down to state, DMA, and ZIP code, plus aggregate reporting totals for impressions, clicks, and spend at the campaign, ad group, and ad levels.

None of this makes ChatGPT a mature ad platform overnight. But it does make it testable.

What actually shipped

Three things landed at once, and the details of each matter. Daily budgets are available only when you create a new campaign; existing campaigns are still lifetime-only, a real limitation we'll come back to. Geo-targeting now supports state, DMA, and ZIP-code selection, configurable at setup or editable later in campaign settings. And aggregate reporting totals for impressions, clicks, and spend show up in table views across campaign, ad group, and ad levels, with CSV exports available for offline analysis.

Separately, OpenAI is testing dynamic ad CTAs inside ChatGPT itself: "Shop Now," "Book Now," "Learn More," auto-selected based on the creative and destination. That's still experimental, so treat it accordingly.

Search Engine Land frames this update as OpenAI making ChatGPT's ad tooling closer to what advertisers expect from Google and Meta. That's directionally right. The gap is still wide, though, especially on measurement depth.

Why demand gen teams should care (and where to stay skeptical)

Daily budgets matter because they let you pace spend against a hypothesis without locking capital into a lifetime commitment. If you're running a two-week geo experiment comparing DMA performance in, say, the Boston and Denver metros, daily budgets give you the guardrail to cap exposure at $50–$150/day and kill underperformers without burning through a pre-committed pool. That's table stakes on Google and Meta. On ChatGPT, it's new.
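To size the worst case before launch, the arithmetic is simple enough to sketch. This assumes one campaign per metro and that delivery never exceeds the daily budget on any given day; neither is confirmed by the update itself, and `max_exposure` is a hypothetical helper, not part of any OpenAI tooling.

```python
def max_exposure(daily_budget: float, days: int, campaigns: int = 1) -> float:
    """Upper bound on spend if every campaign spends its full daily budget each day."""
    return daily_budget * days * campaigns


# Two-week test, one campaign per metro (Boston and Denver),
# at the low and high ends of the $50-$150/day range.
low = max_exposure(50, days=14, campaigns=2)    # 1400
high = max_exposure(150, days=14, campaigns=2)  # 4200
```

Under those assumptions, the two-metro test caps out between $1,400 and $4,200, which is the number to clear with finance before you launch.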

ZIP-code and DMA targeting unlocks regional segmentation plays that B2B SaaS teams actually use: field marketing alignment, event-market targeting, territory-based tests. If your sales org runs named-territory coverage, you can now mirror that structure in ChatGPT campaigns and compare lift against a holdout region. The ability to edit geo settings post-launch reduces the operational tax on marketing ops, too.
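Comparing a targeted region against a holdout comes down to one ratio. Here is a minimal sketch, assuming you count outcomes (demo requests, for example) in your own CRM for both regions, since Ads Manager does not report them. The function name and inputs are hypothetical.

```python
def relative_lift(test_outcomes: int, test_accounts: int,
                  holdout_outcomes: int, holdout_accounts: int) -> float:
    """Relative lift of the targeted region's outcome rate over the holdout's rate."""
    if test_accounts <= 0 or holdout_accounts <= 0:
        raise ValueError("both regions need at least one account")
    test_rate = test_outcomes / test_accounts
    holdout_rate = holdout_outcomes / holdout_accounts
    if holdout_rate == 0:
        raise ValueError("holdout rate is zero; relative lift is undefined")
    return (test_rate - holdout_rate) / holdout_rate


# Illustrative numbers only: 30 outcomes from 1,000 accounts in the targeted
# territory versus 20 from 1,000 in the holdout is a lift of roughly 0.5 (50%).
lift = relative_lift(30, 1000, 20, 1000)
```

A single ratio like this says nothing about noise, so treat a small lift on small counts as inconclusive.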

The reporting improvements are helpful but thin. Aggregate totals for impressions, clicks, and spend are a start. They speed up weekly reporting and reduce the manual work of summing rows in a CSV. But if your attribution model needs anything beyond top-of-funnel engagement metrics, you're still stitching together data outside the platform. Don't confuse visibility with measurement maturity.

The daily-budget limitation you'll hit immediately

Daily budgets only apply to new campaigns. That's a real constraint for teams running always-on programs. You can't migrate an existing lifetime-budget campaign to daily pacing without rebuilding it from scratch, which resets any learning the platform has accumulated. For now, treat daily budgets as an experiment-only tool. Run new tests with them; leave existing campaigns alone until OpenAI extends the feature.

This also means A/B comparisons between daily and lifetime pacing require parallel campaigns, not in-campaign toggles. Budget for the duplication.

How to run a first experiment this week

Setup: Create a new campaign in Ads Manager Beta. Select daily budget. Set $75–$150/day depending on your test market size. Choose two DMAs with comparable audience density for your ICP.

Hypothesis (make it falsifiable): If we target DMA A and DMA B with the same creative and daily budget, then a relative CTR difference of more than 15% between the two markets signals geo-specific intent variation worth scaling, because ChatGPT query context likely differs by region.

Success metric: CTR delta between DMAs.

Guardrail: CPC stays within 2x your Google Search benchmark for the same audience.

Stop-loss: If spend hits $1,000 with fewer than 50 clicks total, pause and reassess creative or targeting.
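The guardrail and stop-loss are easy to encode so nobody has to eyeball them mid-test. This is a sketch under assumptions: `Totals` and `check_guardrails` are hypothetical names, you supply the Google Search CPC benchmark yourself, and the totals are typed in by hand from the Ads Manager table view rather than pulled from any API.

```python
from dataclasses import dataclass


@dataclass
class Totals:
    """Aggregate totals copied from the Ads Manager table view."""
    impressions: int
    clicks: int
    spend: float


def check_guardrails(totals: Totals, google_cpc_benchmark: float,
                     stop_spend: float = 1000.0, min_clicks: int = 50,
                     cpc_multiple: float = 2.0) -> list[str]:
    """Return the names of any rules the test has tripped."""
    tripped = []
    # Stop-loss: spend has reached the cap without enough clicks to learn from.
    if totals.spend >= stop_spend and totals.clicks < min_clicks:
        tripped.append("stop_loss")
    # Guardrail: observed CPC exceeds the allowed multiple of the benchmark.
    if totals.clicks > 0 and totals.spend / totals.clicks > cpc_multiple * google_cpc_benchmark:
        tripped.append("cpc_guardrail")
    return tripped


# Illustrative numbers only: $1,000 spent, 40 clicks, $8.00 benchmark CPC.
# Observed CPC is $25.00 against a $16.00 ceiling, so both rules trip.
status = check_guardrails(Totals(impressions=20000, clicks=40, spend=1000.0),
                          google_cpc_benchmark=8.0)
```

Run it against the combined totals for both DMAs, since the stop-loss is defined on total clicks.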

Readout: Pull aggregate totals from Ads Manager plus CSV export at day 7. Compare impressions, clicks, and spend by DMA. Don't over-interpret click volume as pipeline signal; this is a leading indicator test, not a conversion proof.
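The day-7 readout can be scripted against the export. The sketch below assumes column names (`dma`, `impressions`, `clicks`, `spend`) that the real CSV may not use, so rename them to match your file. It also assumes the 15% threshold is read as a relative difference measured against the lower-CTR market. The sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Made-up sample standing in for the Ads Manager CSV export.
SAMPLE_EXPORT = """dma,impressions,clicks,spend
Boston,10000,120,400.00
Boston,10000,130,420.00
Denver,20000,200,810.00
"""


def ctr_by_dma(csv_text: str) -> dict[str, float]:
    """Sum impressions and clicks per DMA, then return clicks / impressions."""
    totals = defaultdict(lambda: {"impressions": 0, "clicks": 0})
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["dma"]]["impressions"] += int(row["impressions"])
        totals[row["dma"]]["clicks"] += int(row["clicks"])
    return {dma: t["clicks"] / t["impressions"]
            for dma, t in totals.items() if t["impressions"] > 0}


def relative_ctr_delta(ctr_a: float, ctr_b: float) -> float:
    """Relative CTR difference, measured against the lower of the two CTRs."""
    low, high = sorted((ctr_a, ctr_b))
    if low == 0:
        raise ValueError("lower CTR is zero; relative delta is undefined")
    return (high - low) / low


ctrs = ctr_by_dma(SAMPLE_EXPORT)
delta = relative_ctr_delta(ctrs["Boston"], ctrs["Denver"])
hypothesis_supported = delta > 0.15
```

With the sample rows, Boston lands at 1.25% CTR and Denver at 1.00%, a relative delta of about 25%. On real data, check that each market has enough clicks before reading much into the gap.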

Next test: If one DMA outperforms, narrow to ZIP codes within that DMA and test creative variants. If neither performs, the channel might not have density for your ICP yet. That's a valid finding.

The trade-off nobody's talking about

ChatGPT ads sit inside a conversational interface. The user isn't searching with transactional intent the way they do on Google. They're asking questions, exploring ideas, working through problems. That context changes what "a click" means. A click from ChatGPT might be higher-curiosity, lower-intent than a click from paid search. Or it might be the opposite for certain verticals where the conversational frame builds trust before the click.

The honest answer: nobody has enough data yet to know. OpenAI's reporting gives you impressions, clicks, and spend. It doesn't give you conversion-path visibility. Until it does, treat every metric from this channel as directional, not definitive. Run holdout tests against your existing channels if you want to measure incrementality.

OpenAI shipped pacing, geo controls, and better reporting in a single update. Six months ago, the platform had none of those. The trajectory is clear even if the destination isn't. The teams that build clean test frameworks now, with falsifiable hypotheses and honest stop-losses, will have signal when the rest of the market is still debating whether to log in.