Anthropic Repriced My Agent. Four Mitigations Before June 15.

Anthropic splits Claude Agent SDK billing on June 15, 2026. The $200 monthly credit will not cover a serious 24/7 agent. Here are the four mitigations I am testing right now.

May 22, 2026

I took a week off from writing. The gap in the feed is mine.

It was not a wasted week. I did a live on Substack about the job finder agent with Wyndo and Dheeraj Sharma → Agent I built for my friends and my family, the one that is helping them pursue new opportunities. The replay should be up on Sunday and I am genuinely happy with how it went. It was the calm half of the week.

The other half was doing mode. The news I am sharing in this post is the reason. And I am also sharing four solutions you can try before June 15, because that is the part I wish someone had handed me yesterday morning.

Here is the short version of the rollercoaster I just got off.

In April, my Claude Code usage on the Max plan got noticeably worse. Same prompts, same agent, much less runway before I hit the limit. I am not the only one who felt it, the forums are loud about it, and Anthropic itself acknowledged a capacity squeeze. So I did the thing I have been writing about for a while. I added Codex Pro to the stack and let it carry a chunk of the load. Different vendor, different harness, same agent jobs underneath. Codex earned its keep fast, and after two months running both side by side, I had real evidence that GPT-5.4 inside the Codex harness held up as a peer for most of the work my Wiz throws at a model.

Then Anthropic got a lot of compute back. They closed a deal with SpaceX for the Colossus 1 data center, more than 300 megawatts and over 220,000 NVIDIA GPUs coming online within the month. Stacked on top of a 5 GW Amazon agreement, a multigigawatt agreement with Google and Broadcom, $30 billion of Azure capacity with Microsoft and NVIDIA, plus a $50 billion American AI infrastructure plan through Fluidstack. That is real compute. On May 6 they doubled the Claude Code five-hour limit for Pro, Max, Team, and seat-based Enterprise, and they permanently removed the weekday peak-hour throttle on Pro and Max. On May 13 they added a 50 percent weekly limit bump on top, running through July 13.

I was about to drop Codex. Sincerely. I was writing notes about consolidating back to a single subscription. With the new ceiling, my workload fit comfortably under Max.

Then last week, Anthropic published the Agent SDK plan change. And that is where it stops being nice.

What Actually Changes On June 15

Starting June 15, 2026, every programmatic use of Claude gets decoupled from your subscription plan limits. Concretely:

Claude Agent SDK (the Python and TypeScript SDKs)
claude -p, the non-interactive print mode of Claude Code
Claude Code GitHub Actions
Third-party apps that authenticate via the Agent SDK with your Claude subscription

None of those count against your plan after June 15. They draw from a separate monthly Agent SDK credit. Pro gets $20 per month. Max 5x gets $100. Max 20x (the plan I am on) gets $200. Team Premium $100 per seat. Enterprise Premium $200 per seat. The credit refreshes monthly. It does not roll over. It is per user, not poolable across a team. And it requires a one-time opt-in to activate, after an Anthropic email scheduled for around June 8.

What stays on the plan: interactive Claude Code in the terminal or IDE, web chat, the mobile app, and Cowork. So if you sit at the keyboard and prompt Claude, that lives on Max. If your agent prompts Claude, that lives on the SDK credit.

When the credit runs out, two doors. If you have enabled extra usage, overflow flows to standard API rates on top of the subscription. If you have not, the request halts until the next monthly reset. You can read the official terms on the Claude help center page.

On paper, $200 a month sounds like free money. The framing is generous. The math is not.

Why $200 Is A Drop In The Sea For A 24/7 Agent

Let me show what $200 actually buys when you run an agent that wakes overnight, runs through the day, and spawns subagents for parallel work.

At current rates, Claude Sonnet 4.6 is $3 per million input tokens and $15 per million output tokens. Claude Opus 4.7 is $5 input and $25 output per million. A modest agent call with around 50,000 input tokens and 5,000 output tokens lands at about $0.225 on Sonnet and about $0.375 on Opus, before prompt caching helps and before any reasoning tokens or tool calls inflate the count.

That is one call. A real agent shift is not one call. My overnight shift runs a planning pass, then it spawns several worker tasks in parallel, each of which calls Claude several times to use tools, read files, write changes, and produce a verifiable result. A single “go do this overnight job” can fan out into twenty or thirty model calls. Some of those are cheap. Some, when the agent decides it needs Opus to think harder about a refactor or an audit, are not.

Round it generously and a serious overnight run lands somewhere between $2 and $8 in API costs. Daytime wakes add more on top. The math at one $5 shift per day puts you at $150 a month, before any actual ambition. Push the ceiling and the $200 credit is gone in two weeks, easy. After that, every call lands at standard API rates on top of the subscription I am already paying for. Or the agent stops.

This is not a hypothetical. I have written about my token bills before. When I finally started measuring properly, the waste was embarrassing, and I still found that even the disciplined version of my agent burns more tokens than the credit would cover at full ambition. The whole reason I am on a flat subscription is that I do not want to be priced per call. Pay-per-token is a tax on building anything that runs while I sleep.

Who Gets Squeezed First

I want to be careful here. Anthropic is not a charity, and they have made the case (their CFO said it on the record, and several outlets picked it up) that some accounts were running thousands of dollars of API value through a $200 subscription. Splitting plan usage from SDK usage closes that gap. From a business angle, it is a clean move. I get it.

What I am less sure they fully internalized is who gets squeezed first. The casual user lives at the keyboard, gets the doubled limits, gets the 50 percent weekly bump, and is genuinely better off than a month ago. The squeeze lands one ring out, on the practitioner. The agent builder. The person running an autonomous Wiz overnight, or the small team using Claude Code GitHub Actions to ship code without a human in the loop on every step. The people whose entire workflow is programmatic Claude. The most committed builders on the platform are the ones whose costs go up the fastest.

And those are exactly the people who are most likely to have, or to quickly add, a second harness.

Four Mitigations You Can Try Before June 15

None of these are theoretical. I tested all four this week and the first one is a working prototype on my Mac Mini. Read them as a stack, not as alternatives. The serious mitigation is layered.

1. Drive Interactive Claude Instead Of `claude -p`

This is the one I am most excited about, and the one I am least ready to publish in detail.

The interactive Claude Code session stays on your subscription after June 15. Only the print mode and the SDK move to the credit. So the question is whether you can take a job that currently runs as claude -p "do X" and instead run it as a real interactive session that opens, receives a prompt, executes the work, writes the result somewhere you can read, and closes itself. If you can, that work stays on Max forever, no SDK credit touched.

I have a working prototype as of last week. The shape: a small Terminal automation that opens Claude in a real terminal window via AppleScript, pastes the prompt through the system clipboard (Cmd+V into a bracketed-paste handoff), waits for the result file to land at a known path, and closes the window. End to end in under three minutes for a real task. I tested it by having the spawned session send an iMessage and write a JSON file, and both arrived clean.

I am not publishing the full script yet because the rough version has edge cases that will bite anyone who copies it. The production version needs to handle session cleanup, concurrency, the headless display setup, and a handful of failure modes I have only seen once. I will write that one up properly when it is solid.

Honest caveat. Anthropic will probably patch the easy version of this. Driving an interactive TUI through clipboard paste is a clever workaround, not a sanctioned integration, and the moment it gets popular enough to register as a leak in the billing model, there will be a fix shipped against it. I am building this as a six-to-twelve-month bridge, not as permanent architecture. The orchestrator pattern (one job, one window, one result file) will keep working. The specific input-injection trick might not. Plan for that.

2. Move Work To A Second Harness

The cleanest mitigation, and the one I have already paid for, is to run a second harness on a different vendor and let it carry whatever fits.

I added Codex Pro under duress in April. The story I told myself was “this is temporary, I will consolidate once Claude stabilizes.” That story was wrong, and the wrong part was the framing. The Codex experience itself was fine (it is good, the GPT-5 series is good inside that harness, the experience is different but real). Diversification was never going to be temporary. I just did not see it yet.

The same thing happened to me on the inference layer. I opened an OpenRouter account because Claude had a bad morning and my agent had nowhere to send the request. Two days later I described that account as half insurance, half extension. Insurance when the primary fails. Extension for capabilities I deliberately keep off the primary stack. The $20 of credits in OpenRouter is leverage I did not have before, for less than a coffee a week.

Codex Pro is the same shape on the harness layer. It is insurance on the days Anthropic has a capacity problem, an outage, a billing change, or a release that changes my workload economics under me. It is extension on the days everything is fine, because the OpenAI models are genuinely good at certain things, and running both lets me pick the right tool per task. The subscription buys deliberate architecture. I have written before about what the two harnesses actually feel like after months of real use. This is the mitigation that does not depend on Anthropic leaving any door open. They cannot patch your second subscription.

3. Route Narrow Calls Through OpenRouter Or A Small Local Model

Not every model call needs the smartest model on the planet. A surprising amount of my agent traffic is narrow stuff: classify this message, summarize this file, extract these fields, decide between two tool options. That work runs fine on Haiku, on GPT-4o-mini, on a cheap Gemini Flash call, or honestly on a small local model on my Mac.

So the third mitigation is to actually look at what your agent is sending to Claude and ask which calls deserve Opus, which can drop to Sonnet, and which can leave the Claude billing surface entirely. OpenRouter is the easiest way to route the “leaves Claude” set, because you keep one API and pick the model per call. A small local model on a Mac Mini is the easiest way to route the “leaves the cloud” set. Both shave real money off the SDK credit pressure without changing the shape of your agent.

Quick aside. The cross-provider routing I run on my agent is packaged on my store as the AI Model Switcher. Same logic, same triggers, same identity prompt budget. If the June 15 change is making you think about diversification for the first time, this is the rung that gets you most of the way there.

4. Audit And Trim Before The Deadline

The least glamorous mitigation, and the one I would do first.

Spend an evening this week looking at what your claude -p usage actually does. Count the calls. Bucket them. How many are doing real cognitive work? How many are doing the same thing over and over (and could be cached, batched, or merged)? How many fire because of a cron job nobody has reviewed in three months? When I did this on my own stack I found a cron that was waking an agent every fifteen minutes to check a Discord state that almost never changed. That alone was several dollars of SDK credit per week, sitting on a default I had never questioned.

The waste is real. I have written about my own token waste before, and the only thing that fixed it was measuring properly. If you trim before June 15, you both shrink the credit pressure and learn where to focus the first three mitigations.

A Small Add-On If You Do Not Want To Wire This Yourself

While I was writing this, I packaged the four mitigations into a small add-on for the Claude Code agent you already run. It is called the Claude SDK Audit Kit, and it costs $9.99. Paid yearly subscribers get it for free, the rest of you get it at the price of a sandwich.

What it does. You hand the kit to your Claude Code, say “run the audit,” and it walks your repo, your cron, and your launchd plists, finds every claude -p call and Agent SDK import, estimates monthly cost on each plan, and writes a migration plan you can act on before June 15. The same four mitigations from this post are included as ready playbooks (with the AppleScript pattern, the Codex setup checklist, the OpenRouter routing recipe, and the trim labels), plus a migration spec template, a worked example from my own stack, and twelve months of updates as Anthropic ships follow-up pricing changes.

I am being honest that this is a thin add-on, not a flagship kit. The audit scripts are simple. The playbooks are short. The value is timing. If you wait until July to audit, your June bill already hurts. If you would rather build this yourself, the four sections above are enough. If you want it ready to run tonight, that is what the kit is for.

Claude SDK Audit Kit

One more honest note. Mitigation 1 (driving interactive Claude) is probably a six-to-twelve-month bridge, not a forever solution. Anthropic will patch the easy version of that trick at some point. The kit treats it that way, and so should you. The other three mitigations do not depend on Anthropic leaving any door open.

What I Take From The Rollercoaster

The doubled limits felt good. They are real. The weekly 50 percent bump is real. The compute deals behind it (SpaceX, Amazon, Google, Microsoft and NVIDIA) are gigawatts of capacity being built right now. Anthropic is scaling, and they are pricing the scale in a way that fits their business. Fair enough.

Although the lesson for builders is the one I keep relearning. Single-vendor architecture is a comfort tax I do not want to pay. The cheapest, smallest, most embarrassingly small thing I can do to protect my agent is to keep a second harness alive on a different vendor and to keep my own stack honest about what it actually needs.

I will keep Codex. The $20 in OpenRouter stays. The interactive Claude bridge gets built properly over the next month, knowing it is a six-to-twelve-month bridge and not a forever solution. And the next time I catch myself writing “I do not need this second tool anymore,” I will read this post back to myself and put the credit card away.

The Practical Takeaways

The billing split is real and starts June 15, 2026. Agent SDK, claude -p, Claude Code GitHub Actions, and third-party SDK apps move off your plan limits. Monthly credits: $20 Pro, $100 Max 5x, $200 Max 20x. Opt-in required after the Anthropic email around June 8.
The credit is generous in absolute terms and small for serious agent operators. A modest 24/7 agent will drain $200 in days, not weeks.
Overflow flows to API rates if you enable extra usage. Otherwise the request halts until reset. Pick deliberately; do not get surprised.
Mitigation 1: Drive interactive Claude. Interactive sessions stay on the subscription. The hack works today. Expect Anthropic to patch the easy version inside a year.
Mitigation 2: Second harness. Codex Pro is the cleanest insurance. They cannot patch your second subscription.
Mitigation 3: Route narrow calls elsewhere. OpenRouter, Haiku, a small local model. Stop sending classification work to Opus.
Mitigation 4: Audit and trim. Cheapest mitigation. Do it first.
Anthropic is scaling, hard. The compute is being built. The capacity is real. The pricing change reflects business reality. Plan accordingly.

The agent will keep running. The architecture under it will be a little less elegant and a lot more honest.

Claude Code Workshop

I track every billing change, model swap, and cost optimization across my live agent stack. The caching layers, model routing, and fallback architecture that keep costs reasonable are covered in the Claude Code Workshop. Updated with the June 2026 billing split patterns.

$39 at wiz.jock.pl/store. Free for paid subscribers.

Thanks for reading Digital Thoughts! This post is public so feel free to share it.

Keeping TABs on Your AI Agents

May 22

I've been running the same benchmarks across 10 models with 101 different harness configurations at tabverified.ai, and the harness variance you're measuring across Claude and Codex shows up in every test. Same model, same benchmark, different harness, 36-point score difference. It's not noise.

Your mitigation #3 (route narrow calls elsewhere) is the one I'd push hardest. I ran DeepSeek V4 Flash against error recovery and it scored 85 on clear failure communication. DeepSeek V4 Pro scored 11. Same provider, same architecture family. The cheap model was better at the specific task. Most people are sending Opus-level traffic to tasks that a $0.10 model handles better, not just cheaper, actually better.

The June 15 change is going to force the audit you describe in mitigation #4 whether people want to do it or not. The teams that already know which model fits which task will absorb it. The teams that defaulted to "send everything to Sonnet" are the ones who get squeezed.

1 reply

Mark Ulett

Thanks for writing this piece. We all get thrown into the token economics of the pool eventually, but this repricing has a huge impact for some of us and I am one with total exposure.

I built a Cladie Code Headless Agents harness because of Configuration Impersonarion and other failure states common to the 3rd party apps. With 2factor context validation to make me sure the context was the true source of inference.

I knew about the price exposure since day 1 and hoped we would make it to the fall. It’s not something to be mad at.

Appreciate the write up. I would buy your auditor tool, but I know what this means. Time for all the baby birds to fly free from the PoppaAnthropic nest.

“Thanks for all the fish, Anthropic.”

2 replies by Pawel Jozefiak and others

3 more comments...

Digital Thoughts

Discussion about this post

Ready for more?