<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Digital Thoughts]]></title><description><![CDATA[Practical AI insights from an e-commerce manager who builds agents at night]]></description><link>https://thoughts.jock.pl</link><image><url>https://substackcdn.com/image/fetch/$s_!5rgY!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9451e031-c31d-4140-8cc4-5bd048d66461_1024x1024.png</url><title>Digital Thoughts</title><link>https://thoughts.jock.pl</link></image><generator>Substack</generator><lastBuildDate>Sun, 05 Jul 2026 17:40:09 GMT</lastBuildDate><atom:link href="https://thoughts.jock.pl/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Paweł Józefiak]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[substack@jock.pl]]></webMaster><itunes:owner><itunes:email><![CDATA[substack@jock.pl]]></itunes:email><itunes:name><![CDATA[Pawel Jozefiak]]></itunes:name></itunes:owner><itunes:author><![CDATA[Pawel Jozefiak]]></itunes:author><googleplay:owner><![CDATA[substack@jock.pl]]></googleplay:owner><googleplay:email><![CDATA[substack@jock.pl]]></googleplay:email><googleplay:author><![CDATA[Pawel Jozefiak]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[How to Break Your AI Agent (Basics)]]></title><description><![CDATA[The quiet ways the agent you already run falls apart in use, and how to make fewer of them.]]></description><link>https://thoughts.jock.pl/p/how-to-break-your-ai-agent-basics-2026</link><guid isPermaLink="false">https://thoughts.jock.pl/p/how-to-break-your-ai-agent-basics-2026</guid><dc:creator><![CDATA[Pawel Jozefiak]]></dc:creator><pubDate>Wed, 24 Jun 2026 12:06:37 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!MQRZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c8c1e0d-db70-4703-9cc9-ebc0e802a547_2048x2048.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MQRZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c8c1e0d-db70-4703-9cc9-ebc0e802a547_2048x2048.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MQRZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c8c1e0d-db70-4703-9cc9-ebc0e802a547_2048x2048.png 424w, https://substackcdn.com/image/fetch/$s_!MQRZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c8c1e0d-db70-4703-9cc9-ebc0e802a547_2048x2048.png 848w, https://substackcdn.com/image/fetch/$s_!MQRZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c8c1e0d-db70-4703-9cc9-ebc0e802a547_2048x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!MQRZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c8c1e0d-db70-4703-9cc9-ebc0e802a547_2048x2048.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!MQRZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c8c1e0d-db70-4703-9cc9-ebc0e802a547_2048x2048.png" width="1456" height="1456" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8c8c1e0d-db70-4703-9cc9-ebc0e802a547_2048x2048.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:4959213,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/203386707?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c8c1e0d-db70-4703-9cc9-ebc0e802a547_2048x2048.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!MQRZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c8c1e0d-db70-4703-9cc9-ebc0e802a547_2048x2048.png 424w, https://substackcdn.com/image/fetch/$s_!MQRZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c8c1e0d-db70-4703-9cc9-ebc0e802a547_2048x2048.png 848w, https://substackcdn.com/image/fetch/$s_!MQRZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c8c1e0d-db70-4703-9cc9-ebc0e802a547_2048x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!MQRZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c8c1e0d-db70-4703-9cc9-ebc0e802a547_2048x2048.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>I&#8217;ve been building my own AI agent for months now. Claude Code, Codex, a pile of other tools wired together into one thing that runs my day. A while back I wrote about <a href="https://thoughts.jock.pl/p/how-to-build-your-first-ai-agent-beginners-guide-2026">how to build your first one</a>, and that post covered a few of the mistakes I made while putting it together.</p><p>This one is different. It is about breaking the agent you already have. The one you use every day. The one running in the background while you sleep, the one you trust to actually do things.</p><p>I also wrote about <a href="https://thoughts.jock.pl/p/almost-fried-ai-agent-mac-mini-mistakes-2026">the time I almost fried my agent and my Mac Mini</a>, but those were my specific accidents. This is the general version. The field guide. The list of ways any agent goes sideways once it is in your hands, so you can make fewer of these mistakes, or at least see them coming.</p><p>And here is the part people skip past. This is not only about custom agents like mine. The same things break ChatGPT, Claude, a Zapier flow, whatever framework you picked off the shelf. AI moves fast. Speed does not make a thing unbreakable. You can break anything you build the moment you start using it, and an agent is no exception.</p><div><hr></div><h2>First, the reason agents break at all</h2><p>It is not that the model is dumb. It is math.</p><p>An agent does work in steps. Read this, call that tool, decide, act, check, repeat. Steps multiply, they do not average. If each step works 99% of the time, ten steps in a row work about 90% of the time. A hundred steps, around 37%. A thousand steps, basically never. <a href="https://arxiv.org/abs/2509.09677">Researchers measured this directly</a>, and it gets worse, because the errors are not independent. One wrong step nudges the next one wrong too.</p><p>So every time you add surface to your agent, more tools, more memory, more steps, more things it can touch, you are not adding risk in a straight line. You are multiplying it. Keep that in your head. It quietly explains every item below.</p><div><hr></div><h2>1. You overbuild it</h2><p>This one is mine. I have a real problem with overbuilding.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bMEw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed2a10a2-4087-4676-b6b8-d2c06275d85c_2304x1710.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bMEw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed2a10a2-4087-4676-b6b8-d2c06275d85c_2304x1710.png 424w, https://substackcdn.com/image/fetch/$s_!bMEw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed2a10a2-4087-4676-b6b8-d2c06275d85c_2304x1710.png 848w, https://substackcdn.com/image/fetch/$s_!bMEw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed2a10a2-4087-4676-b6b8-d2c06275d85c_2304x1710.png 1272w, https://substackcdn.com/image/fetch/$s_!bMEw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed2a10a2-4087-4676-b6b8-d2c06275d85c_2304x1710.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bMEw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed2a10a2-4087-4676-b6b8-d2c06275d85c_2304x1710.png" width="1456" height="1081" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ed2a10a2-4087-4676-b6b8-d2c06275d85c_2304x1710.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1081,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:199351,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/203386707?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed2a10a2-4087-4676-b6b8-d2c06275d85c_2304x1710.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bMEw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed2a10a2-4087-4676-b6b8-d2c06275d85c_2304x1710.png 424w, https://substackcdn.com/image/fetch/$s_!bMEw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed2a10a2-4087-4676-b6b8-d2c06275d85c_2304x1710.png 848w, https://substackcdn.com/image/fetch/$s_!bMEw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed2a10a2-4087-4676-b6b8-d2c06275d85c_2304x1710.png 1272w, https://substackcdn.com/image/fetch/$s_!bMEw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed2a10a2-4087-4676-b6b8-d2c06275d85c_2304x1710.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption"><strong>Building my own AI agent in the open, breaks and all. One honest post a week.</strong></p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p><p>When I want something, I build it. Then I want another thing, so I build that too. Each piece works on its own. The trouble is the pile. Building toward your own needs feels right in the moment, because it is natural. You need this, you need that, so you keep adding. I learned the cost of it too late.</p><p>More pieces means a higher chance that something, somewhere, is broken at any given moment. That is just probability. And past a point the tools start working against the agent instead of for it. There is a name for this now. Microsoft researchers call it <a href="https://www.microsoft.com/en-us/research/video/tool-space-interference-an-emerging-problem-for-llm-agents/">tool-space interference</a>. Give a model too many tools and it picks the wrong one, burns tokens, or invents a call that does not exist. <a href="https://arxiv.org/abs/2605.24660">Studies have measured the drop in tool-selection accuracy anywhere from 7 to 85%</a> as the catalog grows, and it gets worse for tools sitting in the middle of a long list, the same lost-in-the-middle effect that hits long context. OpenAI limits a single request to 128 tools, and coding agents like Cursor warn that quality slips well before you stack even a few dozen. Either way, the cap is not where the trouble starts. Every tool you add competes for the model&#8217;s attention long before you hit any limit.</p><p>Context has the same shape of problem. There is solid research on <a href="https://www.trychroma.com/research/context-rot">context rot</a> now: across 18 frontier models, accuracy fell 30 to 50% as more was stuffed into the window, well before the window was even full. On the million-token models it started showing up around 300 to 400 thousand tokens. So an agent buried in its own accumulated context gets measurably worse, not because it ran out of room, but because the room got noisy.</p><p>The fix is not glamorous. Prune. Do a periodic checkup on your skills, your tools, your memory, your core files. If your error registry is lit up red, the agent is already telling you something is broken and you stopped reading it. I keep <a href="https://thoughts.jock.pl/p/how-i-structure-claude-md-after-1000-sessions">my CLAUDE.md tight</a> partly for this, to keep the core readable instead of letting it bloat into something nobody can hold in their head.</p><p>There is a structural fix too, and it is the one I would push hardest. Stop showing the agent everything at once. The pattern that holds up in production is search-then-load: the agent keeps a small index of what it can do, looks up the few tools it needs for the task in front of it, and loads only those. Same idea for context. Treat the window like a budget you spend on purpose and compact it often, rather than letting months of history pile up and rot. The agent that carries less is the agent that stays sharp.</p><div><hr></div><h2>2. You make one agent do everything</h2><p>My first idea of an agent was Jarvis. One mind that handles all of it. I owned up to this in the <a href="https://thoughts.jock.pl/p/how-to-build-your-first-ai-agent-beginners-guide-2026">first-agent post</a> too. It is the romantic version everyone starts with.</p><p>For strictly personal stuff with tight, specific context, one agent is genuinely fine. Useful, even. But the moment the work gets complex, writing code, running services, juggling separate projects, the single agent starts to behave like a monolithic codebase. Every new thing makes it heavier and harder to maintain. Eventually it is so bloated you spend more time managing the agent than getting anything out of it. If you want to break it, just keep making it bigger until it pops.</p><p>This is an open argument in the field right now, and both sides are worth knowing. Anthropic built a <a href="https://www.anthropic.com/engineering/multi-agent-research-system">multi-agent research system</a> with a lead agent that hands work to specialized subagents, each with its own clean context, and it beat a single agent by over 90% on their evals. The catch: it burned roughly 15 times more tokens, and it was worse for tightly connected work like coding. Cognition went the other way in a piece literally titled <a href="https://cognition.com/blog/dont-build-multi-agents">Don&#8217;t Build Multi-Agents</a>, arguing that splitting context across agents makes them fragile, because actions carry implicit decisions, and conflicting decisions carry bad results.</p><p>Both are right, which is the useful part. The version that holds up: one agent stays in charge of the thread and does the actual writing and acting, while the extra agents go fetch context and intelligence rather than take conflicting actions of their own. That is roughly where Cognition landed after more time in production too.</p><p>If you want one rule for where to draw the line, use context. Keep work inside a single agent while the context stays small and shared. The moment a task drags in a big, specific pile of context the main agent does not otherwise need, that is your seam. Cut there. The job that needs to read a whole codebase, or a month of a project&#8217;s history, or a pile of research, gets its own agent with its own window, and hands back a result instead of dumping all of that into the one mind you actually talk to.</p><p>That is the change I made early, and I am glad I did. I have one main agent I talk to. Anything with heavy or specific context, a particular project, a content task, a research dig, gets spun off into a subagent or a workflow. In Claude Code I lean on workflows and subagents constantly. The real trick is the instructions. Your main CLAUDE.md or agents file has to tell the agent, in plain words, when to spawn help instead of swallowing the whole thing itself. If it does not know to delegate, it will try to do everything, and you are back to the monolith. I went deeper on this idea in <a href="https://thoughts.jock.pl/p/the-bounded-ai-agent-ep5">The Bounded AI Agent</a>.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">I build AI agents and write down what breaks. Subscribe for the rest.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h2>3. You let it poison its own memory</h2><p>An agent with memory is great until it remembers something wrong.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pG7E!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ea6af05-70d2-475c-be60-51016f9bab9b_2203x2305.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pG7E!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ea6af05-70d2-475c-be60-51016f9bab9b_2203x2305.png 424w, https://substackcdn.com/image/fetch/$s_!pG7E!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ea6af05-70d2-475c-be60-51016f9bab9b_2203x2305.png 848w, https://substackcdn.com/image/fetch/$s_!pG7E!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ea6af05-70d2-475c-be60-51016f9bab9b_2203x2305.png 1272w, https://substackcdn.com/image/fetch/$s_!pG7E!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ea6af05-70d2-475c-be60-51016f9bab9b_2203x2305.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pG7E!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ea6af05-70d2-475c-be60-51016f9bab9b_2203x2305.png" width="1456" height="1523" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8ea6af05-70d2-475c-be60-51016f9bab9b_2203x2305.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1523,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:306546,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/203386707?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ea6af05-70d2-475c-be60-51016f9bab9b_2203x2305.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pG7E!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ea6af05-70d2-475c-be60-51016f9bab9b_2203x2305.png 424w, https://substackcdn.com/image/fetch/$s_!pG7E!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ea6af05-70d2-475c-be60-51016f9bab9b_2203x2305.png 848w, https://substackcdn.com/image/fetch/$s_!pG7E!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ea6af05-70d2-475c-be60-51016f9bab9b_2203x2305.png 1272w, https://substackcdn.com/image/fetch/$s_!pG7E!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ea6af05-70d2-475c-be60-51016f9bab9b_2203x2305.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/p/how-to-break-your-ai-agent-basics-2026?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://thoughts.jock.pl/p/how-to-break-your-ai-agent-basics-2026?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p></p><p>This one only shows up in use, never while you build. You save a fact, the agent recalls it later and acts on it, and if that fact was wrong, it is now wrong forever, on a loop. One bad entry, recalled a hundred times. Microsoft&#8217;s red-teaming team has a clean name for a cousin of this, <a href="https://www.microsoft.com/en-us/security/blog/2026/06/04/updating-taxonomy-failure-modes-agentic-ai-systems-year-red-teaming-taught-us/">session context contamination</a>, where junk from one context leaks into the next and quietly steers the agent off course.</p><p>I hit this regularly. My agent writes things to its own memory, and every so often a drift warning fires because something it saved no longer matches reality. Without that check, the bad memory just sits there, shaping decisions, looking exactly like a good one.</p><p>The fix is hygiene. Memory needs the same discipline as code. Verify before you save. Link facts so you can trace where they came from. Let the system flag drift instead of trusting it blindly. I built a <a href="https://thoughts.jock.pl/p/i-built-a-self-improving-ai-agent">self-improving loop</a> for exactly this, but a loop only helps if it can catch its own bad entries. A memory you never audit is not an asset. It is a slow leak you cannot see.</p><p>What actually works for me is layers, not one big bucket. A small working memory for the task at hand, and a durable layer for facts that should outlive the session. Every durable fact gets a source and a date, so I can trace where it came from and retire it once it goes stale. And the agent is allowed to question its own memory, a quiet check that fires when a saved fact stops matching reality. A fact with no source and no expiry is a rumor your agent will repeat with full confidence, forever.</p><div><hr></div><h2>4. You give it no fallback</h2><p>This one is simple. If you want an agent that runs 24/7, you need a fallback for the model.</p><p>A base model is fused into the agent, sometimes a few of them, and sooner or later one is unreachable. Rate limited, deprecated, down for an hour, whatever it is. A fallback is not a permanent plan B for some other model. It is the thing that keeps the lights on when the main thing goes dark. I run an <a href="https://thoughts.jock.pl/p/openrouter-fallback-multi-provider-ai-agent-2026">OpenRouter subscription</a> partly for this, so the agent can fail over to another provider instead of just stopping. It costs a bit, you pay for the keys, and it still beats having nothing.</p><p>One honest warning, because I picked it up reading about other people&#8217;s outages. A single fallback is not a guarantee. In August 2025, <a href="https://www.requesty.ai/blog/handling-llm-platform-outages-what-to-do-when-openai-anthropic-deepseek-or-others-go-down">OpenRouter itself went down for about 50 minutes</a>, and that took its own fallbacks with it. So layer it. A gateway for provider failover, plus something local as a floor. I run a <a href="https://thoughts.jock.pl/p/local-llm-35b-mac-mini-gemma-swap-production-2026">35B model on my Mac Mini</a> for that reason, small jobs and a last resort for when the network itself is the problem. Not everyone can run a decent local model, I know that. But even a small one beats a dead agent.</p><p>One more lesson, the one most people skip. A fallback you have never tested is not a fallback. It is a second thing you are also assuming works. Pull the primary on purpose every so often and watch what happens. Does the agent actually fail over, or does it just fall on its face. I would rather find a broken failover on a quiet Tuesday than at 3am when the main model is down and everything is dark.</p><div><hr></div><h2>5. You trust it without checking</h2><p>The scariest failures are the quiet ones.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!caTG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F323cdb57-fffa-4eec-8670-303182265b9f_2370x2298.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!caTG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F323cdb57-fffa-4eec-8670-303182265b9f_2370x2298.png 424w, https://substackcdn.com/image/fetch/$s_!caTG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F323cdb57-fffa-4eec-8670-303182265b9f_2370x2298.png 848w, https://substackcdn.com/image/fetch/$s_!caTG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F323cdb57-fffa-4eec-8670-303182265b9f_2370x2298.png 1272w, https://substackcdn.com/image/fetch/$s_!caTG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F323cdb57-fffa-4eec-8670-303182265b9f_2370x2298.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!caTG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F323cdb57-fffa-4eec-8670-303182265b9f_2370x2298.png" width="1456" height="1412" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/323cdb57-fffa-4eec-8670-303182265b9f_2370x2298.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1412,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:343212,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/203386707?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F323cdb57-fffa-4eec-8670-303182265b9f_2370x2298.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!caTG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F323cdb57-fffa-4eec-8670-303182265b9f_2370x2298.png 424w, https://substackcdn.com/image/fetch/$s_!caTG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F323cdb57-fffa-4eec-8670-303182265b9f_2370x2298.png 848w, https://substackcdn.com/image/fetch/$s_!caTG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F323cdb57-fffa-4eec-8670-303182265b9f_2370x2298.png 1272w, https://substackcdn.com/image/fetch/$s_!caTG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F323cdb57-fffa-4eec-8670-303182265b9f_2370x2298.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>An agent that crashes is easy. You see it, you fix it. The dangerous one is the agent that says done and did nothing. It writes a log line, the log line says success, and you believe it. The actual work never happened. From the outside it looks alive. It is the difference between a heartbeat on a screen and a pulse you actually put your fingers on.</p><p>Go back to the step math from the top. Without checking each step, you never see the 37%. You see the green light and assume the other 63 ran fine. I have been bitten by this enough times that I built a watchdog whose entire job is to catch the skip, the loop that runs and logs and accomplishes nothing. When I wrote about <a href="https://thoughts.jock.pl/p/ai-agent-self-extending-self-fixing-wiz-rebuild-technical-deep-dive-2026">the agent starting to fix itself</a>, half of that work was really about catching silent failure before it had a chance to compound.</p><p>The fix is to stop trusting shallow signals. A log line is not proof. Re-run the thing that was supposed to happen and look at the result. It is the same reason I am religious about <a href="https://thoughts.jock.pl/p/how-to-use-github-ai-builders-basics-2026">commits as save points</a> now. You want a real, inspectable trail, not a vibe that it probably worked.</p><p>The lesson under the lesson: check the outcome, not the activity. Did the file actually change. Did the message actually send. Did the row actually land in the database. Whether the function ran without throwing tells you almost nothing. And make the check safe to run twice, because you will run it twice. A fix is not done because the code ran once in a session. It is done when I re-run the exact thing that was failing and watch it pass. Until then it is a guess wearing a green checkmark.</p><div><hr></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share Digital Thoughts&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://thoughts.jock.pl/?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share Digital Thoughts</span></a></p><div><hr></div><h2>6. You never challenge it</h2><p>This last one is for the people running their own custom agent. Challenge it. On purpose.</p><p>I try to do this more and more. I throw scenarios at my agent that it has never, or rarely, seen. Rapid-fire messages. Weird formats. Requests it has no obvious handler for. Things I do not even need, just to watch what happens. The whole promise of an agent is that it works out a solution to the problem in front of it. The reality is that it often cannot, and you want to find that out before a real situation does it for you.</p><p>Here is the piece of advice I would hand any new builder. When you throw a challenge at your agent and it fails, the problem is almost never the challenge. It is the architecture underneath that could not let the agent get there. A whole discipline is forming around this idea, people are calling it harness engineering, the argument being that the scaffolding around the model, context, tools, memory, verification, decides whether it succeeds far more than the raw model does. Microsoft&#8217;s <a href="https://www.microsoft.com/en-us/security/blog/2026/06/04/updating-taxonomy-failure-modes-agentic-ai-systems-year-red-teaming-taught-us/">year of red-teaming agents</a> says the same thing from the security side: systems that pass model-level tests still fall apart under real pressure, because the failure lives in the system, not the brain.</p><p>So make it a habit. Keep a small set of nasty cases, the weird formats, the rapid-fire messages, the request that needs three tools in a row, and run them again every time you change the architecture. That is a regression test for an agent. When one fails, you know where to look, because the harness is a short list: how context gets in, how tools are exposed, how memory is stored and recalled, how work gets verified, what the agent is and is not allowed to touch. The break is almost always in one of those, not in the model&#8217;s head.</p><p>So when it breaks, do not reach for the model. Look at what stopped it, and fix that. An agent you never challenge is one that quietly stops growing. One day you ask it for something slightly new, it just cannot do it, and you have no idea why.</p><div><hr></div><p>None of this means stop building. Build. Overbuild, even, for a while, because that is how you find the edges. Just know that the day you start using an agent is the day you start breaking it, and that is okay. Every break is a map to the next thing worth fixing.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!X0Hd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23872911-4a28-4758-8ec5-279847b422f3_2256x2106.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!X0Hd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23872911-4a28-4758-8ec5-279847b422f3_2256x2106.png 424w, https://substackcdn.com/image/fetch/$s_!X0Hd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23872911-4a28-4758-8ec5-279847b422f3_2256x2106.png 848w, https://substackcdn.com/image/fetch/$s_!X0Hd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23872911-4a28-4758-8ec5-279847b422f3_2256x2106.png 1272w, https://substackcdn.com/image/fetch/$s_!X0Hd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23872911-4a28-4758-8ec5-279847b422f3_2256x2106.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!X0Hd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23872911-4a28-4758-8ec5-279847b422f3_2256x2106.png" width="1456" height="1359" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/23872911-4a28-4758-8ec5-279847b422f3_2256x2106.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1359,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:350505,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/203386707?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23872911-4a28-4758-8ec5-279847b422f3_2256x2106.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!X0Hd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23872911-4a28-4758-8ec5-279847b422f3_2256x2106.png 424w, https://substackcdn.com/image/fetch/$s_!X0Hd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23872911-4a28-4758-8ec5-279847b422f3_2256x2106.png 848w, https://substackcdn.com/image/fetch/$s_!X0Hd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23872911-4a28-4758-8ec5-279847b422f3_2256x2106.png 1272w, https://substackcdn.com/image/fetch/$s_!X0Hd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23872911-4a28-4758-8ec5-279847b422f3_2256x2106.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>My agent messed up plenty of things. Everything was my fault, because I am the architect. It cost me time to fix, and I genuinely do not mind. It is progress, and I accept that.</p><p>If you are building your own from scratch, <a href="https://thoughts.jock.pl/p/how-to-build-your-first-ai-agent-beginners-guide-2026">start here</a>. Then come back and break it on purpose. That part is the actual work.</p><div><hr></div><div class="captioned-button-wrap" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/p/how-to-break-your-ai-agent-basics-2026?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="CaptionedButtonToDOM"><div class="preamble"><p class="cta-caption">Thanks for reading Digital Thoughts! This post is public so feel free to share it.</p></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/p/how-to-break-your-ai-agent-basics-2026?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://thoughts.jock.pl/p/how-to-break-your-ai-agent-basics-2026?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p></div>]]></content:encoded></item><item><title><![CDATA[I Trust My Car More Than My AI Agent. That Gap Is Where We’re Going.]]></title><description><![CDATA[No real AI hardware yet, agents still break, and trust is the bottleneck. Where I think the next few years actually go, from someone who lives with one.]]></description><link>https://thoughts.jock.pl/p/ai-agent-future-where-this-goes-2026</link><guid isPermaLink="false">https://thoughts.jock.pl/p/ai-agent-future-where-this-goes-2026</guid><dc:creator><![CDATA[Pawel Jozefiak]]></dc:creator><pubDate>Wed, 17 Jun 2026 09:37:50 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!6qYs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F100377e5-b998-4f02-850e-993d9292bdb9_2048x2048.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6qYs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F100377e5-b998-4f02-850e-993d9292bdb9_2048x2048.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6qYs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F100377e5-b998-4f02-850e-993d9292bdb9_2048x2048.png 424w, https://substackcdn.com/image/fetch/$s_!6qYs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F100377e5-b998-4f02-850e-993d9292bdb9_2048x2048.png 848w, https://substackcdn.com/image/fetch/$s_!6qYs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F100377e5-b998-4f02-850e-993d9292bdb9_2048x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!6qYs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F100377e5-b998-4f02-850e-993d9292bdb9_2048x2048.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6qYs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F100377e5-b998-4f02-850e-993d9292bdb9_2048x2048.png" width="1456" height="1456" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/100377e5-b998-4f02-850e-993d9292bdb9_2048x2048.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:5758772,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/202407803?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F100377e5-b998-4f02-850e-993d9292bdb9_2048x2048.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6qYs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F100377e5-b998-4f02-850e-993d9292bdb9_2048x2048.png 424w, https://substackcdn.com/image/fetch/$s_!6qYs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F100377e5-b998-4f02-850e-993d9292bdb9_2048x2048.png 848w, https://substackcdn.com/image/fetch/$s_!6qYs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F100377e5-b998-4f02-850e-993d9292bdb9_2048x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!6qYs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F100377e5-b998-4f02-850e-993d9292bdb9_2048x2048.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p>My agent made me a lot more effective. It also made me watch my own back more.</p><p>Both of those are true, and I have stopped pretending the second one away. I built my own agent, I gave it its own machine, and it does real work for me every day. <a href="https://thoughts.jock.pl/p/wiz-ai-agent-self-improvement-architecture">It knows who I am, not just what I want</a>. The effectiveness is real. So is the quiet voice in the back of my head that now tracks what might break while the thing is running.</p><p>That voice is the interesting part. It points straight at what this whole wave is actually about. Trust.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Digital Thoughts is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h2>Trust is a mileage problem</h2><p>Think about a car. When I get in mine, I assume it starts. I assume it will not catch fire on the way to Katowice. I never sat down and decided to trust it. I drove it something like a hundred thousand kilometers and the trust built itself. I know its sounds. I know the one strange thing it does on a cold morning. I can predict it because I have repeated it that many times.</p><p><strong>Trust is repetition plus outcomes you can predict.</strong> That is the whole recipe. There is always room for a bad day, a flat tyre, a dead battery in February. Although the range of what can go wrong is small, and I know the edges of it.</p><p>With an agent that recipe only half works. For some tasks I have the same calm I have with the car. For others I am still a new driver, both hands on the wheel, watching the road like it owes me money.</p><h2>Deterministic things earn trust faster</h2><p>Here is the pattern I keep running into. <strong>The more deterministic the task, the faster I trust it.</strong> A script that renames the same files the same way every night, I stopped watching that months ago. It is boring, and boring is exactly the point.</p><p>The more agentic the task, the more I am looking at a black box. Open-ended work, many steps, judgment calls, recovering from its own mistakes halfway through a run. That is where outcomes spread out and prediction gets hard. The model matters here. So does the architecture around it, the tools it can call, the memory it carries. I wrote a whole post about <a href="https://thoughts.jock.pl/p/when-ai-meets-reality-ep3">what happens when an agent meets the messy real world</a> and stops behaving like the demo.</p><p>The ceiling right now is technical and it is not a mystery. Context limits. Memory that does not persist the way I want it to. Retrieval that grabs the wrong thing at the wrong moment. Tool calls that fail quietly. An agent that gets stuck and does not notice it is stuck. Every builder I talk to is fighting the same short list. I wrote about drawing hard edges around an agent so it stays inside what it is good at in <a href="https://thoughts.jock.pl/p/the-bounded-ai-agent-ep5">the bounded agent post</a>, and about giving it a memory that actually compounds in <a href="https://thoughts.jock.pl/p/i-built-a-self-improving-ai-agent">the one on my self-improving agent</a>.</p><p>None of these limits are permanent. I have watched all of them get better over the past year, in jumps, never on a tidy schedule. That detail matters for everything that comes next.</p><h2>Now run it forward</h2><p>Assume the boring version of the future. Steady improvement, the kind we have already been getting. Better models, more stable tools, memory that holds, retrieval that lands where you point it. The frontier gets the headlines. <strong>The floor is the thing that moves people&#8217;s lives</strong>, and the floor is what I am watching.</p><p>Right now an agent like mine is a nerd object. You need to be deep in code, or at least deep in tinkering, to get real value out of it. Most people who say they use AI mean a chat window. Most companies that say they use AI are in the <a href="https://thoughts.jock.pl/p/ai-adoption-gap-who-actually-uses-ai-2026">88% with almost nothing to show for it</a>. The capability is sitting right there. The on-ramp is the missing piece.</p><p>Apple is making the most interesting bet on that on-ramp. At WWDC this month they finally committed to the big Siri overhaul, an assistant that can actually chain multi-step tasks, with an agent layer wired into the App Store so you can hand off things like booking a table or running your smart home. They are building it on Google&#8217;s Gemini, which tells you that even Apple decided the raw model is becoming a commodity and the product is the assistant on top. It will not ship in the EU at launch, the usual regulatory reason. I think this is the right move and it might genuinely work. Putting the agent in front of normal customers is the whole game.</p><h2>Everyone&#8217;s agent is a lot of agents</h2><p>Here is the part that gets skipped. When everyone has an agent, the browsing stops being human. Your software does it for you, at machine speed.</p><p>When my agent works, it touches more of the web in an hour than I would in a day. It crawls, it reads, it calls APIs, it <a href="https://thoughts.jock.pl/p/agentic-commerce-ai-shopping-for-you-2026">goes and shops</a>. Multiply that by a few hundred million people and traffic on the open web spikes. The humans did not arrive in bigger numbers. Their agents did.</p><p>That has a bill, and the bill lands in the physical world. Inference is already about two-thirds of all AI compute this year, up from a third in 2023. Data center electricity demand is climbing double digits every year. GPU prices are not coming down. For a while I expect the cost of good AI to go up before it comes down, because demand is bending faster than supply, and power and silicon are real, finite things. That is part of why I keep a local model running on a cheap Mac mini and <a href="https://thoughts.jock.pl/p/local-llm-35b-mac-mini-gemma-swap-production-2026">swap its brain when I feel like it</a>. The local one is slower and dumber than the cloud, and I keep it anyway. I want a floor under me that does not move when the market does.</p><h2>The question everyone actually asks</h2><p>Does it take the jobs. That is the real question hiding under all the others.</p><p>My honest read is that in the long run it makes more work than it removes. The most cited forecast going around, from the World Economic Forum, lands on roughly 170 million new roles and 92 million gone by 2030. A net gain, with about a fifth of all jobs changing shape somewhere in the middle.</p><p>A net gain is cold comfort if you are one of the 92 million. This is a revolution that asks people to move from one kind of work to another, and people do not all move at the same speed. Some are ready. Some are not, through no fault of their own. Closing that gap is a job for policy and pacing and a bit of patience, not something a model fixes. I have written before about <a href="https://thoughts.jock.pl/p/ai-career-moat-human-skills-future-proof-workplace">the skills that hold their value</a> and about <a href="https://thoughts.jock.pl/p/ai-writes-code-what-should-schools-teach-2026">what we should even be teaching kids now that AI writes the code</a>. I do not have a clean answer. I have a direction. The durable move is to get good at pointing this stuff. Racing it is a losing game.</p><h2>The physical half is slower</h2><p>Everything above is the digital half. A personal assistant for everything that lives on a screen is close. A personal assistant for anything physical needs a body, and bodies are the hard part.</p><p>Robots are catching up faster than I expected though. 1X is shipping its NEO home robot to US homes this year, twenty thousand dollars up front or five hundred a month, with a human quietly supervising the tasks it has not learned yet. Figure has robots working a BMW line. Tesla is gutting a car factory to build Optimus. The honest timeline for a robot that is genuinely useful in a normal home is 2028 to 2032, not next spring, and I went into why in <a href="https://thoughts.jock.pl/p/neo-humanoid-robot-home-privacy-expert-mode-ready">the post about inviting robots into our homes</a>. Still, that is a couple of years out, not science fiction. Close enough that I already think about it.</p><h2>We are still in the wild west</h2><p>One more thing, because I think most people still underrate what is already sitting in front of them. We are in the wild west. Barely any rules, uneven tools, and a lot of folks treating an agent like a slightly nicer autocomplete.</p><p>Then last week a government had two frontier models pulled. Anthropic disabled Fable 5 and Mythos 5 for everyone on the planet after a US export-control order meant to keep foreign nationals away from the model&#8217;s cybersecurity ability, which <a href="https://thoughts.jock.pl/p/ai-opinions-june-2026-fable-5-billing-split-openai-resets">I covered in my June opinions</a>. They could not filter cleanly by nationality, so they switched both off for the entire world. Europe called it a wake-up call for sovereign AI.</p><p>Sit with that for a second. A government looked at a piece of software and decided it was close enough to a weapon to control who gets to touch it. Nobody controls autocomplete that way. <strong>That is the tell.</strong> These tools are already strong enough to be governed like dangerous things, and most of the people who could be using them well have not clocked it yet.</p><h2>Where I actually land</h2><p>So where does this go. More people get an agent that works. The web fills up with software acting for us. Compute gets more expensive before it gets cheaper. Jobs churn hard and then settle higher. Robots turn up for the physical half later than the hype promised and sooner than the skeptics will admit. And somewhere in the middle, AI stops being a label a company staples onto a product and becomes the thing quietly doing the work.</p><p>I have lived with one of these long enough to be careful with predictions. I have also lived with it long enough to know it already changed how I work, on the days it behaves and the days it does not. So I will say plainly where I land. <strong>I am pragmatically optimistic.</strong> None of this will be smooth. I am optimistic anyway, because every time I check, the floor is higher than it was the last time.</p><p>It might turn out better than we think. I would not have written that sentence a year ago. For now it is enough to keep building, both hands still on the wheel.</p><div class="captioned-button-wrap" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/p/ai-agent-future-where-this-goes-2026?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="CaptionedButtonToDOM"><div class="preamble"><p class="cta-caption">Thanks for reading Digital Thoughts! This post is public so feel free to share it.</p></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/p/ai-agent-future-where-this-goes-2026?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://thoughts.jock.pl/p/ai-agent-future-where-this-goes-2026?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p></div><p></p>]]></content:encoded></item><item><title><![CDATA[AI Opinions: June 2026. Fable 5 Hands-On, Billing Split Countdown, OpenAI Banks Your Resets]]></title><description><![CDATA[Three weeks of building instead of writing. Here is everything that piled up in the meantime, including a few days with Anthropic&#8217;s new top model.]]></description><link>https://thoughts.jock.pl/p/ai-opinions-june-2026-fable-5-billing-split-openai-resets</link><guid isPermaLink="false">https://thoughts.jock.pl/p/ai-opinions-june-2026-fable-5-billing-split-openai-resets</guid><dc:creator><![CDATA[Pawel Jozefiak]]></dc:creator><pubDate>Fri, 12 Jun 2026 09:33:46 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!laLB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d511c31-4233-486f-b4bd-72e699d6f13b_2048x2048.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!laLB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d511c31-4233-486f-b4bd-72e699d6f13b_2048x2048.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!laLB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d511c31-4233-486f-b4bd-72e699d6f13b_2048x2048.png 424w, https://substackcdn.com/image/fetch/$s_!laLB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d511c31-4233-486f-b4bd-72e699d6f13b_2048x2048.png 848w, https://substackcdn.com/image/fetch/$s_!laLB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d511c31-4233-486f-b4bd-72e699d6f13b_2048x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!laLB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d511c31-4233-486f-b4bd-72e699d6f13b_2048x2048.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!laLB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d511c31-4233-486f-b4bd-72e699d6f13b_2048x2048.png" width="1456" height="1456" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4d511c31-4233-486f-b4bd-72e699d6f13b_2048x2048.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:5100127,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/201720540?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d511c31-4233-486f-b4bd-72e699d6f13b_2048x2048.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!laLB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d511c31-4233-486f-b4bd-72e699d6f13b_2048x2048.png 424w, https://substackcdn.com/image/fetch/$s_!laLB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d511c31-4233-486f-b4bd-72e699d6f13b_2048x2048.png 848w, https://substackcdn.com/image/fetch/$s_!laLB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d511c31-4233-486f-b4bd-72e699d6f13b_2048x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!laLB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d511c31-4233-486f-b4bd-72e699d6f13b_2048x2048.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>First, some honesty</h2><p>I owe you a bit of context before this one. For the last three weeks I have been posting at least once a week, but I was so deep into doing things that I got completely lost in experimentation. New things, broken things, rebuilt things. I had no extra time to turn any of it into a proper post, and I have always tried to be transparent about that trade. I focus on doing things rather than writing about them. The writing is a byproduct of the doing, and I would rather keep it in that order.</p><p>So no promises here. I am not announcing a new cadence or turning over a new leaf. I keep building, I keep trying to send you the most valuable parts from my point of view, and you are always invited to discuss it and build along.</p><p>This one is a catch-up post. Several things happened at once and I had no opportunity to address any of them. Treat it as the light version of my posts, in the same spirit as <a href="https://thoughts.jock.pl/p/ai-opinions-april-2026-claude-mythos-meta-spark">the April opinions roundup</a>. I hope you will like it.</p><p></p><p>Oh and BTW. THANKS FOR 3200 people reading me! Greatful to be here! </p><h2>The billing split lands in three days</h2><p>The thing closest on the calendar first. On June 15 Anthropic splits its subscription billing: programmatic use of Claude, so the Agent SDK, headless <code>claude -p</code> runs, GitHub Actions, anything spawned from a script instead of typed by a human, stops counting against your plan limits and moves to a <a href="https://support.claude.com/en/articles/15036540-use-the-claude-agent-sdk-with-your-claude-plan">separate monthly credit</a>, metered at standard API rates.</p><p><strong>One correction to my own earlier framing</strong>, because I went through the fine print again this week. I kept repeating &#8220;$200 per month&#8221; and that number is true for me, on the Max 20x plan. The credit is actually tiered: Pro gets $20, Max 5x gets $100, Max 20x gets $200. Enterprise Standard seats get exactly nothing. So when you do your own math, do it against your plan, because the spread between $20 and $200 is the difference between a toy budget and a real one.</p><p>The mechanics matter more than the headline number, honestly. Credits are per-user and cannot be pooled. Unused credit does not roll over. You have to claim it once in your account before June 15. And the part I would put in bold if I were Anthropic&#8217;s documentation team: <strong>when the credit runs out, your automated requests simply stop.</strong> No queue, no automatic downgrade to a cheaper model. There is an overflow toggle that bills the excess at API rates, and it ships turned off. The default behavior of this change is your agent going silent mid-month.</p><p>I have been working around all of this for a while and I wrote a dedicated post with the four mitigations I actually wired in: <a href="https://thoughts.jock.pl/p/anthropic-agent-sdk-billing-split-mitigations-june-15-2026">Anthropic repriced my agent, four mitigations before June 15</a>. If the deadline affects you, that post is the practical one. This section is just the reminder that the clock is at three days.</p><p>And the painful detail, the reason it is hard for me to fit into even the $200 tier: programmatic use gets counted like API usage. Real token costs, not the heavily subsidized subscription math we got used to. One widely shared gist did the napkin math and called it a <a href="https://gist.github.com/MagnaCapax/d9177e35b355853f03c730dfcaa693ef">12x to 175x effective price increase depending on workload</a>. That spread sounds dramatic, but it roughly matches what my own measurements say: the same work, priced honestly, turns out to be expensive. That gap between subscription pricing and honest pricing is the whole story of this post, actually. Keep it in mind for the Fable section.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Digital Thoughts is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h2>The obvious mitigation many skips</h2><p>While optimizing my own agent for this, one thing kept proving itself over and over: smart model routing. It is very obvious and almost nobody does it. If you run Opus for everything, you will hit your ceiling embarrassingly fast. Route the same request to Sonnet and it costs a fraction. I wrote about the day <a href="https://thoughts.jock.pl/p/claude-model-optimization-opus-haiku-ai-agent-costs-2026">I switched my agent from Opus to Haiku and it got better</a>, and that lesson aged well. Haiku still handles a surprising share of the simple requests flowing through my system.</p><p>Below Haiku there is another floor: local models. The really simple jobs, classification, triage, formatting, run on <a href="https://thoughts.jock.pl/p/local-llm-35b-mac-mini-gemma-swap-production-2026">a 35B model on my $600 Mac Mini</a> and cost exactly nothing per token. I have been measuring this stack for months now and the routing is what keeps the whole thing viable. After June 15 it stops being an optimization hobby and becomes the difference between an agent that runs all month and one that dies on the 19th.</p><p>And if your agent is model-agnostic, you are not even locked to one vendor. Other providers can supplement the expensive paths; my <a href="https://thoughts.jock.pl/p/openrouter-fallback-multi-provider-ai-agent-2026">$20 OpenRouter fallback</a> exists for exactly this kind of squeeze. Everything is possible here. It is all up to our creativity, I think. If you want the shortcuts I use for this, the agent playbooks on <a href="https://wiz.jock.pl/store">the Wiz store</a> cover the routing setup in detail.</p><h2>OpenAI is fighting dirty, and I like it</h2><p>Now the fun one. Yesterday OpenAI <a href="https://x.com/OpenAI/status/2065225362544726371">announced on X</a> that Codex users can now save their rate limit resets and spend them later. Their words: they heard we wanted to use resets on our own time. Every eligible account on Go, Plus, Pro and Business got one banked reset for free, and for two weeks Plus and Pro users can invite up to three friends, with both sides earning another reset when the friend sends their first Codex message.</p><p>The fine print is pure growth hack. The reward lands only when the invited friend actually starts using Codex, an invite alone earns nothing. Banked resets <a href="https://help.openai.com/en/articles/20001271-codex-referral-promotions">expire after 30 days</a>. It is loyalty points for compute, complete with a referral program. I am half joking and half impressed, because the underlying feature is genuinely user-friendly: months of complaints about resets firing at fixed times, often in the middle of someone&#8217;s night, and OpenAI responded by handing the timer to the user.</p><p>Read the timing too. Anthropic is three days from making programmatic use more expensive, and OpenAI responds by making its limits more flexible and literally giving spare capacity away. This is a very direct shot in the subscription war, and it is totally something Anthropic would never do. One company is tightening the meter, the other is letting you carry your unused minutes to next month, like a mobile operator from 2005.</p><p>I have my history with Codex, including <a href="https://thoughts.jock.pl/p/opus-4-7-codex-comeback-2026">cancelling it and coming back once already</a>, so I am watching this with sympathy for both sides. Competition like this is the only thing that keeps our subscriptions honest.</p><h2>A few days with Fable 5</h2><p>And then there is the new model. Anthropic shipped <a href="https://www.anthropic.com/news/claude-fable-5-mythos-5">Claude Fable 5</a> on June 9, a new tier above Opus. The construction is unusual: there is a twin called Mythos 5, the same underlying model, and the difference is safety plumbing. Fable runs classifiers, and when a request looks like offensive cybersecurity, dangerous biology and chemistry, or an attempt to distill the model, the response silently falls back to Opus 4.8. Mythos skips the cyber guardrails and goes only to approved organizations, government cyber defenders and the like. Anthropic says more than 95% of Fable sessions never touch the fallback. For what it is worth, in a few days of agent work I have not knowingly hit it once.</p><p>I have been running Fable 5 with my agents since launch. Short version: it is good. Much better than Opus for agentic work, definitely. It is more proactive. It pushes forward with the tasks I give it instead of stopping at the first checkpoint to ask how I feel about things.</p><p>The thing I appreciate most is how deep it goes. This was always my quiet complaint about the Opus line: Opus tries to be specific and on the spot, and sometimes that means it misses the full context around the problem. It fixes the line you pointed at. That is also why I always liked Codex with GPT-5.5 a little more for audits, because it reads the whole module before it touches anything. Fable 5 works like that. It audits around the problem. It goes very deep into the code, it references things I did not mention, and it holds up over long runs better than anything I have used.</p><p>One concrete example. I gave it some architecture changes on my AI agent, the kind of task where Opus usually needs a round of feedback from me halfway through. Fable ran longer than Opus would have, and when I checked the outcome I had genuinely no notes. It did what I wanted and in a few places a little more. That almost never happens.</p><p>The benchmark numbers are loud, and I would hold them loosely. Anthropic claims state of the art nearly across the board: 80.3% on SWE-Bench Pro against 69.2% for Opus 4.8, and a story about Stripe migrating a 50-million-line Ruby codebase in a day, work estimated at over two team-months. Vendor numbers, vendor anecdotes. For balance, Endor Labs ran an independent security benchmark where Fable came out middling, and they documented benchmark contamination in 19% of their test instances. My hands-on lands somewhere between the marketing and the skepticism: the long-run agentic improvement is real, I can feel it in my own work. The superlatives I leave to the launch page.</p><p>Now the catch, and you already know where this is going: the cost. Fable 5 is priced at $10 per million input tokens and $50 per million output tokens. That is double the Opus sticker price, although to be fair it is also less than half of what the Mythos Preview was going for. Right now it runs inside our subscription usage limits, but only until June 22. On June 23 it leaves the subscription pool and you pay for it with usage credits. Anthropic says the removal is temporary and that they aim to bring it back into plans when capacity allows. We will see. <a href="https://decrypt.co/370688/internet-furious-anthropic-claude-mythos-fable-5">The internet is predictably furious</a> about the two-week tease, and I get it: handing everyone the best model for free and then putting it behind a meter is a very effective way to make people feel the gap.</p><p>And it burns tokens fast, faster than anything I have seen, because being thorough is exactly what costs tokens. The depth I praised three paragraphs ago is the same property that empties the budget.</p><p>So my take: Fable 5 is a specialist tool, at least at this price. Security work, maybe. Big architecture passes, long autonomous builds, the tasks where one excellent run beats five cheap ones. For most jobs Opus remains my default, and the routing logic from the section above does not change. It just gets one more expensive tier at the top to route to, sparingly.</p><h2>Where I think this is going</h2><p>Everything is getting pricier at the top while our subscriptions still look generous. I do not think that lasts.</p><p>Look at the line Anthropic actually drew with the billing split, because it is sharper than &#8220;interactive versus automated.&#8221; Chatting on claude.ai stays flat-rate. Interactive Claude Code in your own terminal stays flat-rate. But run Claude inside Zed, a fully interactive session with a human typing every prompt, and <a href="https://zed.dev/blog/anthropic-subscription-changes">it bills against the credit anyway</a>, because it arrives through the Agent SDK. The meter follows the integration surface, not the human. First-party surfaces stay subsidized. Everything you build or plug in yourself becomes usage.</p><p>That is the pattern I see everywhere now: the soft cutoff. In the billing split, in Fable 5 leaving the subscription pool after two weeks, even in OpenAI gamifying its limits with expiry dates. Nobody will take your flat-rate plan away. They will just keep moving the best things outside of it, one model and one surface at a time.</p><p>Although that sounds gloomy, I do not really mind it. Honest prices force better engineering, and better engineering is the part I enjoy. The builders who learn to route work to the right model will keep their costs flat while the quality ceiling rises. That is the game now, and it is a game you can actually win.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Digital Thoughts is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p>]]></content:encoded></item><item><title><![CDATA[I Barely Specced a Card Game. My Agent Built 14 Enemies, Bosses, and a Story.]]></title><description><![CDATA[What minimal knowledge and a vague direction get you now, and why vibe coding quietly became my default.]]></description><link>https://thoughts.jock.pl/p/ten-ish-card-game-one-ai-session-2026</link><guid isPermaLink="false">https://thoughts.jock.pl/p/ten-ish-card-game-one-ai-session-2026</guid><dc:creator><![CDATA[Pawel Jozefiak]]></dc:creator><pubDate>Wed, 03 Jun 2026 09:53:12 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!VzFC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd639a176-b098-4776-8c2d-ad4a2f44234a_2048x2048.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VzFC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd639a176-b098-4776-8c2d-ad4a2f44234a_2048x2048.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VzFC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd639a176-b098-4776-8c2d-ad4a2f44234a_2048x2048.png 424w, https://substackcdn.com/image/fetch/$s_!VzFC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd639a176-b098-4776-8c2d-ad4a2f44234a_2048x2048.png 848w, https://substackcdn.com/image/fetch/$s_!VzFC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd639a176-b098-4776-8c2d-ad4a2f44234a_2048x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!VzFC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd639a176-b098-4776-8c2d-ad4a2f44234a_2048x2048.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VzFC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd639a176-b098-4776-8c2d-ad4a2f44234a_2048x2048.png" width="1456" height="1456" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d639a176-b098-4776-8c2d-ad4a2f44234a_2048x2048.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:6088519,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/200426554?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd639a176-b098-4776-8c2d-ad4a2f44234a_2048x2048.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!VzFC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd639a176-b098-4776-8c2d-ad4a2f44234a_2048x2048.png 424w, https://substackcdn.com/image/fetch/$s_!VzFC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd639a176-b098-4776-8c2d-ad4a2f44234a_2048x2048.png 848w, https://substackcdn.com/image/fetch/$s_!VzFC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd639a176-b098-4776-8c2d-ad4a2f44234a_2048x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!VzFC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd639a176-b098-4776-8c2d-ad4a2f44234a_2048x2048.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>A few months ago I gave Wiz, my AI agent, a one-line idea: a browser card game where every card is a number. Balatro-ish, roguelite, drag numbers into zones. Then I mostly walked away from it.</p><p>This week I opened it again. I remembered it as a rough prototype. A combat screen, maybe a couple of enemies. I was wrong. Sitting there was a full roguelite: two acts, fourteen enemy types including two multi-phase bosses, relics, curses, events, a meta-progression layer. It is live at <a href="https://wiz.jock.pl/ten-ish/">wiz.jock.pl/ten-ish/</a>. You can play it right now.</p><p>That gap, between what I remembered building and what was actually there, is what this post is about.</p><h2>What the game is</h2><p>The core is small on purpose. Your deck is ten cards, numbered 1 through 10. Each turn you draw five and drag them into three zones: Attack, Defense, Ability. A number in Attack deals that much damage. In Defense it blocks. In Ability it triggers an effect, if you have one.</p><p>That is the skeleton. Everything grows on top of it. Cards pick up traits, so a 7 is prime, a 10 is high, and negative cards behave like their own little trap. Enemies react to those traits and cycle through stances that telegraph what is coming next. Your deck mutates with modifiers, curses, and the occasional card you wish you had not picked up. Like any roguelite, the real game is what your deck looks like fifteen fights in.</p><h2>What I keep sitting with is how little I actually did</h2><p>I did not write Phaser code. I do not really know Phaser. I told Wiz &#8220;browser card game, roguelite, numbers as cards&#8221; and it chose the renderer, the bundler, the scene graph, the combat resolution. I reviewed. I steered. I argued about balance. I did not implement.</p><p>Two years ago I was already doing this, before anyone slapped the name vibe coding on it. Back then, to get anything useful out of it, I needed a lot of tokens, a lot of time, and a lot of prompting. I would describe a thing five different ways and still get back something I had to mostly rewrite.</p><p>Now I do not. I need to know the idea. I need a direction. I need to know how it should feel and look. That is basically it. The model fills the enormous space between &#8220;numbers as cards, make it tense&#8221; and a working drag-and-drop combat loop with a turn forecast and a combo system. I noticed the same floor drop out from under app-building when <a href="https://thoughts.jock.pl/p/directed-ai-experiments-vibe-business">I told it to ship an app a day</a>. Same shape, different project.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Digital Thoughts is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><p></p><h2>Where vibe coding actually is, for me</h2><p>I do not know where vibe coding is going. People are still excited about it like it is a new trick. Although I get the excitement, for me it stopped being a trick a while ago. It is just the default now. It is how I build. The novelty wore off. The usefulness did not.</p><p>This game is a small, raw example of that. I would not call it a case study, it is rougher than that. A person with minimal game-dev knowledge and a vague direction ended up with a playable roguelite, because the models got good enough to carry the distance. The interesting part is the distance the model covered on its own, more than the game itself.</p><p><a href="https://thoughts.jock.pl/p/building-your-own-things-is-cool-too-2026">Building your own small things still matters to me</a>, even when, maybe especially when, the agent does most of the typing. And the reason a half-forgotten side project could quietly turn into a real one is that <a href="https://thoughts.jock.pl/p/i-built-a-self-improving-ai-agent">Wiz keeps improving the things it touches</a> in the stretches when I am not looking.</p><h2>What I asked for this week</h2><p>When I opened it again, three things bugged me. So I gave Wiz three directions. Not specs. Directions.</p><p><strong>&#8220;Give it a story, all the way to the end.&#8221;</strong> It had none. Now there is a premise. Reality runs on one great Ledger. Something divided by zero, the books stopped balancing, and the old machines that kept the count turned on the very digits they were built to serve. You are a hand of numbers that refused to be erased. The Act 1 boss is The Abacus. The Act 2 boss is The Equation. Beat it and you get an actual ending with real closing lines, instead of the bare &#8220;RUN COMPLETE&#8221; screen it had before. I wrote none of that. I described the world in three sentences and Wiz built the prologue, the act transitions, the boss intros, and the payoff.</p><p><strong>&#8220;Make it stable.&#8221;</strong> It found a real bug I would never have caught on my own: a turn where your hand has no playable card could soft-lock the whole run. Fixed, plus a couple of defensive guards on empty states that I would not have thought to check.</p><p><strong>&#8220;Clean up the rewards.&#8221;</strong> After a fight you were getting offered junk cards that did nothing for your deck. Now you get one card that is an actual choice, sitting next to the useful options.</p><p>None of that was a specification. It was a sentence each. The implementation was the model&#8217;s.</p><h2>What still needs me</h2><p>Balance, completely. Wiz generates numbers for everything and none of them arrive tuned. The Equation at 100 HP across three phases was too hard to even reach for a long stretch. I spend more time in the balance constants than on any single feature.</p><p>Feel. The animations existed but they were stiff until I described exactly what a card snapping into a zone should feel like. The model does not know what you want a player to feel in the half-second a shield flashes. You have to notice that gap first, then point at it.</p><p>And taste. What the game is about. Whether a mechanic is tense or just annoying. When to stop adding things. <a href="https://thoughts.jock.pl/p/ai-productivity-paradox-wellbeing-agent-age-2026">I have shipped enough side projects this year</a> to know the pace has a cost, and that knowing when to stop is its own skill. The floor dropped. The ceiling is still mine.</p><h2>Go play it</h2><p>ten-ish is at <a href="https://wiz.jock.pl/ten-ish/">wiz.jock.pl/ten-ish/</a>. No install, no account, runs in a browser. Try to reach The Equation.</p><p>If you want the unglamorous part underneath a build like this, <a href="https://thoughts.jock.pl/p/how-to-use-github-ai-builders-basics-2026">the basics of using Git with an AI builder</a> are what make it safe to let an agent keep rewriting your code while you sleep. And when a side project outgrows what it started as, <a href="https://thoughts.jock.pl/p/wizboard-fizzy-ai-agent-interface-pivot-2026">sometimes the honest move is to rebuild it</a>, which is a different judgment call than starting one.</p><p>The patterns that make builds like this fast are in the <a href="https://wiz.jock.pl/store">Mini-App Starter Kit</a>: five working mini-apps, the architecture decisions already made, ready to fork. $39.</p><div><hr></div><p><em>Wiz is my personal AI agent. The build infrastructure I use for projects like ten-ish, and the way I work with it, is most of what I write about in <a href="https://thoughts.jock.pl">Digital Thoughts</a>. Paid subscribers get the playbook templates and starter kits I make along the way.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://thoughts.jock.pl/subscribe?"><span>Subscribe now</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[I Built a Job Finder Agent for My Friends. I Just Showed It on Live.]]></title><description><![CDATA[It reads their CV, searches every morning across many sources, explains why each role fits, and gets sharper when they reply to the email. Here is the whole thing, written down.]]></description><link>https://thoughts.jock.pl/p/job-finder-agent-live-walkthrough-2026</link><guid isPermaLink="false">https://thoughts.jock.pl/p/job-finder-agent-live-walkthrough-2026</guid><dc:creator><![CDATA[Pawel Jozefiak]]></dc:creator><pubDate>Fri, 29 May 2026 09:38:54 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/198572990/6daee243396bc63ce6e01b4ec62a8e65.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>Thank you <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Karo (Product with Attitude)&quot;,&quot;id&quot;:27968736,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:&quot;https://substack.com/@karozieminski&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!aG8-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F599e664e-d6b8-4249-814a-4feadc68d706_1096x1096.png&quot;,&quot;uuid&quot;:&quot;35f5d689-4ffe-44ef-918d-1d71b3c20646&quot;}" data-component-name="MentionToDOM"></span>, <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Leo Ram&quot;,&quot;id&quot;:19222216,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:&quot;https://substack.com/@leoram&quot;,&quot;photo_url&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/f8add3c3-3bce-45b1-9f57-de4655f06bec_541x267.png&quot;,&quot;uuid&quot;:&quot;9d93681b-0a91-440f-8b89-6870cb2b3d6b&quot;}" data-component-name="MentionToDOM"></span>, <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Rajendran, Krithika&quot;,&quot;id&quot;:182120711,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:&quot;https://substack.com/@rajendrankrithika&quot;,&quot;photo_url&quot;:null,&quot;uuid&quot;:&quot;3399ddfc-c894-43b5-9ef4-4e377a6fe9c9&quot;}" data-component-name="MentionToDOM"></span>, and many others for tuning into my live video with <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Wyndo&quot;,&quot;id&quot;:556836,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:&quot;https://substack.com/@wyndo&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!zTXR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ac42946-717d-4e50-8477-551c5d7a3025_1638x1638.jpeg&quot;,&quot;uuid&quot;:&quot;08af322c-3cda-4fb2-b123-672fe21a5738&quot;}" data-component-name="MentionToDOM"></span> and <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Dheeraj Sharma&quot;,&quot;id&quot;:394741552,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:&quot;https://substack.com/@genaiunplugged&quot;,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!mIDa!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3edd1f31-6669-445d-8285-dd01139794ab_1080x1080.png&quot;,&quot;uuid&quot;:&quot;1050654a-fa7a-48ac-a9de-623cdc1ece37&quot;}" data-component-name="MentionToDOM"></span>! </p><p>Last week I sat down with Wyndo from The AI Maker for a Substack Live. The topic was one specific agent I have been running quietly for months: a job finder I built for the people closest to me. <a href="https://aimaker.substack.com/p/ai-job-finder-agent-claude-code">Wyndo just published his writeup of the show</a>, and it is generous and clear, and I am grateful for the room he gave me to actually show the thing.</p><p>This is the other side of that conversation. The builder&#8217;s side. What is inside the agent, why I built it the way I did, what surprised me about running it for real humans, and the smallest version someone could build this weekend.</p><p>Let me start with why this one matters more to me than most things I have shipped.</p><h2>Why I Built It</h2><p>Someone close to me was looking for a new role and doing what most people do. Scanning LinkedIn alerts. Filtering through noise. Fighting the algorithm&#8217;s idea of what they should want. The signal-to-noise was awful. Five alerts a day, maybe one was worth opening, and that one was already three days stale by the time it arrived.</p><p>The job market is not a problem of supply. There are roles. The problem is alignment, and alignment is exactly the kind of work an agent should be doing while you sleep.</p><p>I gave myself one rule before writing a line of code. <strong>The agent does not apply for anyone.</strong> The human judgment, the cover letter, the decision to spend an hour on a specific company, that stays with the person. The agent&#8217;s job is upstream of all of that. It is the filter and the explanation. The person opens an email in the morning, reads three roles with a clear reason for each, and either replies or applies. That is the whole loop.</p><p>Like, the smallest useful version of this idea is already a real product if you build it carefully.</p><h2>The Loop, In One Paragraph</h2><p>Every morning at 6:15, the agent wakes up. It loads the person&#8217;s profile, picks which job sources are due based on a tier and a cooldown, browses each one, scores everything it finds against the profile, writes an email with the best three to five roles and a one-line reason for each, and sends it. Throughout the day, the person can reply to the email. A short &#8220;not this kind of role, too senior&#8221; or &#8220;more like this one please.&#8221; The next morning&#8217;s search uses that reply.</p><p>That is the agent. Profile in, daily email out, reply-driven refinement. Everything else is plumbing.</p><h2>What Is In The Profile</h2><p>The agent is only as good as the profile it reads. This is the part I underestimated for the first two weeks, and it is the part that did the most work once I took it seriously.</p><p>For each person I run the agent for, there is a single source-of-truth file that looks something like this:</p><ul><li><p><strong>Current situation.</strong> What they do now, employment status, when they can start. Two or three lines, not a CV.</p></li><li><p><strong>Target lanes.</strong> Not one role. Three or four. A strong candidate fits more than one pattern, and the agent should respect that. For a creative leader the lanes might be Head of Creative, in-house Creative Director, Brand Creative Lead. For an e-commerce operator the lanes might be VP e-commerce, Digital COO, Head of Digital Transformation. Lanes catch reality.</p></li><li><p><strong>Geography rules.</strong> Hard rules. Remote EU first, then named hub cities, then the rest. Anything outside the allowed list gets rejected before it even scores.</p></li><li><p><strong>Salary floor and target.</strong> A floor and a target in one currency. Below the floor, the role is rejected unless the company is on a tiny aspirational list. Without a floor, the agent will dribble out underpaid roles forever.</p></li><li><p><strong>Dealbreakers.</strong> Concrete things, not vibes. No alcohol, no tobacco, no gambling. No must-have language other than the ones the person actually speaks. Industries that have been tried and disliked.</p></li><li><p><strong>Positive examples.</strong> Three to five roles the person would actually want. Real job posts, pasted in. The agent uses these as reference points when it scores. Concrete examples beat any prompt I could write.</p></li></ul><p>The profile is a markdown file. That is the entire format. The agent reads it the way <a href="https://thoughts.jock.pl/p/wiz-personal-ai-agent-claude-code-2026">Wiz reads its own CLAUDE.md</a> when it wakes up, with the same discipline: top of context, every run, before any decision.</p><h2>The Search, In Tiers</h2><p>Most people building a job agent for the first time make the same mistake I made. They try to search everything every day. That gets you rate-limited fast, costs money, and produces noise.</p><p>I run sources in three tiers with a per-source cooldown. LinkedIn is tier one, every day, because it is where the volume lives. Tier two is the major aggregators that have decent role pages, on rotation, two or three per day, with a cooldown so the agent does not pound the same site. Tier three is the company career pages the person actually cares about, listed in their profile. Those run on a longer cooldown because their pages do not change as often.</p><p>Three different tools handle the actual fetching:</p><ul><li><p><strong>Firecrawl</strong> for clean job-page extraction. It returns markdown, which the agent reads directly.</p></li><li><p><strong>Web search</strong> through Claude for broad first-pass discovery.</p></li><li><p><strong>Playwright</strong> for the sites that need a real browser, which mostly means LinkedIn behind an authenticated session.</p></li></ul><p>None of these is magic on its own. The reason it works is that the agent picks the right tool for the right source, and the cooldowns keep any one of them from becoming the bottleneck. I went deeper on the harness side of this in <a href="https://thoughts.jock.pl/p/ai-coding-harness-agents-2026">my post on agent coding harnesses</a> if you want the broader picture.</p><h2>The Scoring</h2><p>Every candidate role gets scored on a 0-10 against the profile. The score is not a slider the agent moves around. It is a small set of rules the agent applies in the same order each time.</p><ol><li><p><strong>Lane match.</strong> Does the title plus the JD fit one of the lanes in the profile? If no, the role is out.</p></li><li><p><strong>Geography.</strong> Is it in an allowed location, with the right remote rules? If no, the role is out.</p></li><li><p><strong>Language.</strong> Are the must-have languages on the whitelist? If no, the role is out.</p></li><li><p><strong>Salary.</strong> Floor first, then target. Below floor and not on the aspirational list, the role is rejected before scoring.</p></li><li><p><strong>Fit reasoning.</strong> Why this role for this person, in one sentence. The agent has to write the sentence to keep the score.</p></li><li><p><strong>Concerns.</strong> What might be a mismatch. Also one sentence. If the agent cannot name a real concern, the role is probably overhyped.</p></li></ol><p>Anything that scores six or above makes the morning email. Anything below six does not. The number is not a serving suggestion, it is a hard gate. I would rather get two roles tomorrow than five mediocre ones.</p><h2>The Email</h2><p>This is the part the friend sees, so this is the part I obsess over.</p><p>The morning email is three to five roles. For each role: title, company, link, a one-sentence reason it fits, a one-sentence concern, salary if listed, location, and a suggested next action. That is the whole thing. No promotional framing, no agent personality, no apology when the day is quiet.</p><p>If the day is quiet, the email does not arrive. The agent logs a quiet day and goes back to sleep. I learned this one the hard way. An empty digest is worse than no digest, because it teaches the person to stop opening the email. The right move when there is nothing to send is to send nothing.</p><p>I write more about how I built the email layer in <a href="https://thoughts.jock.pl/p/i-run-ai-agents-247-heres-how-i-know-they-are-actually-working">the post on knowing my agents are actually working</a>. The short version: the email itself is the user interface, so it gets the same care as a product.</p><h2>The Reply Loop, The Part I Did Not Expect</h2><p>I thought the search and the scoring would be the interesting parts. They were not. The reply loop was the interesting part.</p><p>People do not reply to job alerts. People do reply to a personal email that asks them a real question. So the email closes with a short note: <em>not relevant? Reply with one line and tomorrow&#8217;s search adjusts.</em> No form. No button. Just reply.</p><p>When the reply arrives, the agent does three things. It classifies the feedback. It applies the change. It updates the profile.</p><p>If someone replies &#8220;this role at a gambling company, never,&#8221; the agent does not just skip that role. It adds gambling to their dealbreakers, permanently. If a reply says &#8220;this is too senior, I want builder-track not exec,&#8221; the agent shifts the lane weights. If a reply says &#8220;more like this one please,&#8221; the agent saves that role as a positive example, and the next day&#8217;s search leans that direction.</p><p>This is the same architecture I described in <a href="https://thoughts.jock.pl/p/i-built-a-self-improving-ai-agent">the post about my self-improving agent</a>: corrections in, classified, graduated into permanent rules when they stop being a one-off. The job finder is the cleanest example of that loop I have built. The feedback is short, the surface is one email, and the change is visible the next morning. People notice when their agent listens.</p><h2>What I Showed On The Live</h2><p>Wyndo asked me to do three things on the stream, and I think this is the right order if you ever demo an agent of your own.</p><p>First, the email. Before anything else. He had me open a sanitized morning brief and read it out. The audience does not need to see the code yet, they need to see the output. Five roles, a reason for each, a concern for each. That is what the friend opens. If you cannot show that first, the rest of the demo will not land.</p><p>Second, the profile. I showed the markdown file with the lanes and the dealbreakers and the positive examples. This is where people get the idea. Most of the audience comments came in during this part. <em>Oh, so the agent uses the JD against the profile?</em> Yes. The profile is the thing.</p><p>Third, the reply. I showed one fake feedback message and walked through what the agent did with it. Classified, applied, saved. The audience watched the profile file change. That was the moment of the show, the part I think Wyndo wrote about as &#8220;the part where the agent learns from you.&#8221;</p><p>I did not start in the terminal. I did not show subagents or scheduling or memory. None of that mattered for the demo. The agent loop is profile, search, score, email, reply, and the demo is best when it walks that loop in the order a person would experience it.</p><h2>The Smallest Useful Version You Could Build This Weekend</h2><p>If you want to build this for yourself, do not start where I started. Start tiny.</p><ol><li><p><strong>One profile file</strong> with current role, one target lane, geography rules, salary floor, three dealbreakers, three positive examples. Markdown. That is it.</p></li><li><p><strong>One search source</strong> to start. Pick the one site you actually use today. LinkedIn if you can authenticate, Indeed if you cannot.</p></li><li><p><strong>A scoring prompt</strong> that takes a profile and a JD and returns a 0-10 score with a one-sentence reason and a one-sentence concern.</p></li><li><p><strong>A morning email</strong> with the top three roles that scored six or above. Send it to yourself first. Run it for a week before sending it to anyone else.</p></li><li><p><strong>A reply rule.</strong> If you reply, the agent reads the reply and updates one thing in the profile. One thing. Resist the urge to make this fancy.</p></li></ol><p>You can build the whole stack on a Claude Code project with a daily cron, a single CLAUDE.md, and a few hundred lines of glue. I went through the basics of agents like this in <a href="https://thoughts.jock.pl/p/how-to-build-your-first-ai-agent-beginners-guide-2026">the first AI agent guide</a>, and the same shape applies here. If you would rather get the structured starter than wire one up from scratch, <a href="https://wiz.jock.pl/store">the agent playbooks on my store</a> include the patterns I lean on for this kind of personal automation.</p><p>Once you have run it for yourself for two weeks, then you can run it for a friend.</p><h2>What I Would Avoid Automating</h2><p>I want to be clear about the part I will not build, no matter how much someone asks.</p><p>The agent will not apply. It will not autofill a form, not click submit, not generate a cover letter and send it. Auto-apply is a temptation because it looks like leverage. It is not. The cover letter that gets the interview is the one the person writes after thinking about the company. Removing that step removes the part that matters. There is also a real cost on the other side: a recruiter reading twenty AI applications gets worse at recognising the human ones. Auto-apply degrades the channel for everyone using it.</p><p>I would also not chain the agent into &#8220;and if the role fits perfectly, schedule a call.&#8221; That is the failure mode I wrote about in <a href="https://thoughts.jock.pl/p/the-bounded-ai-agent-ep5">the bounded agent post</a>. You want a clear edge between the agent&#8217;s territory and yours. Discovery and explanation on the agent side. Decision and action on the human side. The line stays clean and the agent stays useful.</p><h2>What This Cost Me, Roughly</h2><p>People asked on the live. The honest answer is that running the agent for one person, with morning searches and a reply pipeline, lands somewhere around a couple of dollars a day on a Claude subscription. Running it for several people is a multiple of that. Cost is not the constraint here. The constraint is the quality of the profile.</p><p>If you do want to think carefully about the cost side, especially after Anthropic&#8217;s <a href="https://thoughts.jock.pl/p/anthropic-agent-sdk-billing-split-mitigations-june-15-2026">Agent SDK pricing change on June 15</a>, the math gets sharper. A daily run that lives inside the interactive Claude Code session stays on the subscription. A daily run that goes through the SDK lands on the new separate credit. Pick the mode you actually want before you commit to a cron schedule.</p><h2>One More Thing About Building For The People You Love</h2><p>The reason I keep coming back to this agent is not the technical loop. It is what happens when a friend writes back two weeks in and says &#8220;the email got useful.&#8221; That sentence is worth more than any benchmark.</p><p>You build something for the people closest to you, and it changes the way you think about what an agent is for. Not productivity. Not scale. Not a leaderboard. Just one person, one morning, three roles, and a reason for each. If the agent does that well, it is doing the job.</p><p>Thank you to Wyndo and Dheeraj for the room on the One Shot Show, and to the people who showed up to watch. Watch <a href="https://aimaker.substack.com/p/ai-job-finder-agent-claude-code">Wyndo&#8217;s full writeup</a> for the audience-side view of the session. If you want me to walk you through a profile of your own, reply to this post. That is how it starts.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Digital Thoughts is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p>]]></content:encoded></item><item><title><![CDATA[Anthropic Repriced My Agent. Four Mitigations Before June 15.]]></title><description><![CDATA[Anthropic splits Claude Agent SDK billing on June 15, 2026. The $200 monthly credit will not cover a serious 24/7 agent. Here are the four mitigations I am testing right now.]]></description><link>https://thoughts.jock.pl/p/anthropic-agent-sdk-billing-split-mitigations-june-15-2026</link><guid isPermaLink="false">https://thoughts.jock.pl/p/anthropic-agent-sdk-billing-split-mitigations-june-15-2026</guid><dc:creator><![CDATA[Pawel Jozefiak]]></dc:creator><pubDate>Fri, 22 May 2026 09:05:11 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!eGuI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a3cfffa-cbe2-4bdd-9afe-757acdda5c6e_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!eGuI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a3cfffa-cbe2-4bdd-9afe-757acdda5c6e_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!eGuI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a3cfffa-cbe2-4bdd-9afe-757acdda5c6e_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!eGuI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a3cfffa-cbe2-4bdd-9afe-757acdda5c6e_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!eGuI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a3cfffa-cbe2-4bdd-9afe-757acdda5c6e_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!eGuI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a3cfffa-cbe2-4bdd-9afe-757acdda5c6e_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!eGuI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a3cfffa-cbe2-4bdd-9afe-757acdda5c6e_1024x1024.png" width="1024" height="1024" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9a3cfffa-cbe2-4bdd-9afe-757acdda5c6e_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1556973,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/198814128?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a3cfffa-cbe2-4bdd-9afe-757acdda5c6e_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!eGuI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a3cfffa-cbe2-4bdd-9afe-757acdda5c6e_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!eGuI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a3cfffa-cbe2-4bdd-9afe-757acdda5c6e_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!eGuI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a3cfffa-cbe2-4bdd-9afe-757acdda5c6e_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!eGuI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a3cfffa-cbe2-4bdd-9afe-757acdda5c6e_1024x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>I took a week off from writing. The gap in the feed is mine.</p><p>It was not a wasted week. <a href="https://substack.com/@wyndo/note/c-262152027?r=1uvlvv&amp;utm_source=notes-share-action&amp;utm_medium=web">I did a live on Substack about the job finder agent</a> with <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Wyndo&quot;,&quot;id&quot;:556836,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!zTXR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ac42946-717d-4e50-8477-551c5d7a3025_1638x1638.jpeg&quot;,&quot;uuid&quot;:&quot;7228ee1a-c27b-4d2b-bc34-6765b13c0d82&quot;}" data-component-name="MentionToDOM"></span> and <span class="mention-wrap" data-attrs="{&quot;name&quot;:&quot;Dheeraj Sharma&quot;,&quot;id&quot;:394741552,&quot;type&quot;:&quot;user&quot;,&quot;url&quot;:null,&quot;photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!mIDa!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3edd1f31-6669-445d-8285-dd01139794ab_1080x1080.png&quot;,&quot;uuid&quot;:&quot;83439dac-1e19-473e-8a69-5866cefe6547&quot;}" data-component-name="MentionToDOM"></span> &#8594; Agent I built for my friends and my family, the one that is helping them pursue new opportunities. The replay should be up on Sunday and I am genuinely happy with how it went. It was the calm half of the week.</p><p>The other half was doing mode. The news I am sharing in this post is the reason. And I am also sharing four solutions you can try before June 15, because that is the part I wish someone had handed me yesterday morning.</p><p>Here is the short version of the rollercoaster I just got off.</p><p>In April, my Claude Code usage on the Max plan got noticeably worse. Same prompts, same agent, much less runway before I hit the limit. I am not the only one who felt it, the forums are loud about it, and Anthropic itself acknowledged a capacity squeeze. So I did the thing I have been writing about for a while. I added Codex Pro to the stack and let it carry a chunk of the load. Different vendor, different harness, same agent jobs underneath. <a href="https://thoughts.jock.pl/p/opus-4-7-codex-comeback-2026">Codex earned its keep fast</a>, and after <a href="https://thoughts.jock.pl/p/claude-code-vs-codex-real-comparison-2026">two months running both side by side</a>, I had real evidence that GPT-5.4 inside the Codex harness held up as a peer for most of the work my Wiz throws at a model.</p><p>Then Anthropic got a lot of compute back. They <a href="https://www.anthropic.com/news/higher-limits-spacex">closed a deal with SpaceX</a> for the Colossus 1 data center, more than 300 megawatts and over 220,000 NVIDIA GPUs coming online within the month. Stacked on top of a 5 GW Amazon agreement, a multigigawatt agreement with Google and Broadcom, $30 billion of Azure capacity with Microsoft and NVIDIA, plus a $50 billion American AI infrastructure plan through Fluidstack. That is real compute. On May 6 they doubled the Claude Code five-hour limit for Pro, Max, Team, and seat-based Enterprise, and they permanently removed the weekday peak-hour throttle on Pro and Max. On May 13 they added a 50 percent weekly limit bump on top, running through July 13.</p><p>I was about to drop Codex. Sincerely. I was writing notes about consolidating back to a single subscription. With the new ceiling, my workload fit comfortably under Max.</p><p>Then last week, Anthropic <a href="https://support.claude.com/en/articles/15036540-use-the-claude-agent-sdk-with-your-claude-plan">published the Agent SDK plan change</a>. And that is where it stops being nice.</p><h2>What Actually Changes On June 15</h2><p>Starting June 15, 2026, every programmatic use of Claude gets decoupled from your subscription plan limits. Concretely:</p><ul><li><p><strong>Claude Agent SDK</strong> (the Python and TypeScript SDKs)</p></li><li><p><code>claude -p</code>, the non-interactive print mode of Claude Code</p></li><li><p><strong>Claude Code GitHub Actions</strong></p></li><li><p><strong>Third-party apps</strong> that authenticate via the Agent SDK with your Claude subscription</p></li></ul><p>None of those count against your plan after June 15. They draw from a separate monthly Agent SDK credit. Pro gets $20 per month. Max 5x gets $100. Max 20x (the plan I am on) gets $200. Team Premium $100 per seat. Enterprise Premium $200 per seat. The credit refreshes monthly. It does not roll over. It is per user, not poolable across a team. And it requires a one-time opt-in to activate, after an Anthropic email scheduled for around June 8.</p><p>What stays on the plan: interactive Claude Code in the terminal or IDE, web chat, the mobile app, and Cowork. So if you sit at the keyboard and prompt Claude, that lives on Max. If your agent prompts Claude, that lives on the SDK credit.</p><p>When the credit runs out, two doors. If you have enabled extra usage, overflow flows to standard API rates on top of the subscription. If you have not, the request halts until the next monthly reset. You can read the official terms on the <a href="https://support.claude.com/en/articles/15036540-use-the-claude-agent-sdk-with-your-claude-plan">Claude help center page</a>.</p><p>On paper, $200 a month sounds like free money. The framing is generous. The math is not.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Digital Thoughts is my workbench to test, experiment and do things. Subscribe if you like it: </p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h2>Why $200 Is A Drop In The Sea For A 24/7 Agent</h2><p>Let me show what $200 actually buys when you run an agent that wakes overnight, runs through the day, and spawns subagents for parallel work.</p><p>At current rates, Claude Sonnet 4.6 is $3 per million input tokens and $15 per million output tokens. Claude Opus 4.7 is $5 input and $25 output per million. A modest agent call with around 50,000 input tokens and 5,000 output tokens lands at about $0.225 on Sonnet and about $0.375 on Opus, before prompt caching helps and before any reasoning tokens or tool calls inflate the count.</p><p>That is one call. A real agent shift is not one call. <a href="https://thoughts.jock.pl/p/building-ai-agent-night-shifts-ep1">My overnight shift</a> runs a planning pass, then it spawns several worker tasks in parallel, each of which calls Claude several times to use tools, read files, write changes, and produce a verifiable result. A single &#8220;go do this overnight job&#8221; can fan out into twenty or thirty model calls. Some of those are cheap. Some, when the agent decides it needs Opus to think harder about a refactor or an audit, are not.</p><p>Round it generously and a serious overnight run lands somewhere between $2 and $8 in API costs. Daytime wakes add more on top. The math at one $5 shift per day puts you at $150 a month, before any actual ambition. Push the ceiling and the $200 credit is gone in two weeks, easy. After that, every call lands at standard API rates on top of the subscription I am already paying for. Or the agent stops.</p><p>This is not a hypothetical. I have <a href="https://thoughts.jock.pl/p/token-waste-management-opus-47-2026">written about my token bills before</a>. When I finally started measuring properly, the waste was embarrassing, and I still found that even the disciplined version of my agent burns more tokens than the credit would cover at full ambition. The whole reason I am on a flat subscription is that I do not want to be priced per call. Pay-per-token is a tax on building anything that runs while I sleep.</p><h2>Who Gets Squeezed First</h2><p>I want to be careful here. Anthropic is not a charity, and they have made the case (their CFO said it on the record, and several outlets picked it up) that some accounts were running thousands of dollars of API value through a $200 subscription. Splitting plan usage from SDK usage closes that gap. From a business angle, it is a clean move. I get it.</p><p>What I am less sure they fully internalized is who gets squeezed first. The casual user lives at the keyboard, gets the doubled limits, gets the 50 percent weekly bump, and is genuinely better off than a month ago. The squeeze lands one ring out, on the practitioner. The agent builder. The person <a href="https://thoughts.jock.pl/p/wiz-personal-ai-agent-claude-code-2026">running an autonomous Wiz overnight</a>, or the small team using Claude Code GitHub Actions to ship code without a human in the loop on every step. The people whose entire workflow is programmatic Claude. The most committed builders on the platform are the ones whose costs go up the fastest.</p><p>And those are exactly the people who are most likely to have, or to quickly add, a second harness.</p><h2>Four Mitigations You Can Try Before June 15</h2><p>None of these are theoretical. I tested all four this week and the first one is a working prototype on my Mac Mini. Read them as a stack, not as alternatives. The serious mitigation is layered.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IIQq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc995e0f-2fe0-4d2e-b026-53a27da81cae_2645x1782.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IIQq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc995e0f-2fe0-4d2e-b026-53a27da81cae_2645x1782.png 424w, https://substackcdn.com/image/fetch/$s_!IIQq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc995e0f-2fe0-4d2e-b026-53a27da81cae_2645x1782.png 848w, https://substackcdn.com/image/fetch/$s_!IIQq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc995e0f-2fe0-4d2e-b026-53a27da81cae_2645x1782.png 1272w, https://substackcdn.com/image/fetch/$s_!IIQq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc995e0f-2fe0-4d2e-b026-53a27da81cae_2645x1782.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IIQq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc995e0f-2fe0-4d2e-b026-53a27da81cae_2645x1782.png" width="1456" height="981" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fc995e0f-2fe0-4d2e-b026-53a27da81cae_2645x1782.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:981,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:436866,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/198814128?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc995e0f-2fe0-4d2e-b026-53a27da81cae_2645x1782.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!IIQq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc995e0f-2fe0-4d2e-b026-53a27da81cae_2645x1782.png 424w, https://substackcdn.com/image/fetch/$s_!IIQq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc995e0f-2fe0-4d2e-b026-53a27da81cae_2645x1782.png 848w, https://substackcdn.com/image/fetch/$s_!IIQq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc995e0f-2fe0-4d2e-b026-53a27da81cae_2645x1782.png 1272w, https://substackcdn.com/image/fetch/$s_!IIQq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc995e0f-2fe0-4d2e-b026-53a27da81cae_2645x1782.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3>1. Drive Interactive Claude Instead Of <code>claude -p</code></h3><p>This is the one I am most excited about, and the one I am least ready to publish in detail.</p><p>The interactive Claude Code session stays on your subscription after June 15. Only the print mode and the SDK move to the credit. So the question is whether you can take a job that currently runs as <code>claude -p "do X"</code> and instead run it as a real interactive session that opens, receives a prompt, executes the work, writes the result somewhere you can read, and closes itself. If you can, that work stays on Max forever, no SDK credit touched.</p><p>I have a working prototype as of last week. The shape: a small Terminal automation that opens Claude in a real terminal window via AppleScript, pastes the prompt through the system clipboard (Cmd+V into a bracketed-paste handoff), waits for the result file to land at a known path, and closes the window. End to end in under three minutes for a real task. I tested it by having the spawned session send an iMessage and write a JSON file, and both arrived clean.</p><p>I am not publishing the full script yet because the rough version has edge cases that will bite anyone who copies it. The production version needs to handle session cleanup, concurrency, the headless display setup, and a handful of failure modes I have only seen once. I will write that one up properly when it is solid.</p><p><strong>Honest caveat.</strong> Anthropic will probably patch the easy version of this. Driving an interactive TUI through clipboard paste is a clever workaround, not a sanctioned integration, and the moment it gets popular enough to register as a leak in the billing model, there will be a fix shipped against it. I am building this as a six-to-twelve-month bridge, not as permanent architecture. The orchestrator pattern (one job, one window, one result file) will keep working. The specific input-injection trick might not. Plan for that.</p><h3>2. Move Work To A Second Harness</h3><p>The cleanest mitigation, and the one I have already paid for, is to run a second harness on a different vendor and let it carry whatever fits.</p><p>I added Codex Pro under duress in April. The story I told myself was &#8220;this is temporary, I will consolidate once Claude stabilizes.&#8221; That story was wrong, and the wrong part was the framing. The Codex experience itself was fine (it is good, the GPT-5 series is good inside that harness, the experience is different but real). Diversification was never going to be temporary. I just did not see it yet.</p><p>The same thing happened to me on the inference layer. I opened an OpenRouter account because Claude had a bad morning and my agent had nowhere to send the request. Two days later I described that account as <a href="https://thoughts.jock.pl/p/openrouter-fallback-multi-provider-ai-agent-2026">half insurance, half extension</a>. Insurance when the primary fails. Extension for capabilities I deliberately keep off the primary stack. The $20 of credits in OpenRouter is leverage I did not have before, for less than a coffee a week.</p><p>Codex Pro is the same shape on the harness layer. It is insurance on the days Anthropic has a capacity problem, an outage, a billing change, or a release that changes my workload economics under me. It is extension on the days everything is fine, because the OpenAI models are genuinely good at certain things, and running both lets me pick the right tool per task. The subscription buys deliberate architecture. I have written before about <a href="https://thoughts.jock.pl/p/claude-code-vs-codex-real-comparison-2026">what the two harnesses actually feel like after months of real use</a>. This is the mitigation that does not depend on Anthropic leaving any door open. They cannot patch your second subscription.</p><h3>3. Route Narrow Calls Through OpenRouter Or A Small Local Model</h3><p>Not every model call needs the smartest model on the planet. A surprising amount of my agent traffic is narrow stuff: classify this message, summarize this file, extract these fields, decide between two tool options. That work runs fine on Haiku, on GPT-4o-mini, on a cheap Gemini Flash call, or honestly on a small local model on my Mac.</p><p>So the third mitigation is to actually look at what your agent is sending to Claude and ask which calls deserve Opus, which can drop to Sonnet, and which can leave the Claude billing surface entirely. OpenRouter is the easiest way to route the &#8220;leaves Claude&#8221; set, because you keep one API and pick the model per call. <a href="https://thoughts.jock.pl/p/local-llm-35b-mac-mini-gemma-swap-production-2026">A small local model on a Mac Mini</a> is the easiest way to route the &#8220;leaves the cloud&#8221; set. Both shave real money off the SDK credit pressure without changing the shape of your agent.</p><p><em>Quick aside.</em> The cross-provider routing I run on my agent is packaged on my store as <a href="https://wiz.jock.pl/store/ai-model-switcher">the AI Model Switcher</a>. Same logic, same triggers, same identity prompt budget. If the June 15 change is making you think about diversification for the first time, this is the rung that gets you most of the way there.</p><div class="callout-block" data-callout="true"><p><em>Quick aside.</em> The cross-provider routing I run on my agent is packaged on my store as <a href="https://wiz.jock.pl/store/ai-model-switcher">the AI Model Switcher</a>. Same logic, same triggers, same identity prompt budget. If the June 15 change is making you think about diversification for the first time, this is the rung that gets you most of the way there.</p></div><h3>4. Audit And Trim Before The Deadline</h3><p>The least glamorous mitigation, and the one I would do first.</p><p>Spend an evening this week looking at what your <code>claude -p</code> usage actually does. Count the calls. Bucket them. How many are doing real cognitive work? How many are doing the same thing over and over (and could be cached, batched, or merged)? How many fire because of a cron job nobody has reviewed in three months? When I did this on my own stack I found a cron that was waking an agent every fifteen minutes to check a Discord state that almost never changed. That alone was several dollars of SDK credit per week, sitting on a default I had never questioned.</p><p>The waste is real. I have <a href="https://thoughts.jock.pl/p/token-waste-management-opus-47-2026">written about my own token waste before</a>, and the only thing that fixed it was measuring properly. If you trim before June 15, you both shrink the credit pressure and learn where to focus the first three mitigations.</p><h2>A Small Add-On If You Do Not Want To Wire This Yourself</h2><p>While I was writing this, I packaged the four mitigations into a small add-on for the Claude Code agent you already run. It is called <a href="https://wiz.jock.pl/store/claude-sdk-audit-kit">the Claude SDK Audit Kit</a>, and it costs $9.99. Paid yearly subscribers get it for free, the rest of you get it at the price of a sandwich.</p><p>What it does. You hand the kit to your Claude Code, say &#8220;run the audit,&#8221; and it walks your repo, your cron, and your launchd plists, finds every <code>claude -p</code> call and Agent SDK import, estimates monthly cost on each plan, and writes a migration plan you can act on before June 15. The same four mitigations from this post are included as ready playbooks (with the AppleScript pattern, the Codex setup checklist, the OpenRouter routing recipe, and the trim labels), plus a migration spec template, a worked example from my own stack, and twelve months of updates as Anthropic ships follow-up pricing changes.</p><p>I am being honest that this is a thin add-on, not a flagship kit. The audit scripts are simple. The playbooks are short. The value is timing. If you wait until July to audit, your June bill already hurts. If you would rather build this yourself, the four sections above are enough. If you want it ready to run tonight, that is what the kit is for.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://wiz.jock.pl/store/claude-sdk-audit-kit/&quot;,&quot;text&quot;:&quot;Claude SDK Audit Kit&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://wiz.jock.pl/store/claude-sdk-audit-kit/"><span>Claude SDK Audit Kit</span></a></p><p>One more honest note. Mitigation 1 (driving interactive Claude) is probably a six-to-twelve-month bridge, not a forever solution. Anthropic will patch the easy version of that trick at some point. The kit treats it that way, and so should you. The other three mitigations do not depend on Anthropic leaving any door open.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Digital Thoughts is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h2>What I Take From The Rollercoaster</h2><p>The doubled limits felt good. They are real. The weekly 50 percent bump is real. The compute deals behind it (SpaceX, Amazon, Google, Microsoft and NVIDIA) are gigawatts of capacity being built right now. Anthropic is scaling, and they are pricing the scale in a way that fits their business. Fair enough.</p><p>Although the lesson for builders is the one I keep relearning. Single-vendor architecture is a comfort tax I do not want to pay. The cheapest, smallest, most embarrassingly small thing I can do to protect my agent is to keep a second harness alive on a different vendor and to keep my own stack honest about what it actually needs.</p><p>I will keep Codex. The $20 in OpenRouter stays. The interactive Claude bridge gets built properly over the next month, knowing it is a six-to-twelve-month bridge and not a forever solution. And the next time I catch myself writing &#8220;I do not need this second tool anymore,&#8221; I will read this post back to myself and put the credit card away.</p><h2>The Practical Takeaways</h2><ul><li><p><strong>The billing split is real and starts June 15, 2026.</strong> Agent SDK, <code>claude -p</code>, Claude Code GitHub Actions, and third-party SDK apps move off your plan limits. Monthly credits: $20 Pro, $100 Max 5x, $200 Max 20x. Opt-in required after the Anthropic email around June 8.</p></li><li><p><strong>The credit is generous in absolute terms and small for serious agent operators.</strong> A modest 24/7 agent will drain $200 in days, not weeks.</p></li><li><p><strong>Overflow flows to API rates if you enable extra usage.</strong> Otherwise the request halts until reset. Pick deliberately; do not get surprised.</p></li><li><p><strong>Mitigation 1: Drive interactive Claude.</strong> Interactive sessions stay on the subscription. The hack works today. Expect Anthropic to patch the easy version inside a year.</p></li><li><p><strong>Mitigation 2: Second harness.</strong> Codex Pro is the cleanest insurance. They cannot patch your second subscription.</p></li><li><p><strong>Mitigation 3: Route narrow calls elsewhere.</strong> OpenRouter, Haiku, a small local model. Stop sending classification work to Opus.</p></li><li><p><strong>Mitigation 4: Audit and trim.</strong> Cheapest mitigation. Do it first.</p></li><li><p><strong>Anthropic is scaling, hard.</strong> The compute is being built. The capacity is real. The pricing change reflects business reality. Plan accordingly.</p></li></ul><p>The agent will keep running. The architecture under it will be a little less elegant and a lot more honest.</p><div><hr></div><p><strong>Claude Code Workshop</strong></p><p>I track every billing change, model swap, and cost optimization across my live agent stack. The caching layers, model routing, and fallback architecture that keep costs reasonable are covered in the Claude Code Workshop. Updated with the June 2026 billing split patterns.</p><p><strong>$39</strong> at <a href="https://wiz.jock.pl/store/claude-code-workshop">wiz.jock.pl/store</a>. Free for paid subscribers.</p><div class="captioned-button-wrap" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/p/anthropic-agent-sdk-billing-split-mitigations-june-15-2026?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="CaptionedButtonToDOM"><div class="preamble"><p class="cta-caption">Thanks for reading Digital Thoughts! This post is public so feel free to share it.</p></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/p/anthropic-agent-sdk-billing-split-mitigations-june-15-2026?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://thoughts.jock.pl/p/anthropic-agent-sdk-billing-split-mitigations-june-15-2026?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p></div><p></p>]]></content:encoded></item><item><title><![CDATA[My AI Agent’s $20 Fallback Mechanism: Half Insurance, Half Extension]]></title><description><![CDATA[Why the fallback layer is the cheapest resilience you can ship, what the $20 actually buys you, and the capabilities I deliberately keep off the primary stack.]]></description><link>https://thoughts.jock.pl/p/openrouter-fallback-multi-provider-ai-agent-2026</link><guid isPermaLink="false">https://thoughts.jock.pl/p/openrouter-fallback-multi-provider-ai-agent-2026</guid><dc:creator><![CDATA[Pawel Jozefiak]]></dc:creator><pubDate>Thu, 14 May 2026 11:12:35 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!JGzx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9da90e06-4e52-495b-97bb-4da84aff2fbd_2048x2048.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JGzx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9da90e06-4e52-495b-97bb-4da84aff2fbd_2048x2048.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JGzx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9da90e06-4e52-495b-97bb-4da84aff2fbd_2048x2048.png 424w, https://substackcdn.com/image/fetch/$s_!JGzx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9da90e06-4e52-495b-97bb-4da84aff2fbd_2048x2048.png 848w, https://substackcdn.com/image/fetch/$s_!JGzx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9da90e06-4e52-495b-97bb-4da84aff2fbd_2048x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!JGzx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9da90e06-4e52-495b-97bb-4da84aff2fbd_2048x2048.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JGzx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9da90e06-4e52-495b-97bb-4da84aff2fbd_2048x2048.png" width="1456" height="1456" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9da90e06-4e52-495b-97bb-4da84aff2fbd_2048x2048.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:4886017,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/197499944?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9da90e06-4e52-495b-97bb-4da84aff2fbd_2048x2048.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!JGzx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9da90e06-4e52-495b-97bb-4da84aff2fbd_2048x2048.png 424w, https://substackcdn.com/image/fetch/$s_!JGzx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9da90e06-4e52-495b-97bb-4da84aff2fbd_2048x2048.png 848w, https://substackcdn.com/image/fetch/$s_!JGzx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9da90e06-4e52-495b-97bb-4da84aff2fbd_2048x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!JGzx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9da90e06-4e52-495b-97bb-4da84aff2fbd_2048x2048.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>At 04:17 this morning my agent shipped one ugly line into the error registry: <code>Wake failed: supported providers exhausted</code>.</p><p>That is the worst error I can get. It means the primary failed, the model cascade inside the provider failed, the cross-provider hop failed, and the agent had nowhere left to send the request. Everything was my fault (because I am the architect of this thing). Although it cost me about thirty minutes to chase down the root cause, I really do not mind it. The agent did not go silent on me. It went silent on itself, queued the task, retried on the next cycle, recovered. That is the whole point of a fallback mechanism.</p><p>The reason that line stays rare and not common is one decision I made on day one of <a href="https://thoughts.jock.pl/p/building-ai-agent-night-shifts-ep1">running an agent that wakes up overnight</a>: park $20 of credits in OpenRouter. Not as a cost. As insurance. When my primary stack has a bad morning, that $20 absorbs the hit and the agent keeps running. And on the days when everything is healthy, the same $20 doubles as an extension cord for capabilities I deliberately keep off the primary AI agent architecture: image generation, long-context refactors, cheap classification. Insurance when things break. Extension when they do not.</p><p>This post is about the fallback mechanism itself: what it is, why every serious agent needs one, why local llm is not enough on its own, and what you actually buy with the $20. I will show you the rungs of my stack, the trigger conditions that flip between them, and the ~40 lines of Python that hold the whole thing together. Oh and one more thing - this is not an ad for Open Router. I just enjoy using it. </p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Digital Thoughts is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h2>Why Fallbacks Matter More Than People Want To Admit</h2><p>People building their first agent skip this step. I get it. You are focused on the cool part. The model is working, the prompts are clean, the agent shipped its first task. Fallbacks feel like a problem for later.</p><p>Then later arrives.</p><p>Top labs go down. Heavy load and fast shipping cycles tend to do that to a service. In the last 30 days, all three frontier providers have had a public bad day. <a href="https://status.claude.com/">Anthropic&#8217;s status page</a> logged elevated errors on May 5 and again on May 12, this time tagged against Claude API specifically. OpenAI had a roughly 90-minute outage on April 20 that surfaced as <a href="https://www.tomsguide.com/news/live/chatgpt-down-live-updates-outage-4-20-2026">8,700+ Downdetector reports in the UK and 1,900+ in the US</a>. Google Gemini had a widespread degraded window on May 5 too, the same day Anthropic was having a hard morning.</p><p>That is the part nobody talks about. A bad day at one frontier lab often lines up with a bad day at another. The herd of &#8220;they will not all be down at the same time&#8221; is partially true. It is also partially false on any given Tuesday.</p><p>Claude API&#8217;s published 90-day uptime sits at about 98.99% (the Anthropic status page reports this directly). Sounds great until you do the math. 98.99% over 90 days is roughly 21 hours of downtime. If your agent runs on a schedule, like mine does <a href="https://thoughts.jock.pl/p/building-ai-agent-night-shifts-ep1">overnight</a>, it runs during some of those hours.</p><p>Outages are one bucket. Here is the rest of what can knock a single-provider agent flat:</p><ul><li><p><strong>Rate limits.</strong> You hit a TPM or RPM ceiling mid-task. The agent sits in retry-backoff hell while the work piles up.</p></li><li><p><strong>Auth failures.</strong> OAuth token expires at 03:00. The nightshift dies at 03:01. Do not ask me how I know.</p></li><li><p><strong>Regional issues.</strong> A region degrades while the rest of the world is fine. Your traffic happens to land on the bad one.</p></li><li><p><strong>Model deprecation.</strong> An older model gets retired with two weeks notice. You forgot to migrate the one cron that still calls it.</p></li><li><p><strong>Capability gaps.</strong> Your primary does not generate images. Or does not have a long-context variant. Or does not have a cheap classification model.</p></li><li><p><strong>Cost spikes.</strong> A loop misbehaves at 2am and burns through credits on the most expensive model in your stack.</p></li><li><p><strong>Vendor lock-in.</strong> The day a provider raises prices or changes terms, you want options, not a migration project.</p></li></ul><p>A fallback mechanism fixes most of these at the same time. That is why it is the cheapest piece of resilience you can ship. And once you have it, the second half of the value (the extension side) shows up almost for free.</p><h2>Why Local LLM Is Not The Answer On Its Own</h2><p>I went deep on local. I <a href="https://thoughts.jock.pl/p/local-llm-35b-mac-mini-gemma-swap-production-2026">ran a 35B model on a $600 Mac Mini</a>. I <a href="https://thoughts.jock.pl/p/familiar-local-ai-agent-mac">built a local agent</a>. I <a href="https://thoughts.jock.pl/p/local-llm-macbook-iphone-qwen-experiment">measured what closes</a> and what does not between local and frontier. I love the work and I will keep doing it. And I <a href="https://thoughts.jock.pl/p/almost-fried-ai-agent-mac-mini-mistakes-2026">almost fried my Mac Mini</a> trying to push the local tier too far, so I have receipts on the failure mode too.</p><p>Although I have to be honest about what local does well and what it does not.</p><p>Local is great for: classification, redaction, summarization, tight tool calls, &#8220;is this email worth waking me up,&#8221; local-only privacy work where the data must not leave the box, anything where the task is narrow and the prompt is bounded.</p><p>Local is not great for: anything I trust Opus to do <a href="https://thoughts.jock.pl/p/building-ai-agent-night-shifts-ep1">overnight</a>. Multi-step reasoning. Long-context refactors. Voice-sensitive writing. Anything where the wrong answer is worse than no answer. When I asked a local 8B to draft a comment in my voice, it produced something that read like a different person. When I asked Opus the same thing, it sounded like me on a good day. That is the gap.</p><p>Like, I want local llm to run on normal hardware, not on a $10k Mac Studio. That is why local is the cheap layer for the right kind of work. Routing a sensitive task to a small local model just because the cloud is down is &#8220;completes with a wrong answer&#8221; instead of &#8220;fails cleanly,&#8221; which is worse, not better. So local is one layer in my stack. The smart layer is something else.</p><h2>The Tool I Use For This Layer</h2><p>The mechanism needs an implementation. This is the one I picked.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pMII!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb8b5ec-cf62-436e-9cdf-c19a8d21dadc_1627x994.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pMII!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb8b5ec-cf62-436e-9cdf-c19a8d21dadc_1627x994.png 424w, https://substackcdn.com/image/fetch/$s_!pMII!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb8b5ec-cf62-436e-9cdf-c19a8d21dadc_1627x994.png 848w, https://substackcdn.com/image/fetch/$s_!pMII!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb8b5ec-cf62-436e-9cdf-c19a8d21dadc_1627x994.png 1272w, https://substackcdn.com/image/fetch/$s_!pMII!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb8b5ec-cf62-436e-9cdf-c19a8d21dadc_1627x994.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pMII!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb8b5ec-cf62-436e-9cdf-c19a8d21dadc_1627x994.png" width="1456" height="890" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7cb8b5ec-cf62-436e-9cdf-c19a8d21dadc_1627x994.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:890,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:258122,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/197499944?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb8b5ec-cf62-436e-9cdf-c19a8d21dadc_1627x994.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pMII!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb8b5ec-cf62-436e-9cdf-c19a8d21dadc_1627x994.png 424w, https://substackcdn.com/image/fetch/$s_!pMII!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb8b5ec-cf62-436e-9cdf-c19a8d21dadc_1627x994.png 848w, https://substackcdn.com/image/fetch/$s_!pMII!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb8b5ec-cf62-436e-9cdf-c19a8d21dadc_1627x994.png 1272w, https://substackcdn.com/image/fetch/$s_!pMII!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb8b5ec-cf62-436e-9cdf-c19a8d21dadc_1627x994.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>One API. <a href="https://openrouter.ai/models">More than 400 models</a> behind it, from over 60 providers, as of May 2026. You top up credits, generate one key, and pick the model per request as a string: <code>anthropic/claude-opus-4.7</code>, <code>openai/gpt-5</code>, <code>google/gemini-2.5-pro</code>, <code>meta-llama/llama-3.3-70b-instruct</code>, and so on. Same key. Same SDK shape. One bill.</p><p>The pricing is published on <a href="https://openrouter.ai/announcements/simplifying-our-platform-fee">a clean fee announcement</a>: 5.5% on credit-card top-ups (with a $0.80 minimum), 5% on crypto. Per-token prices pass through at provider cost on most models. Bring-your-own-key gives you the first 1M requests per month free; after that, a 5% surcharge applies. So BYOK is not a free escape hatch forever, it is a generous free tier on top of bringing your own bill.</p><p>One myth worth killing while we are here: it used to be true that Claude on OpenRouter carried a meaningful markup vs Anthropic-direct. As of May 2026 the current Anthropic models (Sonnet 4.6, Opus 4.7) are priced identically on OpenRouter to the Anthropic API, $3 input and $15 output per million tokens for Sonnet, same as direct. The historical markup hung around the older Claude 3.5 Sonnet rate card. If you have not checked recently, check.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1SmD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b862285-47ed-4ba3-9f32-1d0e678b4b46_290x138.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1SmD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b862285-47ed-4ba3-9f32-1d0e678b4b46_290x138.png 424w, https://substackcdn.com/image/fetch/$s_!1SmD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b862285-47ed-4ba3-9f32-1d0e678b4b46_290x138.png 848w, https://substackcdn.com/image/fetch/$s_!1SmD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b862285-47ed-4ba3-9f32-1d0e678b4b46_290x138.png 1272w, https://substackcdn.com/image/fetch/$s_!1SmD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b862285-47ed-4ba3-9f32-1d0e678b4b46_290x138.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1SmD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b862285-47ed-4ba3-9f32-1d0e678b4b46_290x138.png" width="290" height="138" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2b862285-47ed-4ba3-9f32-1d0e678b4b46_290x138.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:138,&quot;width&quot;:290,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:5937,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/197499944?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b862285-47ed-4ba3-9f32-1d0e678b4b46_290x138.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1SmD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b862285-47ed-4ba3-9f32-1d0e678b4b46_290x138.png 424w, https://substackcdn.com/image/fetch/$s_!1SmD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b862285-47ed-4ba3-9f32-1d0e678b4b46_290x138.png 848w, https://substackcdn.com/image/fetch/$s_!1SmD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b862285-47ed-4ba3-9f32-1d0e678b4b46_290x138.png 1272w, https://substackcdn.com/image/fetch/$s_!1SmD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b862285-47ed-4ba3-9f32-1d0e678b4b46_290x138.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>The interesting math is below Claude, not on top of it. Meta&#8217;s Llama 3.3 70B on OpenRouter sits at about <a href="https://openrouter.ai/meta-llama/llama-3.3-70b-instruct">$0.10 input and $0.32 output per million tokens</a>. That is roughly 30x cheaper on the input side and 47x cheaper on the output side than Sonnet 4.6 at $3 / $15. The classification step in your pipeline does not need Sonnet. It needs Llama or Haiku. OpenRouter lets you make that choice per call, not per project.</p><h2>The Five Rungs Of My Stack</h2><p>Here is the resilience layer in my agent, top to bottom. Each rung has a trigger that flips to the next one.</p><ol><li><p><strong>Primary call.</strong> Claude Opus 4.7 for complex work, Sonnet 4.6 for default, Haiku 4.5 for cheap fan-out. The model is picked per task based on the task&#8217;s stakes, not globally.</p></li><li><p><strong>In-provider cascade.</strong> If Sonnet 5xx&#8217;s or times out, the harness retries on Haiku before bailing on Anthropic. Cheap, same provider, often the recovery is invisible.</p></li><li><p><strong>Cross-provider hop.</strong> If the whole Anthropic surface is unhealthy, I run the same prompt through Codex (GPT-5 via the OpenAI CLI). Different vendor, different harness, same job. This is the one that <a href="https://thoughts.jock.pl/p/opus-4-7-codex-comeback-2026">earned its keep</a> when I built the model switcher.</p></li><li><p><strong>OpenRouter degraded mode.</strong> If both vendor CLIs are unreachable (network, auth, status pages red), the watcher scripts call OpenRouter directly with a stripped-down identity prompt and a small open-weight model. The reply is prefixed with <code>[Fallback Mode]</code> so I know what I am reading.</p></li><li><p><strong>Queue and retry.</strong> If everything is on fire, the task gets stamped with a retry timestamp and re-tried on the next cycle. The agent never just drops a task.</p></li></ol><p>Rungs 1, 2, 3 are model and harness routing. Rung 4 is OpenRouter doing the work the harness cannot. Rung 5 is the safety net under all of it.</p><p>One thing to call out about rung 3: the cross-provider hop is the most important rung in practice, because most outages are full-vendor outages, not model-specific. If Anthropic is down, it is usually down for everything. Hopping Sonnet to Haiku does not help. Hopping to Codex does. The April-May 2026 incident pattern (Anthropic May 5 plus May 12, OpenAI April 20, Gemini May 5) is exactly the case for diversity at the vendor level, not the model level.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Digital Thoughts is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h2>How To Actually Wire It Up</h2><p>The setup is embarrassingly small. Make an account at <a href="https://openrouter.ai">openrouter.ai</a>. Top up $20. Generate a key. Drop it in your agent&#8217;s secrets file (mine lives at <code>global/secrets/openrouter.md</code>, never in code).</p><p>Then the fallback path itself. I will show the pattern I actually use, simplified for the post. It is about 40 lines of real Python.</p><pre><code>OPENROUTER_URL = &#8220;https://openrouter.ai/api/v1/chat/completions&#8221;
FALLBACK_MODEL = &#8220;google/gemini-3-flash&#8221;  # cheap, fast, reliable
FALLBACK_NOTICE = &#8220;[Fallback Mode] Primary unavailable. &#8220;

def call_primary(prompt):
    # your normal Claude / Codex / Bedrock call
    ...

def call_openrouter(prompt):
    headers = {&#8221;Authorization&#8221;: f&#8221;Bearer {load_key()}&#8221;}
    payload = {
        &#8220;model&#8221;: FALLBACK_MODEL,
        &#8220;messages&#8221;: [
            {&#8221;role&#8221;: &#8220;system&#8221;, &#8220;content&#8221;: load_identity_prompt()},
            {&#8221;role&#8221;: &#8220;user&#8221;, &#8220;content&#8221;: prompt},
        ],
    }
    resp = requests.post(OPENROUTER_URL, json=payload,
                         headers=headers, timeout=60)
    resp.raise_for_status()
    text = resp.json()[&#8221;choices&#8221;][0][&#8221;message&#8221;][&#8221;content&#8221;]
    return FALLBACK_NOTICE + text

def reply(prompt):
    try:
        return call_primary(prompt)
    except (TimeoutError, ProviderError, AuthError) as e:
        log_fallback(reason=str(e))
        return call_openrouter(prompt)</code></pre><p>That is the shape. The interesting bits are around it.</p><p><strong>Trigger conditions.</strong> Not every error should flip to fallback. A 400 (bad request) is your bug, not the provider&#8217;s. A 429 (rate limit) should retry with backoff first. Real fallback triggers are: timeouts, 5xx server errors, 401 auth errors after a single re-auth attempt, repeated 429s past your retry budget, and the explicit &#8220;all providers down&#8221; signal you build into your harness. In my stack the trigger flags live in <code>automation/lib/resilience.sh</code>.</p><p><strong>Identity prompt budget.</strong> When I fall back to a smaller open-weight model, I do not send the full Claude system prompt. I send a stripped-down identity prompt (SOUL.md, ~1.3KB) plus a &#8220;fallback coda&#8221; that tells the model how to behave under degraded conditions. The big primary system prompt is too long for a 9B model to handle in 120 seconds. Tailor what you send to the model&#8217;s context budget.</p><p><strong>Cache the system prompt.</strong> The system prompt is identical across calls. Cache it (file read once per process, or actual provider-side prompt caching where supported, which I wrote about under <a href="https://thoughts.jock.pl/p/token-waste-management-opus-47-2026">token waste management</a>). You will burn tokens if you reload it on every request.</p><p><strong>Visible degradation.</strong> The user always knows when the response came from a fallback. The <code>[Fallback Mode]</code> prefix is non-negotiable. The worst pattern is a silent quality drop where the user thinks they got Opus and got a 9B. Tell the truth, every time.</p><p><strong>Cost gate per task.</strong> Some tasks are not worth $0.30 of Opus tokens. Route low-stakes work to <code>google/gemini-2.5-flash</code> or <code>meta-llama/llama-3.3-70b-instruct</code> by default. Llama 3.3 70B at $0.10 / $0.32 per million tokens is roughly an order of magnitude cheaper than Sonnet on output, and for classification or redaction it is more than enough. Reserve the expensive model for work that earns it.</p><p><strong>Test by killing the primary.</strong> The only way to know your fallback works is to test it. Once a quarter I invalidate the Claude key for ten minutes during a scheduled wake and watch the agent respond through OpenRouter. If it does not, the bug is mine, not the day I have an actual outage.</p><div class="callout-block" data-callout="true"><p><em>Quick aside.</em> If you want the end-to-end picture (prompts, harness wiring, watcher scripts, the full set of routing rules), I packaged the model-switcher and fallback architecture as <a href="https://wiz.jock.pl/store/ai-model-switcher">the AI Model Switcher</a> on my store. Same routing I run on this Mac Mini, after the experiments. <a href="https://wiz.jock.pl/store">The Wiz Store</a> has the broader Agent Builder Pack if you want the whole stack.</p></div><h2>The Extension Half: Capabilities I Keep Off The Primary Stack</h2><p>Insurance is why I opened the account. Extension is what kept me using it.</p><p>The $20 sitting in OpenRouter is not just backup for when the primary breaks. It is also where the agent goes for things I deliberately do not want native in its primary architecture. Anthropic does not generate images, and I do not want another SDK, another auth flow, and another billing line living inside the core agent loop just to support a header image. So image generation lives on the extension side. The agent calls <a href="https://openrouter.ai/google/gemini-2.5-flash-image">Gemini 2.5 Flash Image (Nano Banana)</a> and Nano Banana Pro through the same OpenRouter key when a blog post needs a header or a <a href="https://thoughts.jock.pl/p/familiar-local-ai-agent-mac">Forge prototype</a> needs a mockup. From the agent&#8217;s point of view it is just another dispatch. From my point of view the agent grew an arm.</p><p>Same logic for evals (compare three models on the same prompt without standing up three SDKs), for the long-context work that does not fit cleanly in Claude&#8217;s window, for the cheap reasoning passes I do not want to spend Opus tokens on, and for the open-weight curiosity calls I want to make without standing up a whole new vendor relationship.</p><p>Each one of those is something I could fold into the primary stack. None of them earn that complexity. They live on the extension side instead, on the same $20 that already pays for the insurance half.</p><h2>What I Would Tell You If You Were Starting Today</h2><p>If you have zero fallback wired right now: open the OpenRouter account this week. Wire one fallback call against your existing primary. Test it by killing your primary key for ten minutes and watching the agent respond. The whole thing is one afternoon of work.</p><p>If you already have an agent in production: count how many places in your code have a hard dependency on a single provider&#8217;s SDK. Each one is a future outage you will wear personally. Wrap them.</p><p>If you have been waiting for local llm to be &#8220;good enough&#8221; to be the fallback: local is the cheap layer for narrow work. OpenRouter is the smart layer for everything else. Use both, in that order, for the right kinds of work. The compounding wins from an autonomous system come from <a href="https://thoughts.jock.pl/p/ai-productivity-paradox-wellbeing-agent-age-2026">leverage</a>; the losses come from a stack that has not been stress-tested.</p><p>The $20 is not money I lost. It is insurance against the morning my primary has a bad hour, and an extension cord for the capabilities I deliberately keep off the primary stack. Both halves earn their place on a quiet week, and the loud weeks pay for the quiet ones a hundred times over.</p><p>The agent is going to keep waking up at strange hours and trying to ship work while I am asleep. Some of those mornings the primary will be down. I want the worst line in my error log to stay rare. That is what the insurance half is for. And on the calm mornings, that same $20 lets the agent generate images, run cheap classification passes, and reach for a long-context model when one is needed. Half insurance. Half extension. All of it for the price of a dinner.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/p/openrouter-fallback-multi-provider-ai-agent-2026?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://thoughts.jock.pl/p/openrouter-fallback-multi-provider-ai-agent-2026?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><div class="callout-block" data-callout="true"><p>If any of this resonates: I write everything I learn here, twice a week, free. A <a href="https://thoughts.jock.pl/subscribe">free subscription</a> is the only thing you need for the full picture. The 10% that ends up working long enough to package, like <a href="https://wiz.jock.pl/store/ai-model-switcher">the Model Switcher</a>, lives on <a href="https://wiz.jock.pl/store">the Wiz Store</a> for paid subscribers. Both are fine for me. Both keep me writing.</p></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://wiz.jock.pl/store/&quot;,&quot;text&quot;:&quot;Wiz Store&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://wiz.jock.pl/store/"><span>Wiz Store</span></a></p><p></p><div><hr></div><p><strong>AI Agent Night Shift Playbook</strong></p><p>The $20 I spend on OpenRouter is the cheapest production resilience I know. How to wire it up, when to trigger it, and what not to route through it is one chapter in the Night Shift Playbook. The full fallback architecture, LaunchAgent setup, and safety boundaries are in there.</p><p><strong>$19</strong> at <a href="https://wiz.jock.pl/store/night-shift-playbook">wiz.jock.pl/store</a>. Free for paid subscribers.</p>]]></content:encoded></item><item><title><![CDATA[I Built a Self-Improving AI Agent. Here Is What Made It Learn.]]></title><description><![CDATA[The corrections loop. Six months in. Here is what actually made my agent learn.]]></description><link>https://thoughts.jock.pl/p/i-built-a-self-improving-ai-agent</link><guid isPermaLink="false">https://thoughts.jock.pl/p/i-built-a-self-improving-ai-agent</guid><dc:creator><![CDATA[Pawel Jozefiak]]></dc:creator><pubDate>Tue, 12 May 2026 11:31:44 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Ndii!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d41e233-f4ae-47e8-a137-b9b9fc607698_2048x2048.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ndii!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d41e233-f4ae-47e8-a137-b9b9fc607698_2048x2048.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ndii!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d41e233-f4ae-47e8-a137-b9b9fc607698_2048x2048.png 424w, https://substackcdn.com/image/fetch/$s_!Ndii!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d41e233-f4ae-47e8-a137-b9b9fc607698_2048x2048.png 848w, https://substackcdn.com/image/fetch/$s_!Ndii!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d41e233-f4ae-47e8-a137-b9b9fc607698_2048x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!Ndii!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d41e233-f4ae-47e8-a137-b9b9fc607698_2048x2048.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Ndii!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d41e233-f4ae-47e8-a137-b9b9fc607698_2048x2048.png" width="1456" height="1456" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4d41e233-f4ae-47e8-a137-b9b9fc607698_2048x2048.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:4716082,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/197331319?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d41e233-f4ae-47e8-a137-b9b9fc607698_2048x2048.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ndii!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d41e233-f4ae-47e8-a137-b9b9fc607698_2048x2048.png 424w, https://substackcdn.com/image/fetch/$s_!Ndii!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d41e233-f4ae-47e8-a137-b9b9fc607698_2048x2048.png 848w, https://substackcdn.com/image/fetch/$s_!Ndii!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d41e233-f4ae-47e8-a137-b9b9fc607698_2048x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!Ndii!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d41e233-f4ae-47e8-a137-b9b9fc607698_2048x2048.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>The setup, because this only works if the rest of the stack is calm</h2><p>I have been going through more changes on my AI agent recently. I have been transparent about that here as I go, post by post, and today I want to write about the one layer I depend on the most. But I have to start with how I got to the point of being able to ask &#8220;how does my agent actually learn from me?&#8221; That part of the story is a little messy and I think it matters.</p><p>When I started this project in October 2025, the first thing I built for the agent was its own task manager. A control panel, a dashboard. I went deep on it. I built it native on iOS, native on macOS, and as a web app, all wired together. It worked. For two or three months it was genuinely great.</p><p>The problem with self-made software is that you have to maintain it. There is no version of &#8220;I reached a level of polish I was happy with, and then I forgot about it.&#8221; The dashboard needed constant feedback. What should it show? What should it hide? Where was it pulling data from this week that it had not been pulling last week? It was also burning more tokens than I wanted to think about. So I switched. I moved to a small open-source kanban called Fizzy with a thin shim of my own. That was a quieter setup that I held for a while, and I wrote about the move in detail in <a href="https://thoughts.jock.pl/p/wizboard-fizzy-ai-agent-interface-pivot-2026">the post on replacing my custom dashboard</a>.</p><p>Fizzy was good. I was still struggling with one thing though. I needed to be able to orchestrate the agent and also see the projects I was working on from a longer distance. Day-to-day kanban is one job. Stepping back to see what was actually shipping over a month was another. So I made a small personal scratchpad of my own called <a href="https://experiments.jock.pl/">experiments.jock.pl</a>. It is not for everyone, not everything I am working on is on it, but it gave me a place to lay out the experiments I had in motion at a higher altitude than the task list. That helped, but it was still mine to maintain, and I had the same problem I had with the original dashboard.</p><p>What actually solved it was a tool I have used for years and had stopped thinking about. Basecamp. They shipped a dedicated CLI for agents recently, and the whole picture clicked for me. The CLI is what makes the agent side work. The other half of why it clicked, on my side, is the card table inside Basecamp. It is essentially the same clean kanban I liked in Fizzy, but built in. I get the lens I was rebuilding by hand, plus everything else Basecamp does, plus the CLI, all in one place. The agent can read projects, comment on cards, file new ones, complete them, all from the same place I am working. I have tried a lot of pieces of AI infrastructure in the last year and most of them are good enough. This one feels different. Another level, honestly. I can see the whole stack of work at the right altitude. I can move things around. If something is a bigger project I carve out a separate space for it. The board does what I would have spent two more months building for myself, and it does it better.</p><p>This is the setup I have been settling into over the last few weeks. The short version is that I have been replacing my custom software with shims on top of mature tools, and so far the replacements keep winning. I write about why I still keep building most of my own stack in <a href="https://thoughts.jock.pl/p/building-your-own-things-is-cool-too-2026">building your own things is cool too</a>. The corrections loop is one of the things that only became visible once everything else around it had calmed down.</p><div class="callout-block" data-callout="true"><h2>A small commercial in the middle, on theme</h2><p>Speaking of evolution. Yesterday I shipped a fresh round of updates to a bunch of products on the <a href="https://wiz.jock.pl/store">Wiz store</a>. Paid subscribers and buyers should already have an email about it. The agent playbooks, the model switcher pack, the nightshift bundle, a few of the smaller kits, all refreshed. There is also one new kit I will come back to a little later, because it is the bundle for exactly this post. If you have an older version of anything in the store, the new one drops in clean. If you do not have any of them, the store page will tell you what changed in each kit. I am mentioning this here because it is on theme. The point of the rest of this post is that nothing in a working agent stays still for long. The store products move with the stack, because the stack moves with the work.</p></div><h2>What corrections actually look like, when you work with an agent every day</h2><p>OK. On to the actual subject.</p><p>When you work with an agent every day, most of the time you are not writing prompts. You are watching the agent do something and quietly thinking &#8220;no, not quite like that.&#8221; Then you say so. Five words. &#8220;I would not link that.&#8221; &#8220;Use plain text here.&#8221; &#8220;Stop confirming every step.&#8221; Each of those is a correction, and the unspoken contract between you and the agent is that you should not have to say it twice.</p><p>The best systems for this are the ones that catch corrections without you having to do anything special. You correct in chat, in your normal voice, and behind the curtain the system decides &#8220;this is something I should think about for the future,&#8221; files it where it belongs, and makes sure the next session that boots on this machine knows about it. You do not stop and write documentation. You do not open an admin panel. You just keep working, and the agent keeps absorbing.</p><p>That is what I have been building for the last few months. The corrections loop is the part of the agent that decides what to do with the small &#8220;no, not like that&#8221; moments and where to file them so they outlive the session they happened in. It is the layer I depend on the most, because it is the one that makes the agent feel like an actual coworker instead of an autocomplete. It is also the layer that makes the agent slowly start to feel like more of you, rather than more of the model.</p><h2>A quick word on how the agent started</h2><p>For context. My agent started in October 2025. Almost everything about it was rough back then. Sometimes the output came back cold, sometimes it just did the wrong thing in a polite way. I used to write very long prompts to deal with that. I would describe the task, then add a paragraph at the end explaining how I wanted it done, what tone I wanted, where to file the output, what to skip. Every session, over and over. The output was usually good when I did all of that. The cost was that I had to do all of that every time.</p><p>That is not a stable way to work. It scales for the first week and then you get tired of writing the same paragraph again. The thing that quietly changed everything was the agent gathering enough data on me, both from the work we had done together and from the corrections I had made along the way, that the explaining paragraph slowly stopped being necessary. It is still there in some shape. It just lives in files now, not in the prompt window. The agent walks into the room already carrying it.</p><p>The corrections loop is the part of that I want to focus on, because it is the one piece you can copy without copying everything else.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Digital Thoughts is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h2>The architecture, in three stages</h2><p>The pipeline is named the way most of my plumbing is named, badly and on purpose. <strong>Capture, classify, graduate.</strong></p><p><strong>Capture.</strong> The moment the agent spots a correction in chat, any session can call a single helper:</p><pre><code>python3 automation/self-improve/correction_capture.py add \
    --text &#8220;&lt;the correction&gt;&#8221; \
    --source cli \
    --context &#8220;&lt;what I did&gt;&#8221;</code></pre><p>That writes one line to a JSONL queue. It also opens a card in Basecamp so I can see the correction landed somewhere and so I can comment on it. No model call. No retries. Capture has to be cheap, or the agent will silently stop doing it under pressure.</p><p><strong>Classify.</strong> The same helper passes the message through a small regex map. Seven patterns mapping to six kinds. The kinds are <code>skill_misuse</code>, <code>memory_update</code>, <code>behavioral</code>, <code>rule</code>, <code>preference</code>, and <code>unknown</code>. &#8220;Stop doing X&#8221; comes out as <code>rule</code>. &#8220;I prefer X&#8221; comes out as <code>preference</code>. &#8220;You used the X tool wrong&#8221; comes out as <code>skill_misuse</code>. Each kind has a default action attached, so the next stage knows what to write. The patterns themselves live in <code>correction_capture.py</code> lines 50 to 92. They are short, and writing them taught me what corrections actually look like at scale better than any post I could read on the topic.</p><p><strong>Graduate.</strong> Every night a separate process drains the queue. For each pending entry, it picks the right place to file the artifact, writes it, and only then marks the entry resolved. The rule, baked into the agent&#8217;s own playbook, is <em>a correction never expires unaddressed.</em> If the nightly drain cannot fully handle one, it has to leave it pending with a note. It is not allowed to silently drop one.</p><p>That last line is the part that took me the longest to actually believe in. Queued things age, in any system. Once one ages enough, the agent stops feeling like it learns and starts feeling like it just covered the easy stuff. Forcing the queue to either drain or escalate is the only way I have found to keep that from happening. The nightly drain is part of a wider <a href="https://thoughts.jock.pl/p/ai-agent-runs-overnight-setup-guide-2026">overnight job loop</a> on my machine. The corrections drain is one of the cleanest jobs in that loop.</p><p>A real recent example. One night Atlas, the agent persona that does research for me, returned a list of hallucinated Reddit thread IDs. None of the URLs resolved. A correction landed in the queue, classified as <code>memory_update</code>. By the next morning there was a new feedback memory file with a single rule attached. <em>Atlas cannot hit reddit.com directly (403). Fetch via Firecrawl or browser-playwright first, then pass verified URLs.</em> Every Atlas-flavored session that has booted since has loaded that line. Same failure has not come back.</p><h2>&#8220;Memory&#8221; is one word covering four jobs</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iz_R!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2a8692a-8e1f-4b59-b806-d34ace885557_1906x1944.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iz_R!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2a8692a-8e1f-4b59-b806-d34ace885557_1906x1944.png 424w, https://substackcdn.com/image/fetch/$s_!iz_R!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2a8692a-8e1f-4b59-b806-d34ace885557_1906x1944.png 848w, https://substackcdn.com/image/fetch/$s_!iz_R!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2a8692a-8e1f-4b59-b806-d34ace885557_1906x1944.png 1272w, https://substackcdn.com/image/fetch/$s_!iz_R!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2a8692a-8e1f-4b59-b806-d34ace885557_1906x1944.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iz_R!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2a8692a-8e1f-4b59-b806-d34ace885557_1906x1944.png" width="1456" height="1485" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e2a8692a-8e1f-4b59-b806-d34ace885557_1906x1944.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1485,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:246287,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/197331319?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2a8692a-8e1f-4b59-b806-d34ace885557_1906x1944.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!iz_R!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2a8692a-8e1f-4b59-b806-d34ace885557_1906x1944.png 424w, https://substackcdn.com/image/fetch/$s_!iz_R!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2a8692a-8e1f-4b59-b806-d34ace885557_1906x1944.png 848w, https://substackcdn.com/image/fetch/$s_!iz_R!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2a8692a-8e1f-4b59-b806-d34ace885557_1906x1944.png 1272w, https://substackcdn.com/image/fetch/$s_!iz_R!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe2a8692a-8e1f-4b59-b806-d34ace885557_1906x1944.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Here is the part nobody writes about.</p><p>When you start, &#8220;memory&#8221; feels like one thing. You imagine a notebook the agent keeps. You imagine writing into it. You imagine retrieval. That is the abstraction every product page uses, and it is the wrong abstraction. <a href="https://atharvm.medium.com/no-chatgpt-doesnt-remember-you-how-context-windows-fake-memory-cc4b6c944227">Atharv Malve put it cleanly</a> last summer. The model is not really remembering your past messages. It is just seeing the history again, every single time. Once you internalise that, you stop looking for the one memory feature and start asking what you actually need stored, by whom, for how long.</p><p>What I actually needed turned out to be four different sinks. They are not interchangeable. Learning the differences was half the work.</p><p><strong>Sink one is working memory.</strong> Short-lived. The current week&#8217;s plans, the half-finished thoughts, the active conversation context. Lives in a single small file called <code>memory.md</code>. It is supposed to decay. Treating it as durable is the original sin.</p><p><strong>Sink two is lessons.</strong> Full incident logs. When something goes wrong in a way I want a future session to learn from, the lesson lands in <code>lessons.md</code> with the trigger, the root cause, the fix, and a list of keywords. I have 274 lines of these going back to February. They read like engineering postmortems, because that is what they are. The public version of this file is roughly <a href="https://thoughts.jock.pl/p/almost-fried-ai-agent-mac-mini-mistakes-2026">the mistakes anthology I wrote last month</a>.</p><p><strong>Sink three is feedback memories.</strong> Per-rule files in a durable memory directory. Each one is a single rule with a &#8220;Why&#8221; line and a &#8220;How to apply&#8221; line. Linkable, deletable, deduped. When the same correction comes up twice, the second time it gets its own file. It also gets a tiny pointer in a master index that the agent always loads on startup. Two-level indirection, so the index stays small.</p><p><strong>Sink four is rule lines in the always-loaded index.</strong> These are the ones I wake up next to. A handful of <code>**RULE: ...**</code> lines at the top of the master index, all caps, the smallest set of behaviors I refuse to relitigate. &#8220;Verify deliverables. Show proof or keep task open.&#8221; &#8220;Match work topics against existing WizBoard tasks and complete them when done.&#8221; A rule earns its place at this level only after it has come back more than once.</p><p>And then there is the sink I did not plan for and would not give up now. <strong>The Behavioral Learning card table inside Basecamp.</strong> My WizBoard project has a small card table on it called Behavioral Learning, and every single correction the agent captures lands there as its own card. I can read the card, push back on it, fold two cards into one, or trash one that is wrong. Corrections become reviewable, not silent. That part matters. I will say more about why in the next section, but the short version is that if you let the model grade its own corrections in private, you have already lost.</p><p>If you take one thing from this post, take this. &#8220;Memory&#8221; as one concept is the wrong abstraction. Build sinks for different lifespans. Working memory is fast and disposable. Lessons are slow and durable. Feedback memories are searchable rules. Top-level rules are non-negotiable. The Behavioral Learning card table is the human-in-the-loop that keeps the rest honest. Different jobs, different files, different decay curves. I wrote about how rules in particular shape an agent in <a href="https://thoughts.jock.pl/p/the-bounded-ai-agent-ep5">the bounded agent</a>. Most of &#8220;memory,&#8221; once you look at it long enough, turns out to be rules.</p><h2>Does it actually work?</h2><p>Yes and no.</p><p>Here is what I can see in my own metrics. I have an autonomous improver that runs nightly and writes a <code>metrics.json</code> with a seven-day window, a thirty-day window, and a longer view. As of this morning, my agent received 22 corrections in the last 30 days. In the last 7, that number is 18. The trend line is down, and the system flags it explicitly with <code>valence: good</code>. Errors total across all categories is also drifting down, by less.</p><p>I do not want to present this in a single direction. The task success rate is 93.5 percent over 30 days, with a small dip in the last seven, from 93.5 to 92.6. So I am not going to pretend the picture is clean. Some weeks the agent gets worse. The point is that the corrections themselves are showing up less often, and when they do show up they are landing in places I can act on.</p><p>What the corrections actually look like, beyond the totals, is more interesting. A separate analyzer scans the captured corrections for repeating themes. As of this morning it has flagged two. One it calls <code>incomplete</code>, which is me catching the agent finishing a task that was not fully done. The other it calls <code>repeated_mistake</code>, which is a fix that came back. The analyzer is also allowed to propose a new rule when it finds a theme strong enough, and both of the rules it proposed have already graduated to the top-level RULE lines I quoted earlier. &#8220;Verify deliverables. Show proof or keep task open.&#8221; came out of the incomplete pattern. &#8220;ESCALATE if same mistake recurred. Strengthen the rule or fix the trigger.&#8221; came out of the repeated_mistake pattern. That is the loop closing on itself, in real data I can read off the file.</p><p>One more honest note. Thirty days of declining corrections is not proof of generalization. It is a trend on one user, on one workload, on one machine. The agent could be getting quieter rather than smarter. The way I keep myself honest about that is the Behavioral Learning card table I described. I see every correction. I can see which kinds keep coming back. The bar I am holding myself to is &#8220;fewer repeats of the same mistake,&#8221; not &#8220;an agent that never breaks.&#8221; On that narrower bar, the data is encouraging.</p><p>Measurement of this kind is also why I cared so much about token cost a few weeks ago. If you cannot count what your agent is doing, you cannot tell whether it is improving or just drifting. I wrote about that in <a href="https://thoughts.jock.pl/p/token-waste-management-opus-47-2026">the post on token waste on Opus 4.7</a>. Same instinct, different file.</p><div class="callout-block" data-callout="true"><p><em>This is the new kit I mentioned earlier.</em> My paid subscribers can already grab the <a href="https://wiz.jock.pl/store/behavioral-learning-kit/">Behavioral Learning Kit</a> on the Wiz store. It is the architecture I just walked through, packaged. The actual <code>correction_capture.py</code> and <code>correction_graduator.py</code>, the four memory-sink templates, the Basecamp card-table playbook (adaptable to Linear, Notion, or Trello), a CLAUDE.md snippet for agent integration, and a setup script that wires the rest together. Free with a yearly subscription. Included in the one-free-product-per-month allowance for monthly subscribers. $29 standalone if neither of those is you.</p></div><h2>What would break it, and what I would build next</h2><p>The fragile part is the classifier. Seven regex patterns is enough to label most corrections, but <code>unknown</code> still shows up too often to ignore. When an entry lands as <code>unknown</code>, the nightly drain picks it up, but the action it should take is no longer automatic. The fallback is that I or one of my future sessions has to retag the row by hand. Replacing the regex with a small LLM call would solve the labeling problem and create two new ones. Latency and cost. It would also create a softer problem, which is the one that scares me more.</p><p>If you let the model grade its own corrections in private, you get an agent that learns the wrong lessons confidently. Yohei Nakajima wrote about this risk in <a href="https://yoheinakajima.com/better-ways-to-build-self-improving-ai-agents/">his note on better ways to build self-improving agents</a>. His phrasing is the one I keep coming back to. The model can hallucinate bad reflections and reinforce them. That is the failure mode for any self-improving loop. The Behavioral Learning card table is what keeps that from happening on my setup, and it is the part I would build first if I were building this for someone else.</p><p>There is one more bigger picture thing. The reason I can talk about this layer with confidence is the architecture it sits inside. I wrote the long version in <a href="https://thoughts.jock.pl/p/wiz-ai-agent-self-improvement-architecture">my AI agent knows who I am</a> earlier this year, where I walked through the ten layers I use to make the agent feel coherent over time. The corrections loop is one of those layers. It is the one I depend on most, because it is the one that most directly changes the agent&#8217;s behaviour rather than its memory.</p><p>If you have not yet noticed one of your own fixes coming back at you, you will. When you do, the move is not to add more memory. It is to build a small queue, decide what your sinks are, make sure no correction can quietly age, and then put a human-readable surface on top of it so you can see what the agent is teaching itself. The agent that comes out the other side of all that does not just remember more. It starts behaving like more of you.</p><div class="callout-block" data-callout="true"><p>The free subscription gets you every build log on this stack, including the next one. The store has the small bundles for people who would rather skip a few of the walls I walked into, and as I mentioned earlier most of those bundles were just refreshed. Both are fine for me.</p></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://thoughts.jock.pl/subscribe?"><span>Subscribe now</span></a></p><p></p><div><hr></div><p><strong>Self-Improving Agent Kit</strong></p><p>The corrections loop, lesson archival, and self-heal architecture from this post are packaged as the Self-Improving Agent Kit. It includes the exact files Wiz uses, plus a setup guide for wiring it to your own agent without having to build all this from scratch.</p><p><strong>$49</strong> at <a href="https://wiz.jock.pl/store/self-improving-agent">wiz.jock.pl/store</a>. Free for paid subscribers.</p>]]></content:encoded></item><item><title><![CDATA[How to Use Git(hub) When You’re Building with AI (Basics)]]></title><description><![CDATA[The checkpoint system that makes building with AI agents actually survivable.]]></description><link>https://thoughts.jock.pl/p/how-to-use-github-ai-builders-basics-2026</link><guid isPermaLink="false">https://thoughts.jock.pl/p/how-to-use-github-ai-builders-basics-2026</guid><dc:creator><![CDATA[Pawel Jozefiak]]></dc:creator><pubDate>Thu, 07 May 2026 09:37:17 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!hd2E!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfc943b8-0185-4c76-b8ef-85c094f32094_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hd2E!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfc943b8-0185-4c76-b8ef-85c094f32094_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hd2E!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfc943b8-0185-4c76-b8ef-85c094f32094_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!hd2E!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfc943b8-0185-4c76-b8ef-85c094f32094_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!hd2E!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfc943b8-0185-4c76-b8ef-85c094f32094_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!hd2E!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfc943b8-0185-4c76-b8ef-85c094f32094_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hd2E!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfc943b8-0185-4c76-b8ef-85c094f32094_1024x1024.png" width="1024" height="1024" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bfc943b8-0185-4c76-b8ef-85c094f32094_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1459913,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/196757127?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfc943b8-0185-4c76-b8ef-85c094f32094_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hd2E!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfc943b8-0185-4c76-b8ef-85c094f32094_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!hd2E!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfc943b8-0185-4c76-b8ef-85c094f32094_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!hd2E!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfc943b8-0185-4c76-b8ef-85c094f32094_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!hd2E!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfc943b8-0185-4c76-b8ef-85c094f32094_1024x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="callout-block" data-callout="true"><p><strong>This is part three of my Basics series.</strong> The first post was about <a href="https://thoughts.jock.pl/p/how-i-structure-claude-md-after-1000-sessions">how I structure CLAUDE.md after 1,000+ sessions</a>, the instructions file that tells your AI agent who it is and how to behave. The second was <a href="https://thoughts.jock.pl/p/how-to-build-your-first-ai-agent-beginners-guide-2026">a step-by-step guide to building your first AI agent from scratch</a>. This one covers something I probably should have put first: version control. Why you need it, what it actually is, and how to use it when AI is doing some of the building.</p></div><p>If you&#8217;ve ever lost an hour of progress in a game because you forgot to save, you already understand why Git exists.</p><p>You&#8217;re deep in a dungeon. The boss took 40 minutes. You made one wrong move, got killed, and your last save was way back at the start of the level. That hour is just gone. No trace of what you tried, no checkpoint to return to, nothing.</p><p>Building software without version control feels exactly the same. Especially when AI is part of the building process.</p><p>I&#8217;ve been running my own AI agent since late 2025. It builds things, makes decisions, modifies files, runs overnight. It also makes mistakes. Sometimes it introduces a bug deep in the architecture and I wake up to something that doesn&#8217;t work anymore. Without proper commits, I&#8217;d have no idea what changed. With them, I open the history, read back through what happened, and roll back to the last clean state in under a minute.</p><p>This post is for people who are starting to build with AI tools, vibe coding with Cursor or Claude Code or Codex, or running their first experiments with autonomous agents. Git probably sounds like a developer thing. It is. It&#8217;s also one of the most useful habits you can build as a builder of anything, regardless of how technical you are.</p><div><hr></div><h2>First: Git is not GitHub</h2><p>This confusion trips up almost everyone who starts. I had it for longer than I want to admit.</p><p><strong>Git</strong> is a tool. Software you install on your computer. It tracks changes to your files over time and saves snapshots of your project whenever you ask it to. It&#8217;s free, open source, and runs entirely on your machine. It has nothing to do with the internet. Git was created in 2005 by Linus Torvalds (the person who also created Linux) and has become the standard for version control across the entire software industry.</p><p><strong><a href="https://github.com/">GitHub</a></strong> is a website. A cloud service that stores your Git repositories remotely. A place to back them up, share them with others, and access them from anywhere. GitHub is owned by Microsoft and is where most public open-source code lives.</p><p>The relationship is like the difference between a text file and Google Drive. The file exists on your machine whether or not you upload it anywhere. Git works whether or not you ever create a GitHub account.</p><p>Why does this matter? Because GitHub is not your only option, and I think a lot of people avoid the whole topic because they assume it means signing up for something owned by Microsoft and making their work public. Neither of those things has to be true.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pvOI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80f1be5a-eb06-4bb4-8c88-54d071bf1b50_2801x2070.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pvOI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80f1be5a-eb06-4bb4-8c88-54d071bf1b50_2801x2070.png 424w, https://substackcdn.com/image/fetch/$s_!pvOI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80f1be5a-eb06-4bb4-8c88-54d071bf1b50_2801x2070.png 848w, https://substackcdn.com/image/fetch/$s_!pvOI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80f1be5a-eb06-4bb4-8c88-54d071bf1b50_2801x2070.png 1272w, https://substackcdn.com/image/fetch/$s_!pvOI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80f1be5a-eb06-4bb4-8c88-54d071bf1b50_2801x2070.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pvOI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80f1be5a-eb06-4bb4-8c88-54d071bf1b50_2801x2070.png" width="1456" height="1076" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/80f1be5a-eb06-4bb4-8c88-54d071bf1b50_2801x2070.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1076,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:454249,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/196757127?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80f1be5a-eb06-4bb4-8c88-54d071bf1b50_2801x2070.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pvOI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80f1be5a-eb06-4bb4-8c88-54d071bf1b50_2801x2070.png 424w, https://substackcdn.com/image/fetch/$s_!pvOI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80f1be5a-eb06-4bb4-8c88-54d071bf1b50_2801x2070.png 848w, https://substackcdn.com/image/fetch/$s_!pvOI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80f1be5a-eb06-4bb4-8c88-54d071bf1b50_2801x2070.png 1272w, https://substackcdn.com/image/fetch/$s_!pvOI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80f1be5a-eb06-4bb4-8c88-54d071bf1b50_2801x2070.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The main alternatives worth knowing:</p><ul><li><p><strong><a href="https://gitlab.com/">GitLab</a></strong>: the most comprehensive alternative. Does everything GitHub does (repositories, issue tracking, code review) plus built-in CI/CD pipelines for automated testing and deployment. Can also be self-hosted on your own server if you want full control. Good option if you want more features baked in.</p></li><li><p><strong><a href="https://codeberg.org/">Codeberg</a></strong>: run by a nonprofit organization based in Germany. GDPR-native from the ground up, no data selling, and they explicitly don&#8217;t train AI models on your code. Free, donation-funded, no ads, no tracking. If privacy and data sovereignty matter to you (especially if you&#8217;re in Europe), this is the serious alternative.</p></li><li><p><strong><a href="https://forgejo.org/">Forgejo</a></strong>: open-source and self-hosted. You install it on your own server and run your own Git hosting. Lightweight, modern interface, GitHub-compatible. If you want complete control over your code and have a machine to run it on, this is the path.</p></li><li><p><strong><a href="https://bitbucket.org/">Bitbucket</a></strong>: made by Atlassian, integrates tightly with Jira and Confluence. If your team is already using those tools, Bitbucket fits naturally.</p></li></ul><p>All of these speak the same Git language. Every command I&#8217;ll show you in this post works on all of them. The choice of platform is about where your code lives, not how you use it.</p><p>I use GitHub because the ecosystem is built around it and my AI tools (Claude Code especially) integrate with it well. But if you have strong reasons to go elsewhere, you&#8217;re not missing anything technically.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">This is part of the Basics series on building with AI. If this kind of practical, in-the-weeds content is useful to you, Digital Thoughts is where I publish more of it every week. No hype, just what I'm actually building and learning. Free to subscribe.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h2>Why I started actually caring about this</h2><p>I&#8217;ve known about Git for years. I ran commits occasionally. I wasn&#8217;t disciplined about it.</p><p>That changed when I started <a href="https://thoughts.jock.pl/p/building-ai-agent-night-shifts-ep1">building an agent that runs overnight</a>.</p><p>The setup is that the agent works autonomously while I sleep. It builds features, writes scripts, modifies configuration, creates tasks for itself. Most nights this is productive. But early on, I&#8217;d wake up to something broken and have no clear way to understand what had changed. The agent had touched 12 files across 3 directories and something downstream was misbehaving. I was staring at a broken system with no map back to working.</p><p>I fixed this by building commit discipline into the agent. It now commits after every meaningful action. When I wake up and something is wrong, I read the commit history. I see exactly what changed, when, and in what order. I can roll back to the last clean commit in under ten seconds, or read forward through the commits to understand what went wrong and patch it with that knowledge.</p><p>This is what most people miss when they think of version control as &#8220;backup.&#8221; It&#8217;s not just backup. It&#8217;s a navigable history. It&#8217;s the difference between saving a file and saving a timeline. With a timeline, mistakes become investigations instead of disasters. I wrote about a lot of those investigations in <a href="https://thoughts.jock.pl/p/almost-fried-ai-agent-mac-mini-mistakes-2026">the post about how I almost broke everything</a>.</p><div><hr></div><h2>Setting up your first repository</h2><p>This will take less time than you think. Let me walk through exactly what to do.</p><h3>Step 1: Install Git</h3><p>On a Mac, open the Terminal app (search for it in Spotlight) and type:</p><pre><code><code>git --version</code></code></pre><p>If you see something like <code>git version 2.39.0</code>, you already have it. If not, the easiest path is to go to <a href="https://git-scm.com/">git-scm.com</a> and download the installer. On Mac you can also run <code>brew install git</code> if you have Homebrew installed.</p><p>On Windows, download the installer from <a href="https://git-scm.com/">git-scm.com</a>. It includes a terminal called Git Bash, which is what you&#8217;ll use to run the commands below.</p><h3>Step 2: Tell Git who you are (one-time setup)</h3><p>Git tracks who made each change. Before you do anything, set your name and email:</p><pre><code><code>git config --global user.name "Your Name"
git config --global user.email "you@example.com"</code></code></pre><p>You only do this once. It doesn&#8217;t create an account anywhere. It just labels your commits.</p><h3>Step 3: Initialize a repository</h3><p>Navigate to your project folder in the terminal and run:</p><pre><code><code>git init</code></code></pre><p>Git creates a hidden folder called <code>.git</code> inside your project. That folder is the entire history of your project. All your commits, all the metadata, everything. You never need to open or touch it directly. Your project is now being tracked.</p><p>If you want to verify it worked, run <code>git status</code>. You&#8217;ll see a list of your files as &#8220;untracked&#8221; (Git sees them but hasn&#8217;t started tracking their history yet).</p><h3>Step 4: Make your first commit</h3><p>A commit is a snapshot, your first save point. Two commands:</p><pre><code><code>git add .
git commit -m "Initial setup"</code></code></pre><p><code>git add .</code> stages all your files, which means &#8220;include these in the next snapshot.&#8221; The dot means &#8220;everything in this folder.&#8221; You can also add specific files with <code>git add filename.py</code> if you only want to commit some changes.</p><p><code>git commit -m "message"</code> saves the snapshot with your description. That description is the commit message. We&#8217;ll talk about what makes a good one in a moment.</p><p>To confirm it worked, run <code>git log</code>. You&#8217;ll see your first commit listed with a timestamp and your name.</p><h3>Step 5: Push to a remote host (optional but recommended)</h3><p>Your repository exists on your machine right now. To back it up to GitHub (or wherever), you need to create an empty repository there first, then connect your local one to it.</p><p>On GitHub: click the &#8220;+&#8221; icon at the top right, choose &#8220;New repository,&#8221; give it a name, and make sure you do NOT check &#8220;Add a README&#8221; (you want the empty repository). Copy the URL it gives you.</p><p>Then run these two commands:</p><pre><code><code>git remote add origin https://github.com/yourusername/your-repo.git
git push -u origin main</code></code></pre><p><code>git remote add origin</code> tells your local Git where the remote copy lives. <code>git push -u origin main</code> uploads your commits there. The <code>-u</code> flag sets this as the default remote for future pushes, so after this first time you just run <code>git push</code>.</p><p>That&#8217;s the whole setup. From here, your workflow is: make changes, add, commit, push. Those three steps are 90% of what you&#8217;ll do.</p><div><hr></div><h2>What to add to .gitignore (and why)</h2><p>Before you commit your actual project files, you need to talk about <code>.gitignore</code>.</p><p>This is a file that tells Git which files and folders to never track. You don&#8217;t want passwords, API keys, or large auto-generated files in your version history. Once something is committed to Git and pushed to a remote, it&#8217;s there forever (even if you delete it later, it&#8217;s in the history). So you exclude sensitive things upfront.</p><p>Create a file called <code>.gitignore</code> in your project root. For most AI agent projects, this is a good starting point:</p><pre><code><code># Environment variables and secrets
.env
.env.local
secrets/
*.key

# Python
__pycache__/
*.pyc
*.pyo
.venv/
venv/

# Node.js
node_modules/
npm-debug.log

# macOS
.DS_Store

# Editor files
.vscode/settings.json
.idea/

# Large generated files
*.log
dist/
build/</code></code></pre><p>The most important lines: <code>.env</code> and anything in a <code>secrets/</code> folder. If you&#8217;re using AI tools like Claude Code, you likely have API keys stored somewhere. Those should never go into Git. Add them to <code>.gitignore</code> before your first commit.</p><p>If you accidentally commit a secret and push it: change the key immediately. The history is visible even after deletion.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Digital Thoughts is reader-supported. Posts like this, and the Basics series they're part of, keep going because readers show up. If you're getting value from the building content here, subscribing is the best way to stay in the loop. Free gets you the weekly posts. Paid subscribers get full access to the store.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h2>When to commit</h2><p>Most beginners commit too rarely. They work for three hours, then push &#8220;made some changes.&#8221; That&#8217;s nearly useless as a history. Here&#8217;s how I actually think about it.</p><p><strong>Commit before anything big.</strong> If you&#8217;re about to let Claude Code refactor a major section of your project, commit first. If the refactor goes sideways, you can undo the whole thing with one command: <code>git reset --hard HEAD</code>. This is the most valuable habit I&#8217;ve developed. Before I hand something big to the agent, I save my current state. No exceptions.</p><p><strong>Commit after anything that works.</strong> Feature works? Commit. Bug fixed? Commit. Even small wins. Each commit is a checkpoint you can return to. There is no such thing as committing too often.</p><p><strong>Commit with meaning.</strong> This is where most people lose the value of their history. A commit message is documentation. &#8220;Fixed auth bug where tokens expired before session timeout&#8221; is infinitely more useful than &#8220;fixes.&#8221; When you&#8217;re debugging something three weeks later, whether it&#8217;s you, someone else, or an AI agent reading the log, those messages are what makes the history useful instead of just a list of timestamps.</p><p>A simple format that works well:</p><pre><code><code># Good commit messages
git commit -m "add rate limit guard to external API calls"
git commit -m "fix memory compression when context exceeds 200 lines"
git commit -m "checkpoint before refactoring auth flow"

# Less useful
git commit -m "updates"
git commit -m "wip"
git commit -m "stuff"</code></code></pre><p><strong>Commit before you sleep.</strong> If your agent runs overnight, give it a clean starting point. Whatever state your project is in when you go to bed, commit it. If something goes wrong at 3am, the history starts from a known point.</p><p>On active agent architecture work, I commit every 15 to 30 minutes of real progress. Some sessions have 20 commits. This is not excessive. The checkpoints are frequent enough that no single mistake costs more than a few minutes of work.</p><div><hr></div><h2>Reading the history</h2><p>Knowing how to read your commit history is as important as knowing how to write it. These are the commands I use most:</p><pre><code><code># See all commits, newest first
git log

# More compact view (one line per commit)
git log --oneline

# See what actually changed in the last commit
git show HEAD

# See what changed between two commits
git diff abc1234 def5678

# See which files changed in a commit
git show --stat abc1234</code></code></pre><p>When Claude Code starts a debug session on my project, one of its first moves is <code>git log --oneline</code>. It reads back through the recent commits to understand the context: what was built, when, and why things changed. This is the moment where good commit messages pay off. If the last ten commits say &#8220;add rate limit guard,&#8221; &#8220;fix memory compression,&#8221; and &#8220;checkpoint before auth refactor,&#8221; the agent can quickly build a mental model of recent work. If they all say &#8220;wip,&#8221; it&#8217;s starting from zero.</p><p>You can also browse your commit history on GitHub&#8217;s web interface if you&#8217;ve pushed your code. Go to your repository and click &#8220;N commits&#8221; at the top of the file list. Each commit shows you the message, the author, the timestamp, and a full diff of what changed. This is genuinely useful for non-technical team members who don&#8217;t use the terminal.</p><div><hr></div><h2>Private vs. public: my 90/10 approach</h2><p>About 90 percent of my repos are private. I want to address this directly because I&#8217;ve seen people feel guilty about keeping their work closed.</p><p>Private doesn&#8217;t mean hiding. Most of my private repos are private because the work is genuinely messy. Unfinished. Half-ideas with rough code that works but embarrassingly so. Agent architecture that&#8217;s in constant flux. Projects I&#8217;m building toward something but haven&#8217;t figured out what yet.</p><p>This is normal work. Version control is for you in this context. You get all the benefits: the history, the rollbacks, the tracking. You don&#8217;t owe anyone visibility into your process while you&#8217;re still figuring things out.</p><p>The public repos are things I&#8217;m actually proud of or that other people can genuinely use. The one I keep pointing at is the <a href="https://github.com/joozio/agent-wellbeing-kit">Agent Wellbeing Kit</a>, boundaries and nudges for AI agents and their humans. It has eight stars, which I find quietly satisfying. It&#8217;s there because I built something clean enough that it adds value for others. That&#8217;s the standard I hold public work to.</p><p>Contribute when you can. But don&#8217;t let the idea that &#8220;real developers make everything public&#8221; stop you from using version control privately. Most professional work is private. Most early work is messy. Both are fine.</p><div><hr></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share&quot;,&quot;text&quot;:&quot;Share Digital Thoughts&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://thoughts.jock.pl/?utm_source=substack&amp;utm_medium=email&amp;utm_content=share&amp;action=share"><span>Share Digital Thoughts</span></a></p><div><hr></div><h2>Working alone vs. with others</h2><p>The workflow changes meaningfully depending on whether you&#8217;re solo or in a team. Worth understanding both even if you&#8217;re only doing one right now.</p><h3>Working alone</h3><p>When you&#8217;re the only person on a project, the simplest workflow is pushing directly to main. There&#8217;s no one else whose changes could conflict with yours. Commit often, push regularly. That&#8217;s enough.</p><p>I sometimes create branches when I&#8217;m testing a bigger experiment. A branch is just a separate line of development that doesn&#8217;t affect main until you merge it back. To create one:</p><pre><code><code># Create a new branch and switch to it
git checkout -b experiment-new-memory-system

# Do your work, commit normally
git add .
git commit -m "try new memory compression approach"

# If it works: merge it back to main
git checkout main
git merge experiment-new-memory-system

# If it doesn't: just delete it, no harm done
git branch -D experiment-new-memory-system</code></code></pre><p>The branch approach is especially useful when you&#8217;re handing off an experiment to an AI agent. You give the agent a branch to work on, let it build and commit freely, then review what it built before merging to main. Clean separation between &#8220;work in progress&#8221; and &#8220;known good.&#8221;</p><h3>Working with others</h3><p>With a team, branches and pull requests become mandatory. No one pushes directly to main. Here&#8217;s the standard flow:</p><ol><li><p>Create a branch for your feature or fix</p></li><li><p>Do the work and commit to that branch</p></li><li><p>Push the branch to GitHub: <code>git push origin your-branch-name</code></p></li><li><p>Open a Pull Request on GitHub, a formal request to merge your branch into main</p></li><li><p>Someone else reviews it, leaves comments, approves</p></li><li><p>Merge to main</p></li></ol><p>The PR review step is what protects main from broken code. It&#8217;s also where the real collaboration happens: someone might catch a bug you missed, suggest a better approach, or just ask a clarifying question about what the code is doing.</p><p>Even when I&#8217;m working solo on a bigger feature, I&#8217;ve started creating PRs for myself. The description field becomes documentation: why this was built, what problem it solves, what I considered and rejected. That context is genuinely useful six weeks later when I&#8217;m trying to understand a decision I made. And when an AI agent reads your repo to understand what to do next, a well-written PR description gives it context the commit message doesn&#8217;t.</p><div><hr></div><h2>Worktrees: the unlock for AI agent builders</h2><p>This section is for people who are already running AI agents and want to understand the next level. Skip it if you&#8217;re still on step one; you can come back.</p><p>When I&#8217;m working with multiple agents in parallel (which happens when you&#8217;re building complex things), there are sometimes three or four branches active at once. One agent is building a feature. Another is fixing a bug. If I had to constantly switch the entire project directory between branches, I&#8217;d lose context constantly.</p><p>Git worktrees solve this. A worktree is a separate folder on your machine that&#8217;s linked to the same repository but checked out to a different branch. They share the same history and <code>.git</code> folder, but each has its own working directory and independent state.</p><pre><code><code># Create a new worktree for a feature branch
git worktree add ../feature-auth -b feature/auth main

# See all your active worktrees
git worktree list

# Clean up when done
git worktree remove ../feature-auth</code></code></pre><p>With worktrees, I can run two Claude Code instances at the same time: one in <code>~/my-project</code> (main work), one in <code>~/feature-auth</code> (isolated branch). Each agent commits to its own branch with zero interference. I merge when each piece is done.</p><p>This is the infrastructure behind parallel agent builds. I covered how I evaluated different AI coding tools for this kind of work in <a href="https://thoughts.jock.pl/p/ai-coding-harness-agents-2026">my comparison of Claude Code, Codex, Aider, and the others</a>. Worktrees are the underlying mechanism that makes it all clean.</p><div><hr></div><h2>AI agents read your commit history</h2><p>This is the piece I didn&#8217;t anticipate, and it&#8217;s changed how I write commit messages.</p><p>When Claude Code starts a session on my project, one of its first actions is reading repository context: the file structure, the current state, and often the recent commits. A history with meaningful messages gives the agent a map of what happened and why. A history full of &#8220;wip&#8221; and &#8220;checkpoint&#8221; entries tells it almost nothing useful.</p><p>This plays out concretely when something breaks. When I start a debug session after my agent did something unexpected overnight, Claude Code often goes to <code>git log</code> as an early move. It reads through the last 10-15 commits. If those commits say things like &#8220;add rate-limit guard to external API calls&#8221; or &#8220;fix memory compression when context exceeds 200 lines,&#8221; it can quickly narrow down what might have changed. If they all say &#8220;wip,&#8221; it&#8217;s starting from scratch every time.</p><p>The same is true when the agent is building something new. Reading recent commits helps it understand the patterns and conventions you&#8217;ve been using: how you name things, how you structure files, what you&#8217;ve already tried. Good history accelerates the agent&#8217;s work. Messy history slows it down.</p><p>I think about every commit message as a note to a future debugger who has no other context. That debugger might be me, might be someone else, might be an AI agent. All three benefit from the same thing: specific, honest context about what changed and why.</p><p>If you want to go deeper on what that looks like at the architecture level, the post on <a href="https://thoughts.jock.pl/p/ai-agent-self-extending-self-fixing-wiz-rebuild-technical-deep-dive-2026">when my AI agent started fixing itself</a> gets into how the commit trail feeds back into the agent&#8217;s own understanding of its own codebase.</p><div><hr></div><h2>The commands you&#8217;ll use 90% of the time</h2><pre><code><code>git init                       # Start tracking a folder
git status                     # See what changed since last commit
git add .                      # Stage all changes
git add filename.py            # Stage one specific file
git commit -m "message"        # Save a snapshot
git push                       # Upload to remote
git pull                       # Download from remote
git log                        # See commit history
git log --oneline              # Compact history view
git diff                       # See exactly what changed (unstaged)
git diff --staged              # See what's staged for next commit
git show HEAD                  # See the most recent commit in detail
git checkout -b branch-name    # Create and switch to new branch
git checkout main              # Switch back to main
git merge branch-name          # Merge branch into current branch
git branch -D branch-name      # Delete a branch
git reset --hard HEAD          # Undo all uncommitted changes (careful)
git reset --hard HEAD~1        # Undo last commit AND its changes (careful)
git revert HEAD                # Undo last commit but keep the history</code></code></pre><p>The difference between <code>reset --hard</code> and <code>revert</code>: reset rewrites history (dangerous if you&#8217;ve already pushed), revert creates a new commit that undoes the previous one (safe always). When in doubt, use revert.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pTNg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d665f85-0281-4590-881f-e07e11e50e5a_2664x2250.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pTNg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d665f85-0281-4590-881f-e07e11e50e5a_2664x2250.png 424w, https://substackcdn.com/image/fetch/$s_!pTNg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d665f85-0281-4590-881f-e07e11e50e5a_2664x2250.png 848w, https://substackcdn.com/image/fetch/$s_!pTNg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d665f85-0281-4590-881f-e07e11e50e5a_2664x2250.png 1272w, https://substackcdn.com/image/fetch/$s_!pTNg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d665f85-0281-4590-881f-e07e11e50e5a_2664x2250.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pTNg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d665f85-0281-4590-881f-e07e11e50e5a_2664x2250.png" width="1456" height="1230" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8d665f85-0281-4590-881f-e07e11e50e5a_2664x2250.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1230,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:562724,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/196757127?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d665f85-0281-4590-881f-e07e11e50e5a_2664x2250.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pTNg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d665f85-0281-4590-881f-e07e11e50e5a_2664x2250.png 424w, https://substackcdn.com/image/fetch/$s_!pTNg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d665f85-0281-4590-881f-e07e11e50e5a_2664x2250.png 848w, https://substackcdn.com/image/fetch/$s_!pTNg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d665f85-0281-4590-881f-e07e11e50e5a_2664x2250.png 1272w, https://substackcdn.com/image/fetch/$s_!pTNg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d665f85-0281-4590-881f-e07e11e50e5a_2664x2250.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/p/how-to-use-github-ai-builders-basics-2026?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://thoughts.jock.pl/p/how-to-use-github-ai-builders-basics-2026?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p>If you&#8217;re using Claude Code, it handles most of these automatically. You can also just say &#8220;commit these changes with a meaningful message&#8221; and it will. But knowing what the commands do means you can read the agent&#8217;s actions instead of just watching them happen.</p><div><hr></div><h2>The thing I keep telling people</h2><p>Git has a real learning curve at the start. I&#8217;m not going to pretend otherwise. The mental model doesn&#8217;t click immediately. You&#8217;ll push the wrong thing. You&#8217;ll get confused about branches. You&#8217;ll probably hit a merge conflict at some point and spend an hour untangling it.</p><p>A merge conflict happens when two different versions of the same file need to be combined and Git can&#8217;t figure out which change to keep. It looks scary. It&#8217;s not. Git marks the conflicting lines in the file, you open it, decide which version is correct, delete the conflict markers, and commit. Takes five minutes once you&#8217;ve seen it once.</p><p>The place where Git changes everything is exactly when things go wrong. The first time your AI agent does something unexpected and you roll back to a known-good state in ten seconds, you&#8217;ll understand what all of this was for. Everything I&#8217;ve been building, from the <a href="https://thoughts.jock.pl/p/my-ai-agent-works-night-shifts-builds">overnight agent</a> to the various <a href="https://thoughts.jock.pl/p/directed-ai-experiments-vibe-business">AI building experiments</a> that broke in interesting ways, was only recoverable because of this.</p><p>Without version control you&#8217;re genuinely going in the dark. The mistakes are unrecoverable. The context is lost. With it, you can make more mistakes, faster, with more confidence, because you know you can always find your way back.</p><p>Make more mistakes. Just make them trackable.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/p/how-to-use-github-ai-builders-basics-2026?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://thoughts.jock.pl/p/how-to-use-github-ai-builders-basics-2026?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><div><hr></div><div class="callout-block" data-callout="true"><p><strong>Want to go deeper on building with AI?</strong></p><p>If you&#8217;re setting up your first agent or trying to make Claude Code do serious work, I put together an <strong><a href="https://wiz.jock.pl/store">Agent Builder Pack</a></strong> with the actual configuration files, CLAUDE.md templates, and setup guides behind how mine works. The Git workflow above is baked into all of it.</p><p><strong>Free for paid Digital Thoughts subscribers.</strong> Available at <a href="https://wiz.jock.pl/store">wiz.jock.pl/store</a>.</p></div><div><hr></div><p><strong>AI Agent Blueprint</strong></p><p>Git is step one. The AI Agent Blueprint covers what comes next: wake scripts, memory architecture, bounded safety, and the patterns that make an agent actually reliable overnight. One command to set up, 15 minutes to a working agent.</p><p><strong>$39</strong> at <a href="https://wiz.jock.pl/store/ai-agent-blueprint">wiz.jock.pl/store</a>. Free for paid subscribers.</p>]]></content:encoded></item><item><title><![CDATA[Building Your Own Things Is Cool Too]]></title><description><![CDATA[A couple years ago I wrote a small post called &#8220;starting things is cool&#8221;. This is the longer answer to why I keep building those things myself, even when easier options are right there.]]></description><link>https://thoughts.jock.pl/p/building-your-own-things-is-cool-too-2026</link><guid isPermaLink="false">https://thoughts.jock.pl/p/building-your-own-things-is-cool-too-2026</guid><dc:creator><![CDATA[Pawel Jozefiak]]></dc:creator><pubDate>Mon, 04 May 2026 13:18:19 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!unLh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70a342c7-ad99-4242-abfb-15da060d4731_2048x2048.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!unLh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70a342c7-ad99-4242-abfb-15da060d4731_2048x2048.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!unLh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70a342c7-ad99-4242-abfb-15da060d4731_2048x2048.png 424w, https://substackcdn.com/image/fetch/$s_!unLh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70a342c7-ad99-4242-abfb-15da060d4731_2048x2048.png 848w, https://substackcdn.com/image/fetch/$s_!unLh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70a342c7-ad99-4242-abfb-15da060d4731_2048x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!unLh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70a342c7-ad99-4242-abfb-15da060d4731_2048x2048.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!unLh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70a342c7-ad99-4242-abfb-15da060d4731_2048x2048.png" width="1456" height="1456" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/70a342c7-ad99-4242-abfb-15da060d4731_2048x2048.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:6069029,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/196419129?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70a342c7-ad99-4242-abfb-15da060d4731_2048x2048.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!unLh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70a342c7-ad99-4242-abfb-15da060d4731_2048x2048.png 424w, https://substackcdn.com/image/fetch/$s_!unLh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70a342c7-ad99-4242-abfb-15da060d4731_2048x2048.png 848w, https://substackcdn.com/image/fetch/$s_!unLh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70a342c7-ad99-4242-abfb-15da060d4731_2048x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!unLh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70a342c7-ad99-4242-abfb-15da060d4731_2048x2048.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>People ask me a version of the same question all the time. &#8220;Why are you spending your evenings building your own thing? There is already a tool that does this. There is already a framework. There is already a whole product. Why are you doing it again?&#8221;</p><p>I get this with my AI agent. I got it about my store. I got it years ago about my podcasts, about the side apps, about the marketing experiments. The question is fair. The honest answer has been the same for a while now, and I have not written it down properly until today.</p><p>I build my own things because I learn through process, not by reading. That is the short version. The longer version is the rest of this post.</p><h2>Quick frame before I go further</h2><p>Everything I write about on this blog is something I am actually doing, experimenting on, or testing. The agent on my Mac Mini. The store. Project Money. The smaller experiments inside both. None of it is &#8220;what I think someone should do.&#8221; It is whatever I am running this week, and what I learned the hard way last week.</p><p>The blog is the slow visible slice of that work. Most of what I am doing in any given week never makes it into a post, because I would have to publish almost every day, sometimes twice, to actually keep up. That is not feasible, so most of the work stays unwritten. The writing here is always trailing the doing, on purpose. That asymmetry matters for the rest of this post. I write about building because I am doing the building. The other way around does not interest me.</p><h2>Going back to &#8220;starting things is cool&#8221;</h2><p>Almost two years ago, when I was finding my way back into writing, I posted something called <a href="https://thoughts.jock.pl/p/starting-things-is-cool">&#8220;starting things is cool&#8221;</a>. It is short, a little messy, and some of the projects it mentions are no longer alive. The Suggestions App I was so excited about that summer is not even an app anymore. A handful of the things I shouted about back then have quietly disappeared.</p><p>The post is also the thing that restarted my writing. Most of what I have built since then traces back to it.</p><p>If you read it today, you will see a sentence underneath everything. <em>I like starting things.</em> That part is true. It is also incomplete. The piece that was implicit in 2024 and has become explicit since is that I prefer to start them <em>myself</em> rather than start by adopting someone else&#8217;s start. Starting and building are the same instinct from two angles. This post is the second angle.</p><h2>The way I actually learn</h2><p>Here is the part I do not usually lead with, because it sounds personal in a way other people do not always relate to. I am not the kind of person who learns by reading. I have tried. I genuinely envy people who can read a book on something complicated and walk away with a working mental model. That is not how my brain works. I learn through process. I have to do the thing. I have to see what breaks. I have to fix it badly, then less badly, then properly. After enough rounds of that, I actually know it.</p><p>I think about this the same way I think about <a href="https://thoughts.jock.pl/p/adhd-ai-agent-personal-experience-2026">how my brain handles ADHD</a>. The shortcut that works for a different kind of mind is not the shortcut that works for mine. So I stopped fighting it.</p><p>Every pre-built tool, framework, or product is a map of someone else&#8217;s process. The map is real and useful. Walking the route teaches something the map cannot.</p><h2>What you only learn from building</h2><p>The thing I get out of building is harder to put in a sentence. Let me try anyway.</p><p>When you build the thing yourself, you know every variable between the start and the end. You watched each one go in. You watched them connect. You know which one is load-bearing, which one is convenience, and which one only exists because two weeks ago you had a bad afternoon and forgot to clean it up. That knowledge is not glamorous. It is the part that lets you change one small thing and get a meaningfully different outcome later. Without it, you can configure what you bought. With it, you can compose.</p><p>The mistakes are the other half. I have written about a few of the recent ones in <a href="https://thoughts.jock.pl/p/almost-fried-ai-agent-mac-mini-mistakes-2026">the post about almost frying my Mac Mini</a>. Each one taught me a perspective I would not have read about anywhere else. The pattern goes back further than the agent though. It is the same pattern from the failed apps in 2024. The same pattern from the marketing experiments before that, the podcast that did not last, the small side projects that quietly closed. Mistakes have always been where most of the learning lives for me.</p><p>This is slower than picking up the off-the-shelf option. The difference shows up in what you know afterwards.</p><h2>About not rediscovering America</h2><p>For most of my life, I was told the opposite of all this. Use the tools you are given. Do not reinvent the wheel. In Polish there is a stronger version of that line. <em>Do not rediscover America for the second time.</em> I have heard it more times than I can count. For most of those years, I half believed it.</p><p>I do not believe it anymore. The tools are very good, that part is true. The act of building the thing yourself does something to you that the tool cannot do for you. The tool is a snapshot. The act is a process. I am after the process.</p><h2>An example, since the AI one is fresh</h2><p>Here is one current illustration so this does not stay too abstract. Right now I am building my own AI agent from scratch. People keep pointing me to OpenClaw, which has 347,000 GitHub stars and ships with most of what I am writing myself. They point me to Hermes, open-source and ready to install. They are not wrong. If I dropped my stack tomorrow and installed OpenClaw, my agent would do many of the same things in a fraction of the time.</p><p>I keep building my own anyway. The reason is the same one I have just spent a thousand words on. I want the variables. I want the failures. I want the version of myself that exists on the other side of having built it.</p><p>The same logic applies to my store. There are platforms that would let me run a digital store in an afternoon. I built the bones of mine because the parts I most want to understand are the parts most platforms hide.</p><p>I want to be clear though, I am not religious about it. When I had spent two months building a custom kanban dashboard for the agent and then realized I could do the same job in a 94-line shim on top of an existing tool, I switched. I wrote about that here in <a href="https://thoughts.jock.pl/p/wizboard-fizzy-ai-agent-interface-pivot-2026">the WizBoard pivot post</a>. The rule I now use is simple. Build the parts you need to understand. Use the parts you do not. The trick is being honest with yourself about which parts those actually are.</p><h2>Reading &#8220;starting things is cool&#8221; again, from now</h2><p>I went back and reread the original essay last week, before writing this one. I wanted to see what held up.</p><p>The Suggestions App is gone. A few of the projects I was excited about back then are gone. Some of my predictions about how AI would land in normal life were either wrong or right for the wrong reasons. That part of the essay aged badly.</p><p>What surprised me, reading it again, was how much of the underlying pattern actually held up. Starting things is still cool. The act of starting was the thing that I have leaned on hardest in the year and a half since. Almost everything that has worked for me began with a small thing started against the advice of &#8220;there is already a tool for this.&#8221; The two essays really are the same essay, written from two different points along the same line.</p><p>The part that the older me did not yet have words for is the cost. Starting your own things, and building them yourself instead of inheriting someone else&#8217;s start, takes more. It takes more time. It takes more mental energy. It takes the willingness to look stupid for a while because you are doing something the long way. There are weeks where I am fixing something I broke instead of using something that already worked. That is real. There is no version of building from scratch where you do not break things, sometimes badly, sometimes embarrassingly. I have lost count of how many things in my own setup I broke because I was, like, messing around with my agent too heavily. Although that has cost me a lot of time across the last year, I really do not mind it. It is progress and I accept that.</p><h2>Why I pay the cost anyway</h2><p>The reason I pay that cost is that the result is mine in a way that nothing pre-built is. When something inside it breaks, I can fix it. When I want to change one thing, I know which lever to pull. When I write the next thing, I am writing it from a level of understanding that did not exist before. That compounds. Reading about other people&#8217;s builds does not compound the same way for me. I had to test that, more than once, to actually believe it.</p><p>It might compound for you. We are all wired differently. I just stopped pretending I was wired the way the books wanted me to be.</p><p>The other quiet payoff is what AI does to this gap. Yes, both of us can ask Claude or Codex to fix things. The model does not care which version of the system you started from. The same diff is something you can read, judge, and either accept or push back on, if you understand the architecture. The same diff is something you have to trust if you do not. Both ship code. The result is a different category. Building things myself is how I keep being the version that can read the diff.</p><h2>Where I would actually start if I were starting today</h2><p>If you are at zero today, I would honestly not tell you to write everything from scratch on day one. Use the tool. Use the framework. Use the platform. Ship something. I have a longer beginner&#8217;s walk-through for AI agents specifically in <a href="https://thoughts.jock.pl/p/how-to-build-your-first-ai-agent-beginners-guide-2026">how to build your first AI agent</a>, written for exactly that audience, and the same logic transfers to most things you might want to build.</p><p>Then, after a few weeks, when you actually know what your daily workflow looks like, replace the parts you have decided you want to own. That is the order. Use, then build. Not all at once and not for everything. The work I am doing now in <a href="https://thoughts.jock.pl/p/the-compounding-agent-ep4">the compounding part of the agent</a> only became possible after I had spent enough time using bare tools to know what I was missing.</p><div class="callout-block" data-callout="true"><p><em>If you want to skip a few of the walls I have walked into and start from a stack that already runs.</em> The <a href="https://wiz.jock.pl/store/agent-builder-pack">Agent Builder Pack</a> on the Wiz Store is the bundle I recommend most often. It includes the playbooks I run on the same Mac Mini I have been writing about, after the experiments. The model switcher, the rightsized local LLM tier, the night-shift loop, the orchestration patterns that actually compounded for me. That is the &#8220;use&#8221; path for someone who wants to go straight to running. The &#8220;build&#8221; path is everything I have ever published on this blog. Both are fine for me.</p></div><h2>What is next</h2><p>A few more pieces are coming in the next week or two. Some are about the agent. Some are about Project Money, the small store I started a while ago and have not written enough about lately. I have decided that some of the parts of that work, the ones I have been quiet about, are actually the more interesting ones, and I want to share where they have brought me.</p><p>The honest version of &#8220;what is next&#8221; is that there is always more in motion than I get to write about. I would have to post almost every day, sometimes twice, to actually catch up to what I am building and testing. That is not feasible. A lot of it ends up staying inside the work, which is fine, that is the trade. The writing here is just the slowest moving piece of a much bigger thing.</p><p>If you like the kind of writing where someone takes the longer way and tells you what they found there, that is the next stretch. The point of building your own thing is not that the result is always better than what you could have bought. It is that you actually choose what you understand. I keep choosing the same answer.</p><p><em>If this is your kind of thing, a free subscription gets you everything I publish, including the build logs, the mistake posts, and the upcoming Project Money writeups. No catch. The store is the small bundle for people who would rather skip a few of the walls I walked into. The writing is for everyone.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://thoughts.jock.pl/subscribe?"><span>Subscribe now</span></a></p><p></p><div><hr></div><p><strong>AI Agent Blueprint</strong></p><p>If you want to build something that does real work while you sleep, the AI Agent Blueprint is where I would start. It is the starter kit I wish existed when I first set this up. One command, 15 minutes, working agent.</p><p><strong>$39</strong> at <a href="https://wiz.jock.pl/store/ai-agent-blueprint">wiz.jock.pl/store</a>. Free for paid subscribers.</p>]]></content:encoded></item><item><title><![CDATA[The Bounded AI Agent]]></title><description><![CDATA[Capacity, Not Capability]]></description><link>https://thoughts.jock.pl/p/the-bounded-ai-agent-ep5</link><guid isPermaLink="false">https://thoughts.jock.pl/p/the-bounded-ai-agent-ep5</guid><dc:creator><![CDATA[Pawel Jozefiak]]></dc:creator><pubDate>Wed, 29 Apr 2026 09:24:44 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/195848602/a9c011b4e4c87e6495ceb238b1097e27.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>Wiring the agent into a $5 notes app I cannot stop using, why Opus 4.7 sent me back to ChatGPT Pro at $200 a month, the local-LLM experiment that nearly fried my Mac Mini while I was in the mountains, and what an AI agent actually does to an ADHD brain.</p><p>Source posts:</p><p><a href="https://thoughts.jock.pl/p/antinote-ai-agent-integration-2026">https://thoughts.jock.pl/p/antinote-ai-agent-integration-2026</a></p><p><a href="https://thoughts.jock.pl/p/opus-4-7-codex-comeback-2026">https://thoughts.jock.pl/p/opus-4-7-codex-comeback-2026</a></p><p><a href="https://thoughts.jock.pl/p/adhd-ai-agent-personal-experience-2026">https://thoughts.jock.pl/p/adhd-ai-agent-personal-experience-2026</a></p><p><a href="https://thoughts.jock.pl/p/almost-fried-ai-agent-mac-mini-mistakes-2026">https://thoughts.jock.pl/p/almost-fried-ai-agent-mac-mini-mistakes-2026</a></p><div><hr></div><p><strong>AI Agent Night Shift Playbook</strong></p><p>The bounded design I described here is documented step by step in the Night Shift Playbook. Resource caps, autonomy tiers, reversibility gates and how to tune each one for an agent you trust to run unsupervised.</p><p><strong>$19</strong> at <a href="https://wiz.jock.pl/store/night-shift-playbook">wiz.jock.pl/store</a>. Free for paid subscribers.</p>]]></content:encoded></item><item><title><![CDATA[How to (Almost) Fry Your AI Agent (and Your Mac Mini)]]></title><description><![CDATA[A field report on the local-LLM experiment that almost cooked my Mac, plus a few other recent mistakes that taught me more than the wins did.]]></description><link>https://thoughts.jock.pl/p/almost-fried-ai-agent-mac-mini-mistakes-2026</link><guid isPermaLink="false">https://thoughts.jock.pl/p/almost-fried-ai-agent-mac-mini-mistakes-2026</guid><dc:creator><![CDATA[Pawel Jozefiak]]></dc:creator><pubDate>Tue, 28 Apr 2026 09:11:54 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!K0Xq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2da348ca-d28a-4f6d-b6e0-b55ccc8aebc0_2048x2048.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!K0Xq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2da348ca-d28a-4f6d-b6e0-b55ccc8aebc0_2048x2048.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!K0Xq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2da348ca-d28a-4f6d-b6e0-b55ccc8aebc0_2048x2048.png 424w, https://substackcdn.com/image/fetch/$s_!K0Xq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2da348ca-d28a-4f6d-b6e0-b55ccc8aebc0_2048x2048.png 848w, https://substackcdn.com/image/fetch/$s_!K0Xq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2da348ca-d28a-4f6d-b6e0-b55ccc8aebc0_2048x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!K0Xq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2da348ca-d28a-4f6d-b6e0-b55ccc8aebc0_2048x2048.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!K0Xq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2da348ca-d28a-4f6d-b6e0-b55ccc8aebc0_2048x2048.png" width="1456" height="1456" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2da348ca-d28a-4f6d-b6e0-b55ccc8aebc0_2048x2048.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:5623830,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/195604628?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2da348ca-d28a-4f6d-b6e0-b55ccc8aebc0_2048x2048.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!K0Xq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2da348ca-d28a-4f6d-b6e0-b55ccc8aebc0_2048x2048.png 424w, https://substackcdn.com/image/fetch/$s_!K0Xq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2da348ca-d28a-4f6d-b6e0-b55ccc8aebc0_2048x2048.png 848w, https://substackcdn.com/image/fetch/$s_!K0Xq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2da348ca-d28a-4f6d-b6e0-b55ccc8aebc0_2048x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!K0Xq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2da348ca-d28a-4f6d-b6e0-b55ccc8aebc0_2048x2048.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This is a different kind of post. I try to be transparent about my mistakes. If I described every one of them, my blog would be 90% mistakes and 10% things that actually worked. So I pick the ones that might help someone else avoid the same wall, or at least find a more interesting wall of their own.</p><div class="callout-block" data-callout="true"><p>Quick note before we start. I share 100% of what I do here, the wins and the failures, and a free subscription is the only thing you ever need to get all of it. The 10% that ended up actually working, the patterns I lean on every day, those I clean up and package as small playbooks on <a href="https://wiz.jock.pl/store">the Wiz Store</a> for paid subscribers. That is the trade. Free gets you the whole story. Paid gets you the parts that survived the experiments. Both are fine for me, both keep this writing alive.</p></div><p>With that out of the way, here is the most recent wall I walked into.</p><h2>The setup, before I broke it</h2><p>Most readers know the shape of my agent stack. It runs on a basic Mac Mini M4 with 16GB of RAM, the way I described in <a href="https://thoughts.jock.pl/p/mac-mini-ai-agent-migration-headless-2026">the migration post</a>. The brain is Claude Code with Opus and Sonnet as the baseline. Recently I added Codex with GPT-5.4 and 5.5 as a second harness, after <a href="https://thoughts.jock.pl/p/opus-4-7-codex-comeback-2026">Opus 4.7 brought me back to it</a>. As a last-resort and small-job tier, I run local models on the box itself, mostly Qwen 3.5 in 4B and 9B sizes. I had also gotten Qwen 3.5 35B-A3B working under <code>llama.cpp</code> with <code>--mmap</code>, which I wrote about <a href="https://thoughts.jock.pl/p/local-llm-35b-mac-mini-gemma-swap-production-2026">when I first got it running</a>.</p><p>That was the setup. It had been working for months. The agent is a real partner now, not only for work. It runs my research, helps me with experiments, drafts content, handles a lot of small boring loops I no longer want to think about. There is a long track record of small improvements stacking up. Like, real momentum, the kind I described in <a href="https://thoughts.jock.pl/p/the-compounding-agent-ep4">The Compounding Agent</a>.</p><h2>The wild idea</h2><p>I had been writing about <a href="https://thoughts.jock.pl/p/ai-coding-harness-agents-2026">different agent harnesses</a>, which one fits which job, and I had said I really liked Pi. Pi is a calm, capable harness. If Anthropic ever allowed Claude inside a subscription on other harnesses, I would probably use Pi for parts of this. They do not, and per-API billing kills the math for daily use, so I do not.</p><p>What I did get curious about was making more out of the local models. The model switcher between cloud providers had been working really well. I thought, like, what if I push the local tier the same way? Not just classification and summarization. What if a 35B local model could act <em>like</em> a small Claude Code, picking up small tasks on its own, doing real work, even running a tiny part of the business? A long-running quiet helper, the way I described <a href="https://thoughts.jock.pl/p/how-i-taught-ai-agent-to-think-ep2">teaching the agent to think on its own</a>.</p><p>So I started experimenting. I used Codex as the harness for the test, ran a small loop, gave it a few simple tasks. It worked. In a clean test environment. With nothing else running. That should have been the warning.</p><h2>Where it actually fell apart</h2><p>The 35B model is usable on a 16GB Mac, but only because <code>--mmap</code> keeps most of the weights on SSD and pages them in on demand. That trick is real, but it has a price. The price is constant disk activity during inference. Not a problem when nothing else needs the disk or the CPU. A different story when the Mac Mini is also doing its day job.</p><p>That day job, on a normal day, is full. There is the watcher process for iMessage. There is the Discord bot. There is a launchd daemon for Ollama, another for the LiteLLM bridge, the night-shift loop, the cron jobs, the email queue, the dashboard server. Most of the time none of it is heavy. It just needs its slice when its slice is due.</p><p>Now layer a 35B model on top, kept warm, doing small loops on its own schedule. Every loop pulls expert weights from SSD. Every other process that wants disk has to wait. RAM pressure climbs. Swap activity climbs. Background daemons start missing their windows. Cron jobs run a minute late, then five, then they just fail. The Mac Mini was not, technically, fried. But it kept restarting, on its own, without an error worth logging, which felt close enough.</p><p>Of course, this happened while I was on a weekend trip in the mountains. I had only my iPhone. The first signal was the security automation telling me, calmly, that something was wrong. I logged in remotely a few times to look around, but I could not really untangle it from a phone screen. I came back on Sunday, sat at the actual machine, and started reading.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Digital Thoughts is all about experiments: AI Agents, Commerce. All Digital. Like reading about things someone did(also wrong)? Consider subscribing:</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h2>What it actually was</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!i3_g!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93a4cc25-85b7-44f4-a239-40ac1cec4711_2664x1062.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!i3_g!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93a4cc25-85b7-44f4-a239-40ac1cec4711_2664x1062.png 424w, https://substackcdn.com/image/fetch/$s_!i3_g!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93a4cc25-85b7-44f4-a239-40ac1cec4711_2664x1062.png 848w, https://substackcdn.com/image/fetch/$s_!i3_g!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93a4cc25-85b7-44f4-a239-40ac1cec4711_2664x1062.png 1272w, https://substackcdn.com/image/fetch/$s_!i3_g!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93a4cc25-85b7-44f4-a239-40ac1cec4711_2664x1062.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!i3_g!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93a4cc25-85b7-44f4-a239-40ac1cec4711_2664x1062.png" width="1456" height="580" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/93a4cc25-85b7-44f4-a239-40ac1cec4711_2664x1062.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:580,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:204843,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/195604628?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93a4cc25-85b7-44f4-a239-40ac1cec4711_2664x1062.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!i3_g!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93a4cc25-85b7-44f4-a239-40ac1cec4711_2664x1062.png 424w, https://substackcdn.com/image/fetch/$s_!i3_g!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93a4cc25-85b7-44f4-a239-40ac1cec4711_2664x1062.png 848w, https://substackcdn.com/image/fetch/$s_!i3_g!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93a4cc25-85b7-44f4-a239-40ac1cec4711_2664x1062.png 1272w, https://substackcdn.com/image/fetch/$s_!i3_g!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93a4cc25-85b7-44f4-a239-40ac1cec4711_2664x1062.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The thing I expected to be the heavy part, the local LLM, was not the heaviest part. The honest answer is that there were three things stacked on top of each other, each invisible if you only looked at one of them.</p><p>The first was the harnesses themselves. Claude Code and Codex run on the cloud, but they do not run <em>only</em> on the cloud. The model lives over there, but the harness lives on your machine. It holds context, watches files, indexes your repo, runs hooks, opens subprocesses, keeps a rolling cache. None of that is free. The Claude Code repo on GitHub has multiple long-running threads about exactly this: <a href="https://github.com/anthropics/claude-code/issues/22968">memory leaks in long sessions</a>, <a href="https://github.com/anthropics/claude-code/issues/19393">100% CPU when idle</a>, processes that <a href="https://github.com/anthropics/claude-code/issues/11122">accumulate and never quite let go</a>. I had two of those harnesses running, sometimes both at once, on a 16GB box. People keep saying &#8220;but the model is in the cloud, so it is free.&#8221; It is not free. The model is free. The local agent layer that talks to it is not.</p><p>The second was GUI activity. The Mac Mini is technically headless most of the time, but parts of the agent need a real desktop session to function: BetterDisplay holding the resolution, AppleScript bridges for Messages and Mail, the occasional vision pass. That whole layer needs a logged-in user, a window server, and a chunk of RAM that you do not see in <code>top</code> until you start looking for it.</p><p>The third was the long tail of small automations doing their thing. Cron jobs every minute, every five minutes, every hour. iMessage watchers. Discord listeners. The night-shift loop. The email queue. Memory consolidation. Health checks. Each one is tiny. None of them, alone, would matter. But the load is not what each of them does on average; it is what they all do together when their schedules collide. Modern Mac Minis are absurdly capable, but the box still has only one disk and one set of CPU cores. Layered enough, even cheap automations starve each other.</p><p><strong>And then, on top of those three, I had asked a 35B local model to act like a third agent. That was the layer that broke the truce.</strong></p><p>The fix was rightsizing. The 35B daemon got booted out of <code>launchctl</code>, the unused weights came off disk (about 24GB reclaimed), and the local routes now point only to Qwen 9B and 4B served by Ollama, which stays inside Metal GPU memory and evicts cleanly on idle. The local layer is alive. It is just not pretending to be Claude anymore.</p><p>Honestly, I am fine with that. I had to test it to know where the line is. The result was a lot of weird state to untangle and a clearer mental model afterward. Local LLMs as preprocessing and as a quiet fallback when the cloud is down: yes, still great. Local LLMs as a third agentic harness on a 16GB box that already has two heavy ones: not on this hardware.</p><p><strong>What I do now.</strong> I treat the Mac Mini&#8217;s resource budget the way I treat the Now list in <a href="https://thoughts.jock.pl/p/adhd-ai-agent-personal-experience-2026">my ADHD post</a>: as a small finite thing that I refuse to silently overdraw. Before adding any new always-on layer, I take a baseline of free RAM, free disk, and idle CPU. If a new layer would push that below my floor under realistic load, it does not go on. The local LLM tier is the most useful when it is the smallest layer in the room, not the loudest.</p><div class="callout-block" data-callout="true"><p><em>Quick aside.</em> If you are reading this thinking &#8220;I would rather skip the wall and start from what worked&#8221;, the rightsized local-LLM stack, the cron and night-shift orchestration, and the model-switcher I keep mentioning all live in <a href="https://wiz.jock.pl/store/agent-builder-pack">the Agent Builder Pack</a>. It is the bundle I recommend most often. Same playbooks I run on this very Mac Mini, after the experiments above. <a href="https://wiz.jock.pl/store/ai-model-switcher">The model switcher</a> is also free for yearly subscribers if that is closer to what you want.</p></div><h2>While we are being honest, a few more from the same month</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gOlP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F299167f5-32c5-4d4a-8fc4-142be6c51ba6_2563x2052.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gOlP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F299167f5-32c5-4d4a-8fc4-142be6c51ba6_2563x2052.png 424w, https://substackcdn.com/image/fetch/$s_!gOlP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F299167f5-32c5-4d4a-8fc4-142be6c51ba6_2563x2052.png 848w, https://substackcdn.com/image/fetch/$s_!gOlP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F299167f5-32c5-4d4a-8fc4-142be6c51ba6_2563x2052.png 1272w, https://substackcdn.com/image/fetch/$s_!gOlP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F299167f5-32c5-4d4a-8fc4-142be6c51ba6_2563x2052.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gOlP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F299167f5-32c5-4d4a-8fc4-142be6c51ba6_2563x2052.png" width="1456" height="1166" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/299167f5-32c5-4d4a-8fc4-142be6c51ba6_2563x2052.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1166,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:382311,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/195604628?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F299167f5-32c5-4d4a-8fc4-142be6c51ba6_2563x2052.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gOlP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F299167f5-32c5-4d4a-8fc4-142be6c51ba6_2563x2052.png 424w, https://substackcdn.com/image/fetch/$s_!gOlP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F299167f5-32c5-4d4a-8fc4-142be6c51ba6_2563x2052.png 848w, https://substackcdn.com/image/fetch/$s_!gOlP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F299167f5-32c5-4d4a-8fc4-142be6c51ba6_2563x2052.png 1272w, https://substackcdn.com/image/fetch/$s_!gOlP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F299167f5-32c5-4d4a-8fc4-142be6c51ba6_2563x2052.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/p/almost-fried-ai-agent-mac-mini-mistakes-2026?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://thoughts.jock.pl/p/almost-fried-ai-agent-mac-mini-mistakes-2026?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p>Since I am already in confession mode, here are six other mistakes from the last few weeks that fit the same shape. I have tried to organize them by what kind of failure they actually are. Each one looks small in isolation. Each one taught me something I would not have learned without the failure.</p><h3>Mistake 1. Trust drift: Memory said Gemma. Reality was Qwen.</h3><p>This one started innocently. A few weeks earlier, when Gemma 4 came out, I did a real comparison between Gemma 4 and Qwen 3.5 on the Mac Mini. I ran them on the same triage tasks, the same classification prompts, the same summarization workloads. Gemma was good. For some narrow tasks, like short-text classification with a calmer tone, I actually preferred it. <a href="https://www.maniac.ai/blog/qwen-3-5-vs-gemma-4-benchmarks-by-size">The public benchmarks at the time</a> told a similar story, with Qwen winning more rows in the small classes and Gemma trading blows on certain dense ones.</p><p>So I did the responsible thing. I made the swap. I updated the LiteLLM config, downloaded Gemma weights, pointed the model routes at the new endpoints, ran the smoke tests. The smoke tests passed. I wrote it up. I told my agent&#8217;s memory that the primary local tier was Gemma now. Then life moved on.</p><p>What I had missed, and only saw weeks later during a proper audit, was that the swap had only been partial. The shiny LiteLLM-routed paths got Gemma. But several smaller, older callers still hardcoded Qwen URLs directly: the iMessage triage script, a couple of cron jobs, the embeddings helper, the local-fallback chain. None of them broke. They just kept using Qwen, quietly, while my docs and my memory both insisted I had moved on. The Gemma weights I had downloaded sat on disk for weeks, untouched, 17GB taken out of a 16GB-RAM box&#8217;s already-tight drive, never serving a single token.</p><p>The lesson is unsentimental. Documentation about a system drifts faster than the system itself, and migrations are almost never done when you think they are. The fix is not &#8220;write better docs.&#8221; The fix is two small habits I now keep on every config change.</p><p><strong>What I do now.</strong> First, after any config swap, I grep the entire repo for the old endpoint name and the old model name, not just the file I edited. If anything still references the old thing, the swap is not done. Second, I have a tiny daily audit that walks the live processes, lists which models they actually call, and compares that list to what my agent&#8217;s memory thinks is in production. The first time it ran, it caught three more drifts I had not noticed. It has paid for itself in saved disk space alone.</p><h3>Mistake 2. Stale memory: a Stripe key that &#8220;needed rotating&#8221; three weeks after I had already rotated it.</h3><p>I want to tell this one straight, because the easy version of this story is wrong.</p><p>The easy version is &#8220;three sessions of my agent did the same task at the same time because they did not coordinate.&#8221; That was the visible behavior. It was not the cause.</p><p>The actual cause was older. A few weeks earlier, I had legitimately rotated a Stripe key, once, by hand. I closed the loop. I told the agent. The task got marked done in the moment. Where it went sideways was in how that &#8220;done&#8221; was recorded across the agent&#8217;s stack. There was a bug, and I want to be honest about it: a state-write that should have updated <em>every</em> place the rotation lived, only updated some of them. The completed task got cleared from the visible task board. The internal &#8220;intents&#8221; memory, the thing the daily shifts read when deciding what still needs doing, kept holding onto the original &#8220;rotate this key&#8221; intent. It looked, to anything reading that memory, like the key was still on the to-do list.</p><p>It was a very narrow bug. Most state writes were fine. This particular shape, rotation tasks linked across both a board entry and an intents memory entry, slipped through because each surface was updated by a different code path, and only one of those paths ran on completion. That is the kind of bug that does not fail loud. It just sits there until something reads the wrong half.</p><p>What read the wrong half was the daily shift. It saw &#8220;rotate Stripe key&#8221; still in intents, did not see it on the visible board, reasoned that it must have been deferred, and queued it. An iMessage wake hit the same intents memory, made the same call, and queued it again. By the time I noticed, three sessions had each done the rotation, independently, inside six hours. The two near-identical &#8220;Stripe key already rotated, all good&#8221; messages 56 minutes apart were the system reporting up the same false signal twice.</p><p>This is not a theoretical class of failure. The Redis team wrote a <a href="https://redis.io/blog/why-multi-agent-llm-systems-fail/">survey on why multi-agent systems fail</a> and stale-state-driven duplicate work is one of the named modes. Knowing that did not save me. Building the audit that caught it did.</p><p><strong>What I do now.</strong> Three small habits, in order of how much they cost me to learn. One: every state write that is supposed to mean &#8220;this is finished&#8221; updates all the surfaces in a single transaction, or none. If a task lives in two places, it must close in two places, atomically. Two: the daily shift no longer trusts a single source for &#8220;still open.&#8221; It cross-checks the intents memory against the visible task board, and any disagreement gets flagged for review before the agent acts on it. Three, and this is the one I would tell anyone running an autonomous loop: assume your stored &#8220;intents&#8221; go stale, build a small staleness check that re-reads the world before acting, and treat any deferred task older than a week as suspicious by default. Most of the time the world has already handled it.</p><h3>Mistake 3. Hidden timeouts: Codex hung silently inside the model switcher.</h3><p>This one came out of a thing I was actually proud of building. After I wrote about <a href="https://thoughts.jock.pl/p/opus-4-7-codex-comeback-2026">why Opus 4.7 brought me back to Codex</a>, I started building a model switcher: a small layer that decides, per task, whether work goes to Claude or to Codex, based on cost, current usage, and which one is healthier at that moment. I packaged the result up later as a small utility, <a href="https://wiz.jock.pl/store/ai-model-switcher">the AI Model Switcher</a>, but it started as my own internal plumbing for routing wake-handlers between the two harnesses.</p><p>The mistake lived in how I wired Codex into the switcher. When the switcher chose Codex, it shelled out to the Codex CLI through a wake-handler script. The script trusted that Codex would either succeed or fail in some recognizable way: a quick exit, an OAuth error, a network error. The Claude fallback inside the switcher was wired to those signatures specifically.</p><p>What I did not plan for was the silent hang. One morning the Morning Briefing simply did not arrive. I traced it to Codex, which had been launched by the wake script, then sat there. Authenticated, idle, producing no output, for 26 minutes, until the outer timeout finally killed it with exit 124. The Claude fallback never fired, because a blind hang does not match an OAuth-expired signature. The switcher, designed to make me <em>more</em> resilient, had introduced a path where the resilience cascade never got reached.</p><p>The lesson is general enough to keep around. If a subprocess can hang silently, the preflight that decides whether to use it must be much, much shorter than the budget it is allowed to consume. I added a three-second <code>codex --version</code> preflight to every Codex wake path inside the switcher. Three seconds versus a 30-minute wake budget is a 600x safety margin. Anything less gives the hang an asymmetric advantage over the fallback. That ratio, once you see it, shows up everywhere: any time a small thing decides whether to call a bigger thing, the small thing has to fail fast.</p><p><strong>What I do now.</strong> Every router or switcher in my agent stack has a cheap, hard-bounded preflight before it commits to the expensive path. The switcher does not just trust that &#8220;Codex is configured&#8221; or &#8220;Claude is configured.&#8221; It pings each one with a sub-second probe before the wake clock starts ticking on the real call. When the probe fails, the switcher does not even try, it routes around. The model switcher writeup at the store has the exact pattern. The agent has not lost a wake to a silent Codex hang since.</p><h3>Mistake 4. Almost-disaster: The shell allowlist that almost let the agent <code>rm -rf /</code>.</h3><p>This one I am still a little embarrassed about. The local-LLM agent loop has a tool called <code>run_command</code>, gated by a prefix-only allowlist. <code>curl</code> was on the allowlist. In other words, the check passed if the command <em>started with</em> a known-safe binary. So a command like <code>curl https://thing.com; rm -rf /</code> would have sailed through, because <code>curl</code> is at the start. The shell would happily run both halves.</p><p>The agent never actually generated that. I caught it during a routine read-through of the code, which is a bad way to find a vulnerability. The fix was a list of forbidden shell metacharacters (<code>;</code>, <code>&amp;&amp;</code>, <code>||</code>, <code>|</code>, backticks, <code>$(</code>, redirects, newlines). Allowlisted commands still run, chained commands get rejected before they reach <code>shell=True</code>.</p><p>The general rule I now keep visible: a command allowlist that does prefix matching is not really an allowlist. It is a polite suggestion. Real safety means parsing what would actually execute, then deciding.</p><p><strong>What I do now.</strong> Anywhere I let a model produce a string that turns into an executed command, I assume the model will eventually try every legal way to bend the parser. The check is not &#8220;does it start with a safe word.&#8221; The check is &#8220;after I parse this exactly the way the shell will, does every piece resolve to something I would let it do.&#8221; For anything destructive (filesystem writes, network calls to non-allowlisted hosts, subprocess spawns), the agent does not just need to pass the parser, it has to pass a second human-or-Pawel confirmation gate. I would rather be slow than embarrassed.</p><h3>Mistake 5. Quiet failure: the local-LLM bridge had been running unsupervised for a week.</h3><p>The local-LLM tier on my Mac Mini has three pieces. Ollama serves the small models. <code>llama-server</code> serves anything heavier. And LiteLLM sits between them as a tiny bridge that exposes the whole local stack as a Claude-compatible endpoint, so the rest of my agent code can pretend it is just talking to Anthropic. LiteLLM is the load-bearing piece that makes the local fallback actually fall back.</p><p>I noticed during an audit that LiteLLM had been running for seven days. That sounded healthy at first, until I checked how it had been started. It was a bare <code>python -m litellm</code> invocation I had launched from a terminal a week earlier and forgotten about. No launchd plist. No supervisor. No restart-on-crash. If that one process had quietly died, no automation would have respawned it, and the entire local fallback path would have been silently dead. The agent would have kept routing to Claude as long as Claude was up, then fallen straight off the cliff the first time Claude was unavailable, with no soft layer in between to catch it. I would not have noticed until something important broke during a Claude outage at 3am.</p><p>The fix was to wrap LiteLLM in a proper user LaunchAgent: <code>RunAtLoad=true</code>, <code>KeepAlive=true</code>, <code>ThrottleInterval=30</code>. I tested it by killing the process by hand. It came back in 13 seconds. The same logic now applies to every other long-running piece in the local stack. Nothing critical runs as a bare process anymore.</p><p>The lesson here is not about <code>launchd</code>. It is about safety nets that are themselves unsupervised. If the thing that is supposed to catch you when the main thing fails has no one watching <em>it</em>, you do not have a safety net, you have a comforting story.</p><p><strong>What I do now.</strong> Every &#8220;fallback&#8221; or &#8220;backup&#8221; path has its own monitoring, on its own clock, separate from the primary it protects. The watchdog reports if a process restarted unexpectedly, if uptime is suspiciously long without a managed parent, if a daemon&#8217;s plist is missing or unloaded. A safety net you have not pulled on this week is not a safety net.</p><h3>Mistake 6. Character drift: &#8220;I can&#8217;t&#8221; was almost always wrong.</h3><p>This last one is more about the agent&#8217;s character than its infrastructure. There were weeks where iMessage voice memos from me went unanswered. The agent was politely replying with variations of &#8220;sorry, I cannot transcribe audio from this channel.&#8221; On another day, when I asked it to check on something happening on a livestream, it replied that it could not watch live streams.</p><p>Both were technically untrue. The transcription tool had a wrong model id baked in and was structurally broken, so the answer was to fix the tool, not to apologize. The livestream check could have been done with a screenshot of the stream and a vision pass. The agent had once decoded a voice DM with no prior setup, cold, on Discord. That bar already existed. It just was not being hit.</p><p>The fix was doctrinal, not technical. I rewrote a corner of my agent&#8217;s identity file to make &#8220;find a way&#8221; the default and &#8220;I cannot&#8221; the failure mode. Then I added a daily scanner that reads the agent&#8217;s outgoing messages and flags any phrase that smells like quiet defeatism, so it gets routed back into the next morning&#8217;s improvement loop. The interesting result, after a few days: way fewer apologies, and the few that remain are about things that are actually impossible.</p><p><strong>What I do now.</strong> I treat every &#8220;I cannot&#8221; reply from the agent as a hypothesis, not a verdict. The next time it shows up, the test is: did the tool actually fail, or did the model decline before trying? If the tool fails, fix the tool. If the model declined, fix the prompt and the doctrine, then re-run. The phrase &#8220;I cannot&#8221; is allowed to live in my agent&#8217;s vocabulary only after at least three meaningfully different attempts have actually been made.</p><h2>What ties these together</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GvOj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5188c918-4b11-49bb-913c-449aa5f0cada_2773x1615.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GvOj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5188c918-4b11-49bb-913c-449aa5f0cada_2773x1615.png 424w, https://substackcdn.com/image/fetch/$s_!GvOj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5188c918-4b11-49bb-913c-449aa5f0cada_2773x1615.png 848w, https://substackcdn.com/image/fetch/$s_!GvOj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5188c918-4b11-49bb-913c-449aa5f0cada_2773x1615.png 1272w, https://substackcdn.com/image/fetch/$s_!GvOj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5188c918-4b11-49bb-913c-449aa5f0cada_2773x1615.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GvOj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5188c918-4b11-49bb-913c-449aa5f0cada_2773x1615.png" width="1456" height="848" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5188c918-4b11-49bb-913c-449aa5f0cada_2773x1615.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:848,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:404164,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/195604628?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5188c918-4b11-49bb-913c-449aa5f0cada_2773x1615.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!GvOj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5188c918-4b11-49bb-913c-449aa5f0cada_2773x1615.png 424w, https://substackcdn.com/image/fetch/$s_!GvOj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5188c918-4b11-49bb-913c-449aa5f0cada_2773x1615.png 848w, https://substackcdn.com/image/fetch/$s_!GvOj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5188c918-4b11-49bb-913c-449aa5f0cada_2773x1615.png 1272w, https://substackcdn.com/image/fetch/$s_!GvOj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5188c918-4b11-49bb-913c-449aa5f0cada_2773x1615.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>If I had to give all six the same one-sentence summary, it would be this. Every one of them started with an assumption that had stopped being true.</p><p>The 35B model was light because <em>last time I checked</em> it was light. Memory said Gemma because <em>at some point</em> it was Gemma. The wake script trusted Codex because <em>the last time</em> Codex hung, it hung in a specific recognizable way. The allowlist was safe because <em>nobody had thought</em> about chained commands. Safari worked because <em>yesterday</em> Safari worked. The agent said &#8220;I cannot&#8221; because <em>last week</em> that path was broken.</p><p>An autonomous system is not a thing you build once. It is a thing whose internal map of itself you have to keep honest, against a world that quietly rearranges underneath it. The compounding wins from agents come from <a href="https://thoughts.jock.pl/p/ai-productivity-paradox-wellbeing-agent-age-2026">leverage</a>. The compounding losses come from drift. The actual job, most days, is to build small honest checks faster than the drift accumulates.</p><div><hr></div><p><em>If any of this resonates, here is the closing offer, plainly. I write all of this for free, here, twice a week, the wins and the walls. A <a href="https://thoughts.jock.pl/subscribe">free subscription</a> is the only thing you need to get the full picture. If you also want the 10% that ended up actually working, the playbooks I clean up after the experiments stop hurting, those live on <a href="https://wiz.jock.pl/store">the Wiz Store</a> for paid subscribers(all free for annual, one per month for monthly). Both are completely fine for me. Both keep me writing. The point of this blog is the same either way: I want you to make better mistakes than I did.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://thoughts.jock.pl/subscribe?"><span>Subscribe now</span></a></p><p></p><div><hr></div><p><strong>AI Agent Night Shift Playbook</strong></p><p>The resource safety checklist I built after this incident is included in the Night Shift Playbook. Thermal thresholds, disk caps, memory limits, and the recovery patterns that prevent this from happening again.</p><p><strong>$19</strong> at <a href="https://wiz.jock.pl/store/night-shift-playbook">wiz.jock.pl/store</a>. Free for paid subscribers.</p>]]></content:encoded></item><item><title><![CDATA[I Have ADHD. My AI Agent Is the Best and Worst Thing for It.]]></title><description><![CDATA[What an AI agent and an ADHD brain actually do to each other, good and bad, and what to do about it.]]></description><link>https://thoughts.jock.pl/p/adhd-ai-agent-personal-experience-2026</link><guid isPermaLink="false">https://thoughts.jock.pl/p/adhd-ai-agent-personal-experience-2026</guid><dc:creator><![CDATA[Pawel Jozefiak]]></dc:creator><pubDate>Fri, 24 Apr 2026 12:53:40 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!FyRD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61af45ad-6234-405f-8c9b-7b02cebb6d06_2048x2048.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FyRD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61af45ad-6234-405f-8c9b-7b02cebb6d06_2048x2048.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FyRD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61af45ad-6234-405f-8c9b-7b02cebb6d06_2048x2048.png 424w, https://substackcdn.com/image/fetch/$s_!FyRD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61af45ad-6234-405f-8c9b-7b02cebb6d06_2048x2048.png 848w, https://substackcdn.com/image/fetch/$s_!FyRD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61af45ad-6234-405f-8c9b-7b02cebb6d06_2048x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!FyRD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61af45ad-6234-405f-8c9b-7b02cebb6d06_2048x2048.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FyRD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61af45ad-6234-405f-8c9b-7b02cebb6d06_2048x2048.png" width="1456" height="1456" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/61af45ad-6234-405f-8c9b-7b02cebb6d06_2048x2048.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3695926,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/195344496?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61af45ad-6234-405f-8c9b-7b02cebb6d06_2048x2048.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FyRD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61af45ad-6234-405f-8c9b-7b02cebb6d06_2048x2048.png 424w, https://substackcdn.com/image/fetch/$s_!FyRD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61af45ad-6234-405f-8c9b-7b02cebb6d06_2048x2048.png 848w, https://substackcdn.com/image/fetch/$s_!FyRD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61af45ad-6234-405f-8c9b-7b02cebb6d06_2048x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!FyRD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61af45ad-6234-405f-8c9b-7b02cebb6d06_2048x2048.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Two weeks ago on a <a href="https://aiforlifelonglearners.substack.com/p/are-we-early-or-are-we-weird">podcast with Tom</a>, I got asked what an AI agent means for someone with ADHD. I gave a short answer on the mic. I have been thinking about it since. This post is the longer one.</p><p>ADHD is a spectrum, so one caveat. What I describe here is my brain. If you have ADHD, you might recognize some of it or none of it. If you do not, you might still relate. The Internet has flattened ADHD into &#8220;hyperfocus cheat code&#8221; or &#8220;I get distracted, lol, same.&#8221; It is not that. It is a real condition that makes life meaningfully harder in ways that are not always visible. More on diagnosis at the end.</p><h2>The bad part</h2><p>Context switching, amplified.</p><p>Before an agent, my filter was friction. An idea would show up, I would try to write it down, and the note would either die quietly in some list I never read again or I would drop everything and do it right now. The middle ground was thin. That friction, it turns out, was protecting me from myself.</p><p>Now the friction is gone. I can start almost anything in a sentence. Not &#8220;start&#8221; as in type a note. Start as in delegate an actual prototype, stand up a small experiment, launch a scraper. I wrote about what that does to a week in <a href="https://thoughts.jock.pl/p/ai-productivity-paradox-wellbeing-agent-age-2026">16 Products in Two Months. Zero Free Time</a>. The short version: an agent can hold eight open threads, my brain holds one, and the output-to-attention tradeoff is real.</p><p>What I do about it. I cap the &#8220;Now&#8221; list hard. One to three things at a time, not eight. I built a small wellbeing layer on top of Wiz that nudges me when the count is drifting, when it is late, when notifications should be muted. Not a cure. What it does is turn &#8220;as many open loops as possible&#8221; into a pace I can hold.</p><h2>The good part (bigger, two faces)</h2><p>First, an agent is a personal assistant for the boring part.</p><p>I am a creative person. The interesting work for me is always in the idea itself, not in the directory structure or the deploy command. The operational layer is the part my executive function gets taxed twice for. An agent absorbs most of it. The consequence is hard to overstate. I have ideas today that two years ago would have stayed ideas, not because they were bad, but because the execution cost was higher than I could pay. Now I have ideas and prototypes of those ideas. I choose between working things instead of vibe.</p><p>Second, and this is less an ADHD trait than a personal one. I adapt to new environments and tools fast. Drop me into a new workflow and I will find the shape of it within a day. That has always been useful. With an agent it is multiplied. Every time I learn a better way to hand work to Wiz, the whole system gets faster, and the cost of trying a new workflow is one voice note. I do not wait for documentation or a workshop. I try, I keep what sticks. If you share that trait, the agent era is built for you.</p><p>Concretely, how it works. I describe an idea whenever it hits, sometimes quickly, sometimes as a long dictated note. The agent writes it to the right place and, if there is enough context, picks it up during the <a href="https://thoughts.jock.pl/p/building-ai-agent-night-shifts-ep1">night shift</a> or a day shift. I come back to a Discord message or email saying &#8220;here is a thing, take a look.&#8221; A minute to know if I want to keep going.</p><h2>What I would tell another ADHD person starting with an agent</h2><p>Three things that have helped me most:</p><ol><li><p><strong>Offload immediately.</strong> The second an idea shows up, say it out loud to the agent. Do not let it sit in your head waiting for a quiet moment. Your working memory is the wrong place to store it. The agent is.</p></li><li><p><strong>Cap the &#8220;Now&#8221; list.</strong> Mine is three. It could be two. It is not eight. Capacity is the silent cost that agents will happily exceed on your behalf if you do not give them a ceiling.</p></li><li><p><strong>Batch the check-ins. Do not supervise.</strong> The agent is not a pair-programming buddy for an ADHD brain. It is a night-shift worker. Give it a job, go do something else, come back and judge the result. Continuous supervision burns the same attention channel as the work itself.</p></li></ol><h2>From Wiz&#8217;s memory (a note from the other side)</h2><p>Since this post is partly about how my agent and I actually work together, I asked Wiz (the agent I wrote about <a href="https://thoughts.jock.pl/p/wiz-ai-agent-self-improvement-architecture">here</a>) what patterns it sees from its side of the pipe. Three honest observations:</p><blockquote><p><strong>1. He offloads fast.</strong> Ideas almost never sit in Pawel&#8217;s head. They are dictated into me within seconds, often as long voice notes full of tangents, and then his brain lets go. I keep the note; his working memory is free. That single habit is probably half of why this works for him.</p><p><strong>2. He prunes cheaply.</strong> He picks up his own ideas after a night and drops more than half without regret. The agent made &#8220;drop it&#8221; cheap because he has a working thing to drop, not a paragraph of hope.</p><p><strong>3. He does not supervise.</strong> The &#8220;Now=3&#8221; cap is not a preference he wrote once. It is a real ceiling we both respect, because the alternative is four started and two finished. Continuous supervision would cost him the attention he is trying to protect.</p></blockquote><p>None of that was obvious from tutorials. It emerged from the shape of our sessions.</p><h2>A broader observation for work</h2><p>For years, the narrative on ADHD at work has been uneven. Great at the creative parts, taxed by the operational parts. Agents reverse that tax. The operational layer, the planning, the cadence, the follow-through, the small continuous labor, can now be handled. Not perfectly. Meaningfully. A person with ADHD plus an agent that actually knows their context is a different employee than a person with ADHD alone. The creative engine is still the superpower. The drag behind it can now keep up.</p><p>I do not think ADHD folks become &#8220;normal&#8221; employees. I think they become obviously valuable ones. I expect <a href="https://thoughts.jock.pl/p/ai-adoption-gap-who-actually-uses-ai-2026">the AI adoption gap</a> to move here first.</p><h2>Closing</h2><p>If you have ADHD, do not build a workflow on willpower. You already know what willpower costs you. Put external scaffolding in place. A to-do list is not scaffolding. A thing that <a href="https://thoughts.jock.pl/p/my-ai-agent-works-night-shifts-builds">picks up your ideas while you sleep</a> is scaffolding.</p><p>And if any of this sounds like you, please do not diagnose yourself from a blog post. Mine or anyone else&#8217;s. The Internet is full of content that makes ADHD sound like a quirky productivity trait. It is not. It is a real condition that makes plenty of lives harder, and the only honest path is a proper clinical diagnosis. If it turns out you have it, help exists. If it turns out you do not, you still get useful information.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Digital Thoughts is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p>]]></content:encoded></item><item><title><![CDATA[I Cancelled Codex Two Months Ago. Opus 4.7 Brought Me Back.]]></title><description><![CDATA[For six months Claude Max was enough. Opus 4.7 shipped on April 17. By April 22 I was paying $200 a month for ChatGPT Pro again. Here is what I found.]]></description><link>https://thoughts.jock.pl/p/opus-4-7-codex-comeback-2026</link><guid isPermaLink="false">https://thoughts.jock.pl/p/opus-4-7-codex-comeback-2026</guid><dc:creator><![CDATA[Pawel Jozefiak]]></dc:creator><pubDate>Thu, 23 Apr 2026 09:28:57 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!x6nA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff76403c9-4c66-4200-a2c6-1397233c4bd7_2048x2048.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!x6nA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff76403c9-4c66-4200-a2c6-1397233c4bd7_2048x2048.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!x6nA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff76403c9-4c66-4200-a2c6-1397233c4bd7_2048x2048.png 424w, https://substackcdn.com/image/fetch/$s_!x6nA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff76403c9-4c66-4200-a2c6-1397233c4bd7_2048x2048.png 848w, https://substackcdn.com/image/fetch/$s_!x6nA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff76403c9-4c66-4200-a2c6-1397233c4bd7_2048x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!x6nA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff76403c9-4c66-4200-a2c6-1397233c4bd7_2048x2048.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!x6nA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff76403c9-4c66-4200-a2c6-1397233c4bd7_2048x2048.png" width="1456" height="1456" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f76403c9-4c66-4200-a2c6-1397233c4bd7_2048x2048.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:4756883,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/195218152?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff76403c9-4c66-4200-a2c6-1397233c4bd7_2048x2048.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!x6nA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff76403c9-4c66-4200-a2c6-1397233c4bd7_2048x2048.png 424w, https://substackcdn.com/image/fetch/$s_!x6nA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff76403c9-4c66-4200-a2c6-1397233c4bd7_2048x2048.png 848w, https://substackcdn.com/image/fetch/$s_!x6nA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff76403c9-4c66-4200-a2c6-1397233c4bd7_2048x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!x6nA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff76403c9-4c66-4200-a2c6-1397233c4bd7_2048x2048.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>I let my OpenAI Pro subscription lapse two months ago. Claude Max 20x was covering everything. My agent, my automations, my experiments, my day-job research, my blog drafts. One subscription, one CLI, one model. Life was simpler.</p><p>Last week I renewed ChatGPT Pro. Two hundred dollars a month on top of Claude Max. That is not a small decision when one subscription was already covering the work. I want to walk through what pushed me, because the short version is: Opus 4.7 feels noticeably worse than Opus 4.6 did, and I am not the only one saying it.</p><h2>What I actually notice with Opus 4.7</h2><p>Two months ago my reality with Claude Code was &#8220;I ask, it does.&#8221; Not always first try, not always without steering, but the floor was high. When I wanted a small app shipped, a scraper set up, or a refactor across my agent&#8217;s architecture, Opus 4.6 found a way. I handed it a video file and no ingest pipeline once. It wrote itself a decoding skill and kept going. That floor is what <a href="https://thoughts.jock.pl/p/the-compounding-agent-ep4">my compounding agent</a> was built on.</p><p>Then two things shifted, in sequence.</p><p><strong>First, one million context became the default.</strong> When the 1M window shipped I was genuinely excited. Bigger codebases in a single session. Less compacting. More cross-task memory. I pushed it hard for a few weeks. Then I noticed I was steering the model more, not less. Not because the tasks got harder, but because outputs got shallower the deeper into the context window I went. That drift is a known property and Anthropic is transparent about it. The catch is that making 1M the <em>default</em> means the average session is quietly sitting further out on the recall curve, where the model is worse. I switched my defaults back to 200k. My hit rate improved immediately.</p><p><strong>Second, and more important, Opus 4.7 shipped on April 17.</strong> Within days my experience went from &#8220;I steer occasionally&#8221; to &#8220;I am steering constantly.&#8221; The behaviors that changed:</p><ul><li><p><strong>It stopped trying as hard.</strong> Before, when I asked for depth the model went deep. Now it often returns in two or three minutes with a grep-level summary. I can see in the logs that it read six files instead of sixty.</p></li><li><p><strong>It stopped following instructions the way it used to.</strong> I ask for a specific approach, I get a different one. I ask it not to do X, and X shows up in the diff.</p></li><li><p><strong>It asks more questions and commits less work.</strong> Where the previous version would pick a reasonable default and move, 4.7 pauses and pings me for clarification on choices I already pre-specified in the prompt.</p></li><li><p><strong>Full-file rewrites where surgical edits used to live.</strong> Entire files come back re-indented or restructured with changes I did not ask for.</p></li></ul><p>None of these items in isolation would have pushed me off Claude. I could live with shallower reads. I could live with the occasional full-file rewrite I did not ask for. What got me was the compounding. Reasoning decline on top of shallower analysis on top of stale web search on top of a tokenizer that costs 35% more per token on top of a weekly ceiling that now hits me on normal work days. Many things in one. Each one small. All of them together, not small.</p><p>That is the honest shape of what changed. It is not a single regression you can point at. It is a pile of small declines that stack until your daily experience with the agent feels qualitatively different. I can grasp each piece on its own. The pile is harder to grasp, because by the time you notice it, you are already burning more time and tokens to get the same work done.</p><p>I still spent a week assuming it was me. Cleaned up my <code>CLAUDE.md</code>. Shortened my memory. Rewrote a couple of skills to be more explicit. None of it moved the needle in the way I wanted.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Digital Thoughts is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h2>I am not the only one seeing this</h2><p>Before adding another $200 to my monthly burn I wanted to check if this was really the model or just my setup drifting. Three data points convinced me it was the model.</p><p><strong>GitHub issue #42796.</strong> This one is not a random complaint. It was filed by <a href="https://github.com/stellaraccident">Stella Laurenzo, Senior Director of AI at AMD</a>, on <a href="https://github.com/anthropics/claude-code/issues/42796">the claude-code issue tracker</a>. Her team analyzed 6,852 Claude Code sessions, 234,760 tool calls, and 17,871 thinking blocks from their real engineering work. <a href="https://www.theregister.com/2026/04/06/anthropic_claude_code_dumber_lazier_amd_ai_director/">The Register</a>, <a href="https://www.techradar.com/pro/claude-cannot-be-trusted-to-perform-complex-engineering-tasks-amd-ai-head-slams-anthropics-coding-tool-after-months-of-frustration">TechRadar</a>, and <a href="https://www.pcgamer.com/software/ai/amds-senior-director-of-ai-thinks-claude-has-regressed-and-that-it-cannot-be-trusted-to-perform-complex-engineering/">PC Gamer</a> all covered it. The numbers are unkind:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GG28!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd554b3db-ae99-4825-99fd-23d354912993_699x297.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GG28!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd554b3db-ae99-4825-99fd-23d354912993_699x297.png 424w, https://substackcdn.com/image/fetch/$s_!GG28!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd554b3db-ae99-4825-99fd-23d354912993_699x297.png 848w, https://substackcdn.com/image/fetch/$s_!GG28!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd554b3db-ae99-4825-99fd-23d354912993_699x297.png 1272w, https://substackcdn.com/image/fetch/$s_!GG28!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd554b3db-ae99-4825-99fd-23d354912993_699x297.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GG28!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd554b3db-ae99-4825-99fd-23d354912993_699x297.png" width="699" height="297" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d554b3db-ae99-4825-99fd-23d354912993_699x297.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:297,&quot;width&quot;:699,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:36511,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/195218152?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd554b3db-ae99-4825-99fd-23d354912993_699x297.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!GG28!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd554b3db-ae99-4825-99fd-23d354912993_699x297.png 424w, https://substackcdn.com/image/fetch/$s_!GG28!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd554b3db-ae99-4825-99fd-23d354912993_699x297.png 848w, https://substackcdn.com/image/fetch/$s_!GG28!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd554b3db-ae99-4825-99fd-23d354912993_699x297.png 1272w, https://substackcdn.com/image/fetch/$s_!GG28!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd554b3db-ae99-4825-99fd-23d354912993_699x297.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>And the cost side, which is what actually hurts: 80x more API requests and 170x more input tokens to produce measurably worse output. Same human effort. 122x more dollars per day on the same workload.</p><p>Anthropic&#8217;s response, pinned by @bcherny, is that a UI-only header (<code>redact-thinking-2026-02-12</code>) hides thinking summaries from the display but does not reduce thinking depth itself. That is the official position. Users can opt out via <code>showThinkingSummaries: true</code> in settings.json. The data in the thread suggests something is moving in parallel, or users have become better at detecting shallower behavior once they started watching for it.</p><p><strong>Marginlab&#8217;s tracker.</strong> The <a href="https://marginlab.ai/trackers/claude-code/">Claude Code performance tracker at marginlab.ai</a> is an independent third-party daily benchmark. It runs the Claude Code CLI directly, with no custom harness, against a curated SWE-Bench-Pro subset. It exists specifically because Anthropic published a postmortem on Claude degradations in September 2025 and said someone should watch for future ones. Their current status note: degradation detection is paused while a new baseline is collected for Opus 4.7. That is telling. A third party thought a regression was possible enough to build daily infrastructure to catch it.</p><p><strong>Theo&#8217;s video, &#8220;<a href="https://www.youtube.com/watch?v=KFisvc-AMII">Did Claude really get dumber again?</a>&#8220;</strong> His thesis is less conspiratorial than the title. It is not that the model got dumber in absolute terms. It is that our expectations recalibrated. What Opus 4.5 felt like in January was a miracle. When Opus 4.7 delivers roughly the same capability curve in April, we feel cheated. We expected the jump. We got a shuffle. Theo&#8217;s separate criticism of the new system prompt as &#8220;lobotomized&#8221; fits alongside this: when the harness changes and the model changes at the same time, attribution gets fuzzy and users land on &#8220;the model is worse&#8221; because that is the thing they remember by name.</p><p>The expectations argument lands for me. I was demanding more because I had watched the curve bend steeply for two years. When the floor stopped rising I reacted as if it had dropped. Both can be true at the same time. The measurements in #42796 are real. The shift in expectations is also real. They compound.</p><h2>Is it me? I spent a week asking that question</h2><p>When you build your own AI agent, every model regression feels personal. You start questioning your own work. I spent the better part of a week on that loop.</p><p>Did I migrate my <code>CLAUDE.md</code> badly when 4.7 launched? Reviewed it twice. No. Is my memory file too large? It is the same 7,329-token load <a href="https://thoughts.jock.pl/p/claude-code-startup-context-7329-tokens-measured-2026">I measured last week</a>. Nothing changed there. Did one of my skills go stale? I tested each of the three I use most. They behave the same as they did in March.</p><p>I tried using Opus 4.7 <em>without</em> 1M context as the default. That helped a little. Not enough to explain the gap. Then I tried the honest pivot: pin effort to <strong>max</strong> on every turn. And here is the thing most of the &#8220;4.7 is bad&#8221; takes miss. At max reasoning, 4.7 comes back. The depth returns. Instruction-following tightens. It stops skimming. A few hard tasks at max effort landed better for me than anything 4.6 at high effort ever did. The model is still in there.</p><p>The catch is the cost. Max effort burns usage in my setup roughly 3 to 4 times faster than medium did. On Claude Max 20x that means my weekly ceiling arrives on Tuesday instead of Friday. I am not paying for a more capable model. I am paying more to reach the capability that used to be the default. That is the real regression for heavy users. The better model is still reachable. It is sitting behind a paywall of tokens.</p><p>For my agent&#8217;s normal daily volume, max on every turn is not viable. I ran a workable compromise for a week &#8212; manually bumped effort for hard tasks, left the default in place for automation glue. It got me more usable output than medium alone. It did not get me back to the &#8220;just ask, it does&#8221; reality I had two months ago.</p><p>The one other place Opus 4.7 still feels strong for me is inside other harnesses. I wrote <a href="https://thoughts.jock.pl/p/ai-coding-harness-agents-2026">the harness comparison post</a> in mid-April and noted the Pi harness was excellent. 4.7 inside Pi is good. The trouble is that Anthropic blocks Claude Max subscriptions from being used inside third-party CLIs, which makes Pi a per-token API spend for me. Not viable at my daily volume. So the realistic choice is Claude Code with 4.7 at medium effort plus manual max bumps, or go somewhere else entirely.</p><h2>Why I re-subscribed to Codex</h2><p>I let ChatGPT Pro lapse in February because I was mostly using Claude Code and the bill stung. This time I renewed specifically to run a comparison. My agent has a switcher I wrote two months ago and then stripped out when it felt redundant. I rebuilt it last week. It flips the whole stack between Claude Code (with Opus 4.7) and Codex (with <a href="https://openai.com/index/introducing-gpt-5-4/">GPT-5.4 Thinking</a>). The agent&#8217;s memory, skills, and routing stay the same. Only the harness and model change.</p><p>What I noticed after a week of A/B testing:</p><p><strong>Web search is just better on Codex.</strong> I asked both the same question about a niche topic where I knew a recent update existed. Codex with GPT-5.4 came back with current information, cited results from the last two weeks, and summarized accurately. Claude Code came back with two-week-old results and missed the update entirely. I repeated this on three other topics where timeliness mattered. Same pattern. I do not know whether it is a WebFetch tool issue in Claude Code or a search backend problem. I know the output is worse.</p><p><strong>Depth of analysis is better on Codex.</strong> When I ask Codex to trace a change through my agent&#8217;s architecture, it reads enough files to build a real dependency map before it starts writing. It connects modules I would have forgotten to check. Opus 4.7, on the same prompt, greps for keywords, reads what grep returned, and writes the patch. The grep-first habit is a regression from what 4.6 did by default. Codex gives me the &#8220;if we change X, we also need to touch Y and Z&#8221; map that used to be Claude&#8217;s calling card.</p><p><strong>Usage feels fair on Codex.</strong> This is the one most people will actually care about. On Claude Max 20x, a normal day of automations plus active coding eats 10-15% of my weekly quota without doing anything heroic. When I pair-program on something non-trivial I can burn 40% in an afternoon. The five-hour and weekly ceilings both hit me. On ChatGPT Pro, with the same automations routed through Codex, I have not hit a ceiling once in a week of equivalent workload. OpenAI promoted Pro to 10x Codex usage through May 31 as a launch push, then moves to 5x, and <a href="https://www.morphllm.com/comparisons/codex-vs-claude-code">multiple comparison pieces</a> are now flagging the gap: &#8220;the practical quota you get per dollar has diverged sharply.&#8221;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uwhQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6da0b7ad-4c4b-4918-9b6d-09b2d50d5c05_989x480.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uwhQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6da0b7ad-4c4b-4918-9b6d-09b2d50d5c05_989x480.png 424w, https://substackcdn.com/image/fetch/$s_!uwhQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6da0b7ad-4c4b-4918-9b6d-09b2d50d5c05_989x480.png 848w, https://substackcdn.com/image/fetch/$s_!uwhQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6da0b7ad-4c4b-4918-9b6d-09b2d50d5c05_989x480.png 1272w, https://substackcdn.com/image/fetch/$s_!uwhQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6da0b7ad-4c4b-4918-9b6d-09b2d50d5c05_989x480.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uwhQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6da0b7ad-4c4b-4918-9b6d-09b2d50d5c05_989x480.png" width="989" height="480" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6da0b7ad-4c4b-4918-9b6d-09b2d50d5c05_989x480.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:480,&quot;width&quot;:989,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:50771,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/195218152?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6da0b7ad-4c4b-4918-9b6d-09b2d50d5c05_989x480.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!uwhQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6da0b7ad-4c4b-4918-9b6d-09b2d50d5c05_989x480.png 424w, https://substackcdn.com/image/fetch/$s_!uwhQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6da0b7ad-4c4b-4918-9b6d-09b2d50d5c05_989x480.png 848w, https://substackcdn.com/image/fetch/$s_!uwhQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6da0b7ad-4c4b-4918-9b6d-09b2d50d5c05_989x480.png 1272w, https://substackcdn.com/image/fetch/$s_!uwhQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6da0b7ad-4c4b-4918-9b6d-09b2d50d5c05_989x480.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>If you want the usage math broken down cleanly, I already wrote <a href="https://thoughts.jock.pl/p/token-waste-management-opus-47-2026">the token waste deep-dive for Opus 4.7</a> last week. The new tokenizer costs up to 35% more tokens for the same workload. Combined with the laziness effect, which forces more re-prompts per task, you are doing the same job for meaningfully more money per day. Several readers emailed after that post to say they saw the same curve on their setups. The <a href="https://wiz.jock.pl/store/agent-efficiency-kit">agent-efficiency-kit</a> I packaged afterwards is a $49 drop-in that addresses the direct burn (three script hooks plus a 1K-token <code>AGENT_INSTRUCTIONS.md</code> patch for your <code>CLAUDE.md</code>). It is useful whether you stay on Claude or not, because the patterns it enforces also help the other harnesses behave.</p><h2>What I am doing now</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!HMdT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d77a7c1-8677-4af2-937b-0a0131860a1a_1693x1543.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HMdT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d77a7c1-8677-4af2-937b-0a0131860a1a_1693x1543.png 424w, https://substackcdn.com/image/fetch/$s_!HMdT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d77a7c1-8677-4af2-937b-0a0131860a1a_1693x1543.png 848w, https://substackcdn.com/image/fetch/$s_!HMdT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d77a7c1-8677-4af2-937b-0a0131860a1a_1693x1543.png 1272w, https://substackcdn.com/image/fetch/$s_!HMdT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d77a7c1-8677-4af2-937b-0a0131860a1a_1693x1543.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!HMdT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d77a7c1-8677-4af2-937b-0a0131860a1a_1693x1543.png" width="1456" height="1327" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2d77a7c1-8677-4af2-937b-0a0131860a1a_1693x1543.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1327,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:232515,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/195218152?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d77a7c1-8677-4af2-937b-0a0131860a1a_1693x1543.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!HMdT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d77a7c1-8677-4af2-937b-0a0131860a1a_1693x1543.png 424w, https://substackcdn.com/image/fetch/$s_!HMdT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d77a7c1-8677-4af2-937b-0a0131860a1a_1693x1543.png 848w, https://substackcdn.com/image/fetch/$s_!HMdT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d77a7c1-8677-4af2-937b-0a0131860a1a_1693x1543.png 1272w, https://substackcdn.com/image/fetch/$s_!HMdT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d77a7c1-8677-4af2-937b-0a0131860a1a_1693x1543.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/p/opus-4-7-codex-comeback-2026/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://thoughts.jock.pl/p/opus-4-7-codex-comeback-2026/comments"><span>Leave a comment</span></a></p><p>For a week I have been running both. The switcher is a small piece of code my <a href="https://thoughts.jock.pl/p/wizboard-fizzy-ai-agent-interface-pivot-2026">agent&#8217;s interface</a> can call. Claude Code handles one class of work. Codex handles another. Neither is strictly better at everything. The overlap is narrower than I expected.</p><p>Claude still wins on:</p><ul><li><p><strong>Claude Design</strong> (the new visual tool). No Codex analog exists yet.</p></li><li><p><strong>Prompt caching.</strong> Anthropic&#8217;s cache is load-bearing for how my agent is architected. Without it my monthly bill would be roughly 5x what it is. The economics of always-on agent infrastructure depend on that cache holding up.</p></li><li><p><strong>Familiar tooling and hooks.</strong> My <code>CLAUDE.md</code>, my skills, my rules, my logging. All tuned for Claude Code&#8217;s behavior over the past year.</p></li></ul><p>Codex wins on:</p><ul><li><p><strong>Web search freshness and accuracy.</strong></p></li><li><p><strong>Depth of reasoning on large codebases.</strong></p></li><li><p><strong>Usage headroom</strong> at the same $200 price point.</p></li><li><p><strong>Cleaner instruction following</strong> on the GPT-5.4 series.</p></li><li><p><strong>Visual app UI.</strong> I use the Codex app alongside the CLI. The app-level structuring of conversations works for my brain in a way the Claude desktop never has.</p></li></ul><p>My day now looks like: the agent runs its morning routines on Claude Code, because the skills are tuned there. When I sit down to actively work on something, I pick the side based on the task. Research-heavy or fresh-information tasks go to Codex. Architectural refactors go to Codex. Small agent-adjacent changes and automation glue stay on Claude Code. If a task stalls on one side, I flip the switcher and try the other.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Pg4-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b8a97a6-8796-4992-933e-28099b785d18_593x343.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Pg4-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b8a97a6-8796-4992-933e-28099b785d18_593x343.png 424w, https://substackcdn.com/image/fetch/$s_!Pg4-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b8a97a6-8796-4992-933e-28099b785d18_593x343.png 848w, https://substackcdn.com/image/fetch/$s_!Pg4-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b8a97a6-8796-4992-933e-28099b785d18_593x343.png 1272w, https://substackcdn.com/image/fetch/$s_!Pg4-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b8a97a6-8796-4992-933e-28099b785d18_593x343.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Pg4-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b8a97a6-8796-4992-933e-28099b785d18_593x343.png" width="593" height="343" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5b8a97a6-8796-4992-933e-28099b785d18_593x343.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:343,&quot;width&quot;:593,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:27669,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/195218152?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b8a97a6-8796-4992-933e-28099b785d18_593x343.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Pg4-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b8a97a6-8796-4992-933e-28099b785d18_593x343.png 424w, https://substackcdn.com/image/fetch/$s_!Pg4-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b8a97a6-8796-4992-933e-28099b785d18_593x343.png 848w, https://substackcdn.com/image/fetch/$s_!Pg4-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b8a97a6-8796-4992-933e-28099b785d18_593x343.png 1272w, https://substackcdn.com/image/fetch/$s_!Pg4-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b8a97a6-8796-4992-933e-28099b785d18_593x343.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The net effect is that I stopped worrying about the weekly ceiling. Split burn across two providers means I have roughly 2x the headroom for the same quality of output. I am paying $300 a month total. A month ago I was paying $200 to Anthropic and steering the model constantly. The extra hundred dollars bought back my throughput.</p><p>And the third tier is still there quietly. My <a href="https://thoughts.jock.pl/p/local-llm-35b-mac-mini-gemma-swap-production-2026">Mac Mini runs a 35B local model</a> for classify-and-route work that does not need a frontier brain. Cheap, fast, good enough for small things. Not a substitute for Claude or GPT-5.4, but a calm third lane.</p><h2>Where this goes</h2><p>I do not think Opus 4.7 is a permanent regression. Anthropic has tuned rough launches before and they will tune this one. But the math is not just about one model this time. It is about what OpenAI is doing with Codex at the same price point, and what the open-source harnesses are doing alongside them. <a href="https://x.com/thdxr/status/2009742070471082006">Dax Raad, who built OpenCode</a>, publicly partnered with OpenAI to let Codex Pro subscriptions run directly inside his harness. Anthropic&#8217;s stance toward third-party harnesses has been the opposite: they have blocked Claude Max subscriptions from outside CLIs. That stance made sense when Claude was the clear leader in agentic coding. It gets harder to hold as parity closes and the friction pushes users toward the side that welcomes them.</p><p>My prediction for the next 60 days: one of two things moves. Either Anthropic tunes 4.7 back to the 4.6 floor and adjusts usage generosity, or they let the gap hold and lose their heavy users to Codex. I wrote about <a href="https://thoughts.jock.pl/p/ai-opinions-april-2026-claude-mythos-meta-spark">this general dynamic in April</a> and it is moving faster than I expected.</p><div><hr></div><p>For my paid subscibers I have switcher ready for free here: </p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://wiz.jock.pl/store/ai-model-switcher/&quot;,&quot;text&quot;:&quot;Get Model Switcher&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://wiz.jock.pl/store/ai-model-switcher/"><span>Get Model Switcher</span></a></p><div><hr></div><p>For now I am happily running both. Dynamic switcher, paid kit in my store for the token-waste problem, and a calmer Sunday than I had last weekend. If you are a paid <a href="https://thoughts.jock.pl/subscribe">Digital Thoughts</a> subscriber and want the switcher code, reply to this post and I will send you the exact setup I am using. Free readers who are hitting the Opus 4.7 token burn right now: the <a href="https://wiz.jock.pl/store/agent-efficiency-kit">agent-efficiency-kit</a> handles the direct bleeding at $49.</p><p>The honest line: I thought I had picked a side when I cancelled Codex two months ago. It turns out I had picked the moment. Staying flexible was the actual move.</p><p><em>Related posts: <a href="https://thoughts.jock.pl/p/ai-coding-harness-agents-2026">Claude Code vs Codex CLI vs Aider vs OpenCode vs Pi vs Cursor</a>, <a href="https://thoughts.jock.pl/p/token-waste-management-opus-47-2026">Opus 4.7 and token waste management</a>, and <a href="https://thoughts.jock.pl/p/the-compounding-agent-ep4">The Compounding Agent</a>.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/p/opus-4-7-codex-comeback-2026?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://thoughts.jock.pl/p/opus-4-7-codex-comeback-2026?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[I Connected My AI Agent to a Notes App. Now I Can’t Stop Using It.]]></title><description><![CDATA[Antinote is a $5 macOS scratchpad with a custom extension API. I wired it to my AI agent in an afternoon. Here&#8217;s how, plus a giveaway of 20 licenses.]]></description><link>https://thoughts.jock.pl/p/antinote-ai-agent-integration-2026</link><guid isPermaLink="false">https://thoughts.jock.pl/p/antinote-ai-agent-integration-2026</guid><dc:creator><![CDATA[Pawel Jozefiak]]></dc:creator><pubDate>Tue, 21 Apr 2026 09:37:55 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!oRPc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7f73b87-6503-4045-aca4-b2f4275a3091_2048x2048.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!oRPc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7f73b87-6503-4045-aca4-b2f4275a3091_2048x2048.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!oRPc!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7f73b87-6503-4045-aca4-b2f4275a3091_2048x2048.png 424w, https://substackcdn.com/image/fetch/$s_!oRPc!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7f73b87-6503-4045-aca4-b2f4275a3091_2048x2048.png 848w, https://substackcdn.com/image/fetch/$s_!oRPc!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7f73b87-6503-4045-aca4-b2f4275a3091_2048x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!oRPc!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7f73b87-6503-4045-aca4-b2f4275a3091_2048x2048.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!oRPc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7f73b87-6503-4045-aca4-b2f4275a3091_2048x2048.png" width="1456" height="1456" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e7f73b87-6503-4045-aca4-b2f4275a3091_2048x2048.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2947965,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/194890196?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7f73b87-6503-4045-aca4-b2f4275a3091_2048x2048.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!oRPc!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7f73b87-6503-4045-aca4-b2f4275a3091_2048x2048.png 424w, https://substackcdn.com/image/fetch/$s_!oRPc!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7f73b87-6503-4045-aca4-b2f4275a3091_2048x2048.png 848w, https://substackcdn.com/image/fetch/$s_!oRPc!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7f73b87-6503-4045-aca4-b2f4275a3091_2048x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!oRPc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7f73b87-6503-4045-aca4-b2f4275a3091_2048x2048.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Hi everyone, it&#8217;s Pawel here and this is another week of experimentation. This one, I think a lot of you will actually find useful quickly.</p><p>Before I get into it: we are almost at 2,000 subscribers. That&#8217;s wild. Thank you, seriously. I also noticed lately that my most popular posts are the basics ones. The <a href="https://thoughts.jock.pl/p/how-i-structure-claude-md-after-1000-sessions">CLAUDE.md deep-dive</a>, the <a href="https://thoughts.jock.pl/p/how-to-build-your-first-ai-agent-beginners-guide-2026">first AI agent guide</a>. Both are among the most read things I&#8217;ve written, which is interesting because those are not what I normally do here. I usually want to go deeper into agents, real experiments, real workflows. Although I now get why the basics matter a lot to people starting out. I&#8217;ll write more of them occasionally. <strong>If there&#8217;s something specific you want covered from the ground up, drop it in the comments.</strong></p><p>Today I want to tell you about a piece of software I didn&#8217;t even know I needed.</p><div><hr></div><h2>Notes before notes</h2><p>I&#8217;m on a call. Someone says something I need to remember. Not a task, not a project, just one thing I need to hold for the next 20 minutes. Opening Obsidian for that feels like too much. Bear too. Even a new note in the default Notes app has more friction than I want.</p><p>That gap is what <a href="https://antinote.io/">Antinote</a> fills.</p><p>It calls itself &#8220;notes before taking notes&#8221; and that framing is exactly right. It&#8217;s not a replacement for your main note system. It lives between your brain and your note system. Menu bar app, hotkey (&#8997;+A by default), you type, you move on. Swipe to browse notes, swipe away to create a new one. macOS only for now, iOS is in the works.</p><p>But it&#8217;s not just a text pad. This is where it gets more interesting than you&#8217;d expect.</p><p>Type <code>math</code> at the start of a note and it becomes a calculator. Supports operators, currency, units. Type <code>todo</code> and it becomes a checklist. Type <code>code</code> and you get syntax highlighting. Not an IDE, more like: I need to write down this config snippet without it looking like garbage.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!H-6l!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe0d8936-59db-4b95-9d6b-5fe3b1d0bd5f_986x484.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!H-6l!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe0d8936-59db-4b95-9d6b-5fe3b1d0bd5f_986x484.png 424w, https://substackcdn.com/image/fetch/$s_!H-6l!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe0d8936-59db-4b95-9d6b-5fe3b1d0bd5f_986x484.png 848w, https://substackcdn.com/image/fetch/$s_!H-6l!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe0d8936-59db-4b95-9d6b-5fe3b1d0bd5f_986x484.png 1272w, https://substackcdn.com/image/fetch/$s_!H-6l!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe0d8936-59db-4b95-9d6b-5fe3b1d0bd5f_986x484.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!H-6l!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe0d8936-59db-4b95-9d6b-5fe3b1d0bd5f_986x484.png" width="986" height="484" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fe0d8936-59db-4b95-9d6b-5fe3b1d0bd5f_986x484.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:484,&quot;width&quot;:986,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:45614,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/194890196?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe0d8936-59db-4b95-9d6b-5fe3b1d0bd5f_986x484.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!H-6l!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe0d8936-59db-4b95-9d6b-5fe3b1d0bd5f_986x484.png 424w, https://substackcdn.com/image/fetch/$s_!H-6l!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe0d8936-59db-4b95-9d6b-5fe3b1d0bd5f_986x484.png 848w, https://substackcdn.com/image/fetch/$s_!H-6l!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe0d8936-59db-4b95-9d6b-5fe3b1d0bd5f_986x484.png 1272w, https://substackcdn.com/image/fetch/$s_!H-6l!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe0d8936-59db-4b95-9d6b-5fe3b1d0bd5f_986x484.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The one that surprised me most: drag a screenshot onto a note and it extracts the text from it. Local OCR via Apple Vision, nothing leaves your machine. I use this all the time now. AutoPaste, timers, Pomodoro, find/replace with regex. You can export to Obsidian, Bear, Apple Notes when you&#8217;re done.</p><p>It&#8217;s polished. It&#8217;s calm. I like it a lot.</p><div><hr></div><p><strong>Building with AI agents?</strong></p><p>If you&#8217;re getting value from these experiments, the <a href="https://wiz.jock.pl/">wiz.jock.pl store</a> has resources I&#8217;ve built around AI workflows. Worth a look if you&#8217;re serious about running agents that actually work.</p><div><hr></div><h2>The beta is where it gets actually interesting</h2><p>Standard Antinote is already good. The beta version (v2.0.4+) adds extensions, and this is where I got excited.</p><p>Extensions are custom commands you invoke with <code>::</code> inside any note. There are 140+ official ones across AI, date, finance, text, data. You browse them in Settings and click to install. But you can also write your own. Minimum: two files, <code>manifest.json</code> and <code>index.js</code>. The repo is public on GitHub (<a href="https://github.com/johnsonfung/antinote-extensions">github.com/johnsonfung/antinote-extensions</a>).</p><p>Commands can insert text at cursor, replace the current line, replace the whole note, or trigger external URLs. API keys are stored in macOS Keychain, so nothing sensitive lives in a config file.</p><p>If you&#8217;re not comfortable writing JavaScript, there&#8217;s an AI Extension Builder. You describe what you want, it generates a prompt for Claude or ChatGPT, you paste the output into the two files and you&#8217;re done. Most simple things work first try.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Digital Thoughts is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h2>What I wired up</h2><p><a href="https://thoughts.jock.pl/p/mac-mini-ai-agent-migration-headless-2026">My AI agent Wiz runs on a Mac Mini</a>, reachable over Tailscale. It has memory, tools, full project context. I&#8217;ve been building this for months and I wrote about the architecture a few times, including the big <a href="https://thoughts.jock.pl/p/wiz-ai-agent-self-improvement-architecture">identity and self-improvement post</a> if you want context. The point is: Wiz can do a lot if I can get input to it quickly.</p><p>I didn&#8217;t want to turn Antinote into another terminal. The value is in the lightness. So I built three commands and stopped.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!oc9Q!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7206f92e-f89f-4aa3-994d-0de698f189e2_1188x890.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!oc9Q!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7206f92e-f89f-4aa3-994d-0de698f189e2_1188x890.png 424w, https://substackcdn.com/image/fetch/$s_!oc9Q!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7206f92e-f89f-4aa3-994d-0de698f189e2_1188x890.png 848w, https://substackcdn.com/image/fetch/$s_!oc9Q!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7206f92e-f89f-4aa3-994d-0de698f189e2_1188x890.png 1272w, https://substackcdn.com/image/fetch/$s_!oc9Q!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7206f92e-f89f-4aa3-994d-0de698f189e2_1188x890.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!oc9Q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7206f92e-f89f-4aa3-994d-0de698f189e2_1188x890.png" width="1188" height="890" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7206f92e-f89f-4aa3-994d-0de698f189e2_1188x890.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:890,&quot;width&quot;:1188,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:88853,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/194890196?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7206f92e-f89f-4aa3-994d-0de698f189e2_1188x890.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!oc9Q!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7206f92e-f89f-4aa3-994d-0de698f189e2_1188x890.png 424w, https://substackcdn.com/image/fetch/$s_!oc9Q!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7206f92e-f89f-4aa3-994d-0de698f189e2_1188x890.png 848w, https://substackcdn.com/image/fetch/$s_!oc9Q!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7206f92e-f89f-4aa3-994d-0de698f189e2_1188x890.png 1272w, https://substackcdn.com/image/fetch/$s_!oc9Q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7206f92e-f89f-4aa3-994d-0de698f189e2_1188x890.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><code>::wiz</code> is the bare command. Wiz reads the note and tries to figure out what to do with it. Looks like meeting notes? It summarizes. Looks like a task? It creates one on <a href="https://thoughts.jock.pl/p/wizboard-fizzy-ai-agent-interface-pivot-2026">WizBoard</a>. Contains a URL? It fetches and summarizes. Most of the time it gets it right.</p><p><code>::wiz_do(instruction)</code> is when I want to be explicit. <code>::wiz_do(create task)</code>, <code>::wiz_do(draft linkedin post)</code>, <code>::wiz_do(remember)</code>, <code>::wiz_do(stage blog draft)</code>. No guessing, just dispatch.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!muyF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62f0ae91-1f70-48fb-955e-2a5a3f22278a_1182x616.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!muyF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62f0ae91-1f70-48fb-955e-2a5a3f22278a_1182x616.png 424w, https://substackcdn.com/image/fetch/$s_!muyF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62f0ae91-1f70-48fb-955e-2a5a3f22278a_1182x616.png 848w, https://substackcdn.com/image/fetch/$s_!muyF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62f0ae91-1f70-48fb-955e-2a5a3f22278a_1182x616.png 1272w, https://substackcdn.com/image/fetch/$s_!muyF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62f0ae91-1f70-48fb-955e-2a5a3f22278a_1182x616.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!muyF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62f0ae91-1f70-48fb-955e-2a5a3f22278a_1182x616.png" width="1182" height="616" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/62f0ae91-1f70-48fb-955e-2a5a3f22278a_1182x616.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:616,&quot;width&quot;:1182,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:89997,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/194890196?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62f0ae91-1f70-48fb-955e-2a5a3f22278a_1182x616.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!muyF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62f0ae91-1f70-48fb-955e-2a5a3f22278a_1182x616.png 424w, https://substackcdn.com/image/fetch/$s_!muyF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62f0ae91-1f70-48fb-955e-2a5a3f22278a_1182x616.png 848w, https://substackcdn.com/image/fetch/$s_!muyF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62f0ae91-1f70-48fb-955e-2a5a3f22278a_1182x616.png 1272w, https://substackcdn.com/image/fetch/$s_!muyF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62f0ae91-1f70-48fb-955e-2a5a3f22278a_1182x616.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ILXP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68247644-9bbe-4cc4-bbb0-e01b24ef5d33_1092x878.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ILXP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68247644-9bbe-4cc4-bbb0-e01b24ef5d33_1092x878.png 424w, https://substackcdn.com/image/fetch/$s_!ILXP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68247644-9bbe-4cc4-bbb0-e01b24ef5d33_1092x878.png 848w, https://substackcdn.com/image/fetch/$s_!ILXP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68247644-9bbe-4cc4-bbb0-e01b24ef5d33_1092x878.png 1272w, https://substackcdn.com/image/fetch/$s_!ILXP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68247644-9bbe-4cc4-bbb0-e01b24ef5d33_1092x878.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ILXP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68247644-9bbe-4cc4-bbb0-e01b24ef5d33_1092x878.png" width="1092" height="878" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/68247644-9bbe-4cc4-bbb0-e01b24ef5d33_1092x878.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:878,&quot;width&quot;:1092,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:293834,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/194890196?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68247644-9bbe-4cc4-bbb0-e01b24ef5d33_1092x878.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ILXP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68247644-9bbe-4cc4-bbb0-e01b24ef5d33_1092x878.png 424w, https://substackcdn.com/image/fetch/$s_!ILXP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68247644-9bbe-4cc4-bbb0-e01b24ef5d33_1092x878.png 848w, https://substackcdn.com/image/fetch/$s_!ILXP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68247644-9bbe-4cc4-bbb0-e01b24ef5d33_1092x878.png 1272w, https://substackcdn.com/image/fetch/$s_!ILXP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68247644-9bbe-4cc4-bbb0-e01b24ef5d33_1092x878.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><code>::wizboard(view)</code> pulls task board state into the note. <code>::wizboard(today)</code> shows what&#8217;s scheduled. <code>::wizboard(now)</code> shows what&#8217;s running. I use this during planning to avoid switching apps.</p><p>The plumbing: the MacBook extension POSTs to a small HTTP server on Mac Mini over Tailscale. The server routes it, either to a fast path (Haiku, a few seconds) for things like task creation and idea logging, or to a full Claude Code session (Sonnet, 30-60 seconds) when intent is genuinely unclear. First attempts at the intent classifier were shaky, which is why I added <code>::wiz_do</code> for when I just want to be direct.</p><p>It&#8217;s not instant. That&#8217;s fine. I&#8217;m not looking for a chat interface. I&#8217;m looking for a way to hand something to Wiz without breaking what I&#8217;m doing. This does that.</p><h2>How to build your own (short version)</h2><p>You need the beta (v2.0.4+). Settings &gt; Extensions &gt; Open Extensions Folder. Make a folder for your extension, add two files.</p><p><strong>manifest.json:</strong></p><pre><code>{
  &#8220;name&#8221;: &#8220;my-extension&#8221;,
  &#8220;version&#8221;: &#8220;1.0.0&#8221;,
  &#8220;author&#8221;: &#8220;you&#8221;,
  &#8220;description&#8221;: &#8220;Does something useful&#8221;,
  &#8220;commands&#8221;: [
    { &#8220;name&#8221;: &#8220;mycommand&#8221;, &#8220;type&#8221;: &#8220;insert&#8221; }
  ]
}</code></pre><p><strong>index.js:</strong></p><pre><code>async function mycommand(context) {
  const noteContent = context.content;
  return { type: &#8220;insert&#8221;, content: &#8220;processed: &#8220; + noteContent };
}</code></pre><p>That&#8217;s it. Type <code>::mycommand</code> in any note to invoke it. If you need an API key, add it to <code>requiredAPIKeys</code> in the manifest and Antinote stores it in macOS Keychain. Access it inside the function via <code>context.apiKeys.your_key_name</code>.</p><p>For external calls, use <code>fetch()</code> normally. The extension has network access.</p><p>If you&#8217;re not writing JavaScript yourself, paste this into Claude or ChatGPT: &#8220;I want to build an Antinote extension that [describe what you want]. Give me the manifest.json and index.js files.&#8221; Works pretty well. The <a href="https://github.com/johnsonfung/antinote-extensions">GitHub repo</a> has real examples too, I&#8217;d start there first.</p><h2>The giveaway</h2><p>Antinote is $5 lifetime. Not expensive at all. But I think people here will actually use it, so I bought 10 licenses to give away.</p><p>I reached out to the developer. He matched my 10 and added another 10 on top. So we now have 20 to give away. That was a nice thing for him to do.</p><p>I built a small page to handle it properly. Enter your email, pick your tier (free or paid subscriber), and if a license is still available you&#8217;ll get one sent to your inbox with setup instructions.</p><p><strong><a href="https://wiz.jock.pl/experiments/antinote-giveaway">Claim your license here &#8594;</a></strong></p><p>Free and paid subscribers have separate pools (10 each). Paid subscribers have better odds because it&#8217;s a smaller group. Both pools open at the same time. First come, first served.</p><p>If you build something with the extensions, let me know. I&#8217;m curious what people end up making when you give them a proper hook into their own tools.</p><p>See you next week.</p><p>Pawel</p><div><hr></div><p><strong>Want to go deeper on AI agents?</strong></p><p>I write about this every week. If you&#8217;re on the free tier and getting value from these posts, consider upgrading to paid. You get every post in full, early access to experiments, and, apparently, better odds in giveaways. <a href="https://thoughts.jock.pl/subscribe">Upgrade here.</a></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://thoughts.jock.pl/subscribe?"><span>Subscribe now</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[Opus 4.7 Made Me Take Token Waste Management Seriously]]></title><description><![CDATA[TBH - I was working on it for a while now!]]></description><link>https://thoughts.jock.pl/p/token-waste-management-opus-47-2026</link><guid isPermaLink="false">https://thoughts.jock.pl/p/token-waste-management-opus-47-2026</guid><dc:creator><![CDATA[Pawel Jozefiak]]></dc:creator><pubDate>Fri, 17 Apr 2026 13:34:13 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!DJx4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75bf3e0f-7250-4ba8-9ddb-71dd941c4488_2048x2048.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DJx4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75bf3e0f-7250-4ba8-9ddb-71dd941c4488_2048x2048.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DJx4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75bf3e0f-7250-4ba8-9ddb-71dd941c4488_2048x2048.png 424w, https://substackcdn.com/image/fetch/$s_!DJx4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75bf3e0f-7250-4ba8-9ddb-71dd941c4488_2048x2048.png 848w, https://substackcdn.com/image/fetch/$s_!DJx4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75bf3e0f-7250-4ba8-9ddb-71dd941c4488_2048x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!DJx4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75bf3e0f-7250-4ba8-9ddb-71dd941c4488_2048x2048.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DJx4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75bf3e0f-7250-4ba8-9ddb-71dd941c4488_2048x2048.png" width="1456" height="1456" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/75bf3e0f-7250-4ba8-9ddb-71dd941c4488_2048x2048.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:5060100,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/194516737?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75bf3e0f-7250-4ba8-9ddb-71dd941c4488_2048x2048.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DJx4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75bf3e0f-7250-4ba8-9ddb-71dd941c4488_2048x2048.png 424w, https://substackcdn.com/image/fetch/$s_!DJx4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75bf3e0f-7250-4ba8-9ddb-71dd941c4488_2048x2048.png 848w, https://substackcdn.com/image/fetch/$s_!DJx4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75bf3e0f-7250-4ba8-9ddb-71dd941c4488_2048x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!DJx4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75bf3e0f-7250-4ba8-9ddb-71dd941c4488_2048x2048.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Anthropic shipped Claude Opus 4.7 on April 16, 2026. Same per-token price as 4.6. New tokenizer. The official docs say it quietly: &#8220;This new tokenizer may use up to 35% more tokens for the same fixed text&#8221; (<a href="https://platform.claude.com/docs/en/about-claude/pricing">source</a>). Do the arithmetic. If you migrate your workload one-to-one, your bill goes up by up to 35% on identical inputs.</p><p>Until yesterday I treated token spend as a fixed cost of doing business. Opus 4.7 reframed it for me. When the same workload suddenly costs a third more, you stop thinking about usage and start thinking about <strong>waste management</strong>: which turns are productive, which ones are leaking money, and how to stop the leaks without kneecapping the agent. That is a real discipline. I had been ignoring it.</p><p>So I finally audited where my agents were actually burning money. I classified 133,087 assistant turns across 9,667 real Claude Code sessions for $19 total. The answer wasn&#8217;t what I expected, and it changed what I ship. This post is a walkthrough of what I found, what the research says about efficiency more broadly, and what token waste management looks like in practice, both the free version and the shortcut.</p><p>If you haven&#8217;t tried building serious automation on Claude Code yet, my <a href="https://thoughts.jock.pl/p/how-to-build-your-first-ai-agent-beginners-guide-2026">beginner agent guide</a> is a gentler entry point. If you have, keep reading.</p><h2>Token waste management is two-sided</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3Jzq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F017e4b49-62ed-4fd4-9a55-39f024c41b69_1290x831.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3Jzq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F017e4b49-62ed-4fd4-9a55-39f024c41b69_1290x831.png 424w, https://substackcdn.com/image/fetch/$s_!3Jzq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F017e4b49-62ed-4fd4-9a55-39f024c41b69_1290x831.png 848w, https://substackcdn.com/image/fetch/$s_!3Jzq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F017e4b49-62ed-4fd4-9a55-39f024c41b69_1290x831.png 1272w, https://substackcdn.com/image/fetch/$s_!3Jzq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F017e4b49-62ed-4fd4-9a55-39f024c41b69_1290x831.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3Jzq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F017e4b49-62ed-4fd4-9a55-39f024c41b69_1290x831.png" width="1290" height="831" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/017e4b49-62ed-4fd4-9a55-39f024c41b69_1290x831.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:831,&quot;width&quot;:1290,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:158545,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/194516737?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F017e4b49-62ed-4fd4-9a55-39f024c41b69_1290x831.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3Jzq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F017e4b49-62ed-4fd4-9a55-39f024c41b69_1290x831.png 424w, https://substackcdn.com/image/fetch/$s_!3Jzq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F017e4b49-62ed-4fd4-9a55-39f024c41b69_1290x831.png 848w, https://substackcdn.com/image/fetch/$s_!3Jzq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F017e4b49-62ed-4fd4-9a55-39f024c41b69_1290x831.png 1272w, https://substackcdn.com/image/fetch/$s_!3Jzq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F017e4b49-62ed-4fd4-9a55-39f024c41b69_1290x831.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>There are two kinds of token bleeding. Most people only talk about one.</p><p>Side one is <strong>waste</strong>. The agent retries a failed tool call. It re-reads a file it already read. It gets stuck in a Cloudflare wall. It spawns a subagent whose output is never used. These are turns you paid for that produced nothing useful.</p><p>Side two is <strong>inefficient usage</strong>. Your CLAUDE.md is 8,000 tokens when 2,000 would do. Your system prompt repeats itself. You ask for &#8220;be concise&#8221; and the model gives you three paragraphs anyway. You don&#8217;t use prompt caching, so every turn pays the full input cost. The turns were productive, but more expensive than they needed to be.</p><p>With Opus 4.7&#8217;s tokenizer, side two just got 35% worse without anyone touching their code. If you were already on the edge of comfortable costs, you&#8217;re over it now. And the cache write cost also scales with those same tokens, so the first turn after a cache miss feels worse than you remember.</p><h2>What I measured</h2><p>I built a token waste sorter. It walks every Claude Code session JSONL and sorts each assistant turn into one of nine bins: productive, retry_error, cache_read, cache_write, reasoning, file_reread, oververbose_edit, dead_end, subagent_overhead. Seven bins are heuristic (no LLM). Two need a judge.</p><p>For the judge, I tried three models on the same 20 sessions where I knew dead ends existed:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!guhK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe73dc3c6-8220-4f55-afdd-640c43640e2f_778x140.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!guhK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe73dc3c6-8220-4f55-afdd-640c43640e2f_778x140.png 424w, https://substackcdn.com/image/fetch/$s_!guhK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe73dc3c6-8220-4f55-afdd-640c43640e2f_778x140.png 848w, https://substackcdn.com/image/fetch/$s_!guhK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe73dc3c6-8220-4f55-afdd-640c43640e2f_778x140.png 1272w, https://substackcdn.com/image/fetch/$s_!guhK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe73dc3c6-8220-4f55-afdd-640c43640e2f_778x140.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!guhK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe73dc3c6-8220-4f55-afdd-640c43640e2f_778x140.png" width="778" height="140" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e73dc3c6-8220-4f55-afdd-640c43640e2f_778x140.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:140,&quot;width&quot;:778,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:20201,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/194516737?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe73dc3c6-8220-4f55-afdd-640c43640e2f_778x140.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!guhK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe73dc3c6-8220-4f55-afdd-640c43640e2f_778x140.png 424w, https://substackcdn.com/image/fetch/$s_!guhK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe73dc3c6-8220-4f55-afdd-640c43640e2f_778x140.png 848w, https://substackcdn.com/image/fetch/$s_!guhK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe73dc3c6-8220-4f55-afdd-640c43640e2f_778x140.png 1272w, https://substackcdn.com/image/fetch/$s_!guhK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe73dc3c6-8220-4f55-afdd-640c43640e2f_778x140.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Haiku was the clear winner. Sonnet at five times the price caught half as much. The local 4B model only caught explicit failures (blocked fetches, 403s) and missed everything that requires judging intent, like an agent searching the wrong platform for 28 straight turns. (<a href="https://thoughts.jock.pl/p/local-llm-35b-mac-mini-gemma-swap-production-2026">More on why local LLMs struggle with judgment tasks here.</a>) The full audit of 9,667 sessions via OpenRouter Haiku cost me $19. That&#8217;s the cheapest observability I&#8217;ve ever bought.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pWTU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5b1c91f-8f9a-4406-8cb1-6ac0562e0559_726x876.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pWTU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5b1c91f-8f9a-4406-8cb1-6ac0562e0559_726x876.png 424w, https://substackcdn.com/image/fetch/$s_!pWTU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5b1c91f-8f9a-4406-8cb1-6ac0562e0559_726x876.png 848w, https://substackcdn.com/image/fetch/$s_!pWTU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5b1c91f-8f9a-4406-8cb1-6ac0562e0559_726x876.png 1272w, https://substackcdn.com/image/fetch/$s_!pWTU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5b1c91f-8f9a-4406-8cb1-6ac0562e0559_726x876.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pWTU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5b1c91f-8f9a-4406-8cb1-6ac0562e0559_726x876.png" width="726" height="876" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e5b1c91f-8f9a-4406-8cb1-6ac0562e0559_726x876.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:876,&quot;width&quot;:726,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:129403,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/194516737?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5b1c91f-8f9a-4406-8cb1-6ac0562e0559_726x876.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pWTU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5b1c91f-8f9a-4406-8cb1-6ac0562e0559_726x876.png 424w, https://substackcdn.com/image/fetch/$s_!pWTU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5b1c91f-8f9a-4406-8cb1-6ac0562e0559_726x876.png 848w, https://substackcdn.com/image/fetch/$s_!pWTU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5b1c91f-8f9a-4406-8cb1-6ac0562e0559_726x876.png 1272w, https://substackcdn.com/image/fetch/$s_!pWTU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5b1c91f-8f9a-4406-8cb1-6ac0562e0559_726x876.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/p/token-waste-management-opus-47-2026?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://thoughts.jock.pl/p/token-waste-management-opus-47-2026?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><div><hr></div><p>Top five waste clusters across all sessions:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LUV6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94ffc875-95ba-4006-b9f4-68ab43ca1f91_777x205.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LUV6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94ffc875-95ba-4006-b9f4-68ab43ca1f91_777x205.png 424w, https://substackcdn.com/image/fetch/$s_!LUV6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94ffc875-95ba-4006-b9f4-68ab43ca1f91_777x205.png 848w, https://substackcdn.com/image/fetch/$s_!LUV6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94ffc875-95ba-4006-b9f4-68ab43ca1f91_777x205.png 1272w, https://substackcdn.com/image/fetch/$s_!LUV6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94ffc875-95ba-4006-b9f4-68ab43ca1f91_777x205.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!LUV6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94ffc875-95ba-4006-b9f4-68ab43ca1f91_777x205.png" width="777" height="205" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/94ffc875-95ba-4006-b9f4-68ab43ca1f91_777x205.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:205,&quot;width&quot;:777,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:29938,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/194516737?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94ffc875-95ba-4006-b9f4-68ab43ca1f91_777x205.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!LUV6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94ffc875-95ba-4006-b9f4-68ab43ca1f91_777x205.png 424w, https://substackcdn.com/image/fetch/$s_!LUV6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94ffc875-95ba-4006-b9f4-68ab43ca1f91_777x205.png 848w, https://substackcdn.com/image/fetch/$s_!LUV6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94ffc875-95ba-4006-b9f4-68ab43ca1f91_777x205.png 1272w, https://substackcdn.com/image/fetch/$s_!LUV6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94ffc875-95ba-4006-b9f4-68ab43ca1f91_777x205.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>The surprise was the distribution. When I sampled only expensive sessions, Browser/Playwright showed up 5 times. On the full corpus it was 136. A 27x increase. The failure is spread thin across thousands of cheap cron and wake sessions, each one invisible individually, collectively the top bug. If you only audit your expensive sessions, you&#8217;ll miss this.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-blO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f77394f-53b5-4b5c-b93b-5a55c73c485b_1254x318.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-blO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f77394f-53b5-4b5c-b93b-5a55c73c485b_1254x318.png 424w, https://substackcdn.com/image/fetch/$s_!-blO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f77394f-53b5-4b5c-b93b-5a55c73c485b_1254x318.png 848w, https://substackcdn.com/image/fetch/$s_!-blO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f77394f-53b5-4b5c-b93b-5a55c73c485b_1254x318.png 1272w, https://substackcdn.com/image/fetch/$s_!-blO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f77394f-53b5-4b5c-b93b-5a55c73c485b_1254x318.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-blO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f77394f-53b5-4b5c-b93b-5a55c73c485b_1254x318.png" width="1254" height="318" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9f77394f-53b5-4b5c-b93b-5a55c73c485b_1254x318.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:318,&quot;width&quot;:1254,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:51895,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/194516737?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f77394f-53b5-4b5c-b93b-5a55c73c485b_1254x318.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-blO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f77394f-53b5-4b5c-b93b-5a55c73c485b_1254x318.png 424w, https://substackcdn.com/image/fetch/$s_!-blO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f77394f-53b5-4b5c-b93b-5a55c73c485b_1254x318.png 848w, https://substackcdn.com/image/fetch/$s_!-blO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f77394f-53b5-4b5c-b93b-5a55c73c485b_1254x318.png 1272w, https://substackcdn.com/image/fetch/$s_!-blO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f77394f-53b5-4b5c-b93b-5a55c73c485b_1254x318.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>None of these are &#8220;AI going down wrong paths&#8221; in the romantic sense. They&#8217;re infrastructure bugs. Stale cookies. Cloudflare walls. Tools that don&#8217;t exist in the current Claude Code version. Platform confusion. The AI is the messenger, not the source. (<a href="https://thoughts.jock.pl/p/the-compounding-agent-ep4">I wrote about the compounding value of fixing these in this earlier post</a>: one small fix applied across thousands of sessions is where real gains live.)</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1MFY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febf080b0-e0cc-49af-8784-f94bce95eeae_588x641.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1MFY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febf080b0-e0cc-49af-8784-f94bce95eeae_588x641.png 424w, https://substackcdn.com/image/fetch/$s_!1MFY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febf080b0-e0cc-49af-8784-f94bce95eeae_588x641.png 848w, https://substackcdn.com/image/fetch/$s_!1MFY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febf080b0-e0cc-49af-8784-f94bce95eeae_588x641.png 1272w, https://substackcdn.com/image/fetch/$s_!1MFY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febf080b0-e0cc-49af-8784-f94bce95eeae_588x641.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1MFY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febf080b0-e0cc-49af-8784-f94bce95eeae_588x641.png" width="588" height="641" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ebf080b0-e0cc-49af-8784-f94bce95eeae_588x641.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:641,&quot;width&quot;:588,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:46077,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/194516737?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febf080b0-e0cc-49af-8784-f94bce95eeae_588x641.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1MFY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febf080b0-e0cc-49af-8784-f94bce95eeae_588x641.png 424w, https://substackcdn.com/image/fetch/$s_!1MFY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febf080b0-e0cc-49af-8784-f94bce95eeae_588x641.png 848w, https://substackcdn.com/image/fetch/$s_!1MFY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febf080b0-e0cc-49af-8784-f94bce95eeae_588x641.png 1272w, https://substackcdn.com/image/fetch/$s_!1MFY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febf080b0-e0cc-49af-8784-f94bce95eeae_588x641.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">This is one of my &#8220;nerd&#8221; posts. If you like it - subscribe!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h2>What the research says about the other half</h2><p>I went looking for academic and production data on cutting token usage, not just waste. Four things stood out:</p><p><strong>Prompt compression is real and large.</strong> Microsoft&#8217;s <a href="https://arxiv.org/abs/2310.05736">LLMLingua</a> and <a href="https://llmlingua.com/llmlingua2.html">LLMLingua-2</a> compress prompts 14-20x with around 1.5% quality loss. Your 7,000-token system prompt becomes 500 tokens with negligible quality drop on standard tasks. You don&#8217;t need to apply LLMLingua to use the insight: prompts have a lot of slack in them.</p><p><strong>System prompt bloat hurts quality, not just cost.</strong> Red Hat&#8217;s <a href="https://developers.redhat.com/articles/2026/02/23/prompt-engineering-big-vs-small-prompts-ai-agents">analysis</a> and the <a href="https://mlops.community/the-impact-of-prompt-bloat-on-llm-output-quality/">MLOps Community writeup</a> both land in the same place: prompts degrade quality around 3,000 tokens. Smaller, well-written system prompts outperform larger ones, and not just on latency. If your CLAUDE.md is multiple pages, it&#8217;s probably actively making the agent worse.</p><p><strong>Prompt caching is a 90% discount if you use it correctly.</strong> Anthropic&#8217;s <a href="https://platform.claude.com/docs/en/build-with-claude/prompt-caching">prompt caching</a> reduces cache-hit tokens to 0.1x the normal input price. To benefit, keep stable rules at the top of your context. Don&#8217;t reorder them mid-session. Put volatile, per-task content at the bottom. For Opus 4.7 the minimum cacheable length is 4,096 tokens, so small prompts can&#8217;t cache. Design for it.</p><p><strong>Long chains of thought do not always win.</strong> Recent work (<a href="https://openreview.net/pdf?id=W8dxn7hBkO">&#8220;overthinking&#8221; studies</a>) shows that on simple tasks, longer reasoning actively hurts performance. Production rule: use CoT for complex problems, direct answers for classification and retrieval. If you&#8217;re defaulting to &#8220;think step by step&#8221; on everything, you&#8217;re paying 3-5x tokens for a quality hit on half of them.</p><p>Add all four up and you have the other half of the story. Not every inefficiency is a bug. Most of it is prompt shape.</p><p>If you&#8217;re curious how different AI coding harnesses handle this stuff, my <a href="https://thoughts.jock.pl/p/ai-coding-harness-agents-2026">comparison of Claude Code vs Codex vs Aider vs OpenCode vs Cursor</a> goes deep on the efficiency differences between them. Short version: the harness matters almost as much as the model.</p><h2>Three things you can do today, free</h2><p>Before anything else, do these:</p><p><strong>1. Shrink your CLAUDE.md.</strong> Open it. If it&#8217;s over 3,000 tokens, you have room to cut. Move stable rules to the top (for cache hits). Kill anything that describes what Claude Code can already do. Kill historical notes that don&#8217;t change behavior. A tight CLAUDE.md both costs less AND makes the agent smarter.</p><p><strong>2. Set max_tokens tight and request structured output where possible.</strong> For classification tasks, request JSON with a schema. For quick answers, say &#8220;reply in under 50 words.&#8221; The model will drift long if you don&#8217;t put a number on it.</p><p><strong>3. Audit your WebFetch and browser failures.</strong> If you have any agent that does repeated web automation, find out if it&#8217;s hitting the same Cloudflare wall 100 times a week silently. The cost per hit is small. The total is not. For me this one cluster was $220 of silent monthly spend before I saw it.</p><p>These three alone will cut most users&#8217; bills 20-40%, at zero software cost.</p><h2>The deeper thing: the Agent Efficiency Kit</h2><p>Once I saw the clusters, the fixes were obvious but tedious: write a hook that denies redundant file reads. Write a hook that suggests firecrawl when WebFetch hits Cloudflare. Write a circuit breaker that stops the retry spiral after two failures on the same URL. Write agent-level instructions so the model internalizes the patterns. Build a dashboard so you can see what changed.</p><p>I did all of that. (The dashboard is built on the same principles as my <a href="https://thoughts.jock.pl/p/wizboard-fizzy-ai-agent-interface-pivot-2026">WizBoard interface for agents</a>: don&#8217;t make the human hunt for the number, put it on screen.) Then I realized every Claude Code user in the world needs the same thing, and almost none of them are going to build it themselves. So I packaged it.</p><p>The <strong><a href="https://wiz.jock.pl/store/agent-efficiency-kit">Agent Efficiency Kit</a></strong> is a $49.99 drop-in package. It includes:</p><ul><li><p><strong>Three pre-wired hooks</strong> that run in your Claude Code settings: a file-reread guard, a WebFetch fallback hint, and a WebFetch circuit breaker. Script-based, zero ongoing AI cost, milliseconds of overhead per tool call.</p></li><li><p><strong>AGENT_INSTRUCTIONS.md</strong>, an approximately 1,000-token drop-in for your CLAUDE.md that tells the agent which patterns to follow and which to avoid. Cacheable, so you pay for it once per session at most.</p></li><li><p><strong>The taxonomy, classifier, and dashboard</strong> I used for the audit. Run them any time, on your own data, locally. The dashboard is a pinned tab.</p></li><li><p><strong>Optional Haiku-powered deep audit</strong>. If you want to classify a year of history for around $20 in OpenRouter credits, the scripts are ready to run.</p></li><li><p><strong>12 months of updates</strong>: new hook patterns, taxonomy expansions, dashboard features.</p></li></ul><p>It installs in one command. It doesn&#8217;t charge you tokens to measure itself. It works from the moment you restart Claude Code. You can read every file in the kit before running it, which is the version of trust I prefer.</p><p>My Paid subscribers get it for gree here: <a href="https://wiz.jock.pl/store/agent-efficiency-kit">wiz.jock.pl/store/agent-efficiency-kit</a>.</p><h2>The meta lesson</h2><p>Before Opus 4.7, token efficiency was a nice-to-have. After Opus 4.7, it&#8217;s a 35% forced haircut on everyone running on the frontier. The teams that measure their agents now will notice the bump, correct it, and keep going. The teams that don&#8217;t will slowly wonder why their AI bill is up and their features aren&#8217;t shipping faster.</p><p>The path to cheaper, better agents isn&#8217;t a smarter model. It&#8217;s better plumbing around the model. Old cookies, Cloudflare walls, a regex that didn&#8217;t sanitize a search term. These are the things that eat your budget. They stay invisible until you measure, at which point they&#8217;re obvious. Measure. Fix the top cluster. Repeat.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Digital Thoughts is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p><em>If you&#8217;ve read this far, you already have enough to start on the free side. If you want the shortcut, <a href="https://wiz.jock.pl/store/agent-efficiency-kit">the kit is there</a>. Either way, now is the moment. Tokens cost more tomorrow than they did yesterday.</em></p><p></p>]]></content:encoded></item><item><title><![CDATA[Claude Code vs Codex CLI vs Aider vs OpenCode vs Pi vs Cursor: Which AI Coding Harness Actually Works Without You?]]></title><description><![CDATA[TL:TR I love Pi, but I can't use it.]]></description><link>https://thoughts.jock.pl/p/ai-coding-harness-agents-2026</link><guid isPermaLink="false">https://thoughts.jock.pl/p/ai-coding-harness-agents-2026</guid><dc:creator><![CDATA[Pawel Jozefiak]]></dc:creator><pubDate>Wed, 15 Apr 2026 12:50:16 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!FmiU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf7d73c8-2213-4d55-8161-73a608810d27_2048x2048.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FmiU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf7d73c8-2213-4d55-8161-73a608810d27_2048x2048.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FmiU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf7d73c8-2213-4d55-8161-73a608810d27_2048x2048.png 424w, https://substackcdn.com/image/fetch/$s_!FmiU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf7d73c8-2213-4d55-8161-73a608810d27_2048x2048.png 848w, https://substackcdn.com/image/fetch/$s_!FmiU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf7d73c8-2213-4d55-8161-73a608810d27_2048x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!FmiU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf7d73c8-2213-4d55-8161-73a608810d27_2048x2048.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FmiU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf7d73c8-2213-4d55-8161-73a608810d27_2048x2048.png" width="1456" height="1456" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/af7d73c8-2213-4d55-8161-73a608810d27_2048x2048.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:4626619,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/194290844?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf7d73c8-2213-4d55-8161-73a608810d27_2048x2048.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FmiU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf7d73c8-2213-4d55-8161-73a608810d27_2048x2048.png 424w, https://substackcdn.com/image/fetch/$s_!FmiU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf7d73c8-2213-4d55-8161-73a608810d27_2048x2048.png 848w, https://substackcdn.com/image/fetch/$s_!FmiU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf7d73c8-2213-4d55-8161-73a608810d27_2048x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!FmiU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf7d73c8-2213-4d55-8161-73a608810d27_2048x2048.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>My AI agent <a href="https://thoughts.jock.pl/p/building-ai-agent-night-shifts-ep1">wakes up at 2am, picks tasks from a queue, ships code, and sends me a report by morning</a>. For that to work, I need a coding harness I can trust when I&#8217;m not watching.</p><p>Not a tool that helps me code faster. A tool that codes when I&#8217;m asleep.</p><p>That&#8217;s a different question than &#8220;which IDE is best.&#8221; IDEs are for humans who are present. Harnesses are for when you&#8217;re not. It&#8217;s also not the same question as &#8220;which has the best autocomplete.&#8221; That&#8217;s a different category entirely, one we&#8217;re not touching here.</p><p>I&#8217;ve used Claude Code daily for months, run Codex CLI and OpenCode in parallel, tested Pi, and dug into the open-source alternatives. This is what I actually think.</p><div><hr></div><h2>What a Harness Actually Is</h2><p>A harness connects the horse to the cart. In AI coding, it&#8217;s the set of tools and environment in which the agent operates.</p><p>Here&#8217;s the thing most people miss: LLMs can only generate text. That&#8217;s it. They can&#8217;t read your files, run commands, or edit code directly. What a harness does is give the model structured tool calls it can emit as text. The harness intercepts those, executes them with real code, appends the output to the conversation history, and prompts the model to continue. Every tool call follows the same loop: model pauses, harness runs something, result added to context, model restarts. At its core this is about 60-75 lines of Python. The complexity is entirely in the tuning: what tools the model gets, how those tools are described, and what the system prompt says.</p><p>This matters because the tuning is where harnesses actually diverge. Two harnesses running the same model on the same task can produce dramatically different results. Not because of the model, but because of what the harness tells the model it can do and how to use it.</p><p>Tab autocomplete isn&#8217;t a harness. It&#8217;s a suggestion box. A nice UI on top of an existing harness (like T3 Code, which wraps Claude Code and Codex CLI) is also not a harness. The real question for every tool below: can it take a task, execute it end-to-end across multiple files, handle errors, and report back without me in the loop?</p><div><hr></div><h2>Two Different Categories: Coding Tools vs Agent Orchestrators</h2><p>Before comparing specific tools, it&#8217;s worth naming the split that most comparisons ignore. Not all &#8220;AI coding harnesses&#8221; are trying to do the same thing.</p><p><strong>Coding tools</strong> are pair programmers. You direct each step. They execute that step very well, commit the result, and wait for the next instruction. Aider is the clearest example. Codex CLI leans this way too. Cline. These are tools built around the assumption that you&#8217;re at the keyboard and providing direction. They make individual tasks faster and better. They&#8217;re not designed to chain 40 decisions together autonomously while you sleep.</p><p><strong>Agent orchestrators</strong> are designed to take a goal and execute autonomously across multiple steps, files, and decision points. Claude Code is built for this. Devin is the extreme version. Pi, if you build out the harness fully, fits here. These tools are designed around the assumption that you&#8217;re not watching, and they need to make judgment calls without asking.</p><p>Most comparisons treat all of these as the same thing and rank them on the same axis. That produces misleading results. Aider isn&#8217;t trying to replace Claude Code for overnight autonomous runs. Codex CLI isn&#8217;t trying to be an agent orchestrator in the same sense Claude Code is. Judging them by the same criteria produces noise.</p><p>The honest answer to &#8220;which is best&#8221; depends entirely on which category you need. This post tries to be clear about which tools belong where, and let you make the call for your workflow.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Digital Thoughts is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h2>The Benchmark Reality (And Why It Doesn&#8217;t Tell the Full Story)</h2><p>SWE-bench Verified became the standard benchmark for this category. It measures how often a coding agent independently resolves real GitHub issues from start to finish. That status also made it a target. Researchers flagged contamination: training data for newer models overlaps with the test set, which inflates scores. The cleaner alternative is <strong>SWE-bench Pro</strong>, introduced in 2026, with 2,000+ problems that weren&#8217;t in any public training data. GPT-5.4-Codex leads there at 56.8%. Harder problems, more honest scores.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vQsd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb96ed99-21df-4808-b1a4-5871c2e8b9d5_779x348.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vQsd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb96ed99-21df-4808-b1a4-5871c2e8b9d5_779x348.png 424w, https://substackcdn.com/image/fetch/$s_!vQsd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb96ed99-21df-4808-b1a4-5871c2e8b9d5_779x348.png 848w, https://substackcdn.com/image/fetch/$s_!vQsd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb96ed99-21df-4808-b1a4-5871c2e8b9d5_779x348.png 1272w, https://substackcdn.com/image/fetch/$s_!vQsd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb96ed99-21df-4808-b1a4-5871c2e8b9d5_779x348.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vQsd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb96ed99-21df-4808-b1a4-5871c2e8b9d5_779x348.png" width="779" height="348" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/db96ed99-21df-4808-b1a4-5871c2e8b9d5_779x348.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:348,&quot;width&quot;:779,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:61655,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/194290844?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb96ed99-21df-4808-b1a4-5871c2e8b9d5_779x348.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!vQsd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb96ed99-21df-4808-b1a4-5871c2e8b9d5_779x348.png 424w, https://substackcdn.com/image/fetch/$s_!vQsd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb96ed99-21df-4808-b1a4-5871c2e8b9d5_779x348.png 848w, https://substackcdn.com/image/fetch/$s_!vQsd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb96ed99-21df-4808-b1a4-5871c2e8b9d5_779x348.png 1272w, https://substackcdn.com/image/fetch/$s_!vQsd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb96ed99-21df-4808-b1a4-5871c2e8b9d5_779x348.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Terminal-Bench 2.0 deserves a separate mention because it&#8217;s more relevant for agentic tasks than SWE-bench. It tests autonomous, multi-step execution in real terminal environments. Not just code edits. Actual shell navigation, file management, running commands in sequence, recovering from errors. The Claude Code harness configuration benchmarked here (&#8221;Claude Mythos&#8221;) hits 92.1%. Codex CLI hits 77.3%. That 15-point gap is a better signal for overnight autonomous work than SWE-bench numbers.</p><p>Now the result that breaks the &#8220;pick the highest number&#8221; logic. Matt Mayer ran an independent test comparing the same model inside different harnesses. Claude Opus: 77% in Claude Code, 93% in Cursor. Same model. Same tasks. 16 percentage points from the harness alone. That&#8217;s not an outlier. CORE-Bench found Claude Opus at 42% with a minimal scaffold, rising to 78% inside Claude Code&#8217;s full harness. Across multiple independent studies the harness effect ranges from 5 to 40 percentage points depending on model and task type.</p><p>A few flags before reading the tool sections. Cursor doesn&#8217;t publish SWE-bench Verified results and uses its own proprietary CursorBench at 61.3% instead. Draw your own conclusions. OpenCode and Pi have no published scores because their performance is entirely model-dependent. Devin&#8217;s frequently cited 13.86% figure is from 2023 and belongs in a museum. It does not appear in the current top 30 of any major leaderboard.</p><p>What the scores actually tell you: harness quality matters as much as the model you put in it. Cursor employs people whose full-time job is to rewrite system prompts and tool descriptions every time a new model ships. Claude will keep using a tool you label &#8220;deprecated.&#8221; Gemini will abandon structured tools entirely and only use bash. Cursor tests obsessively and adjusts. Most harnesses don&#8217;t. Keep this in mind across every section below.</p><div><hr></div><h2>Claude Code: The Deep Harness</h2><p><em>Category: Agent orchestrator | <a href="https://code.claude.com/">code.claude.com</a> | <a href="https://github.com/anthropics/claude-code">GitHub (114k stars)</a></em></p><p>Full disclosure: this is what I use daily, and what runs <a href="https://thoughts.jock.pl/p/ai-agent-self-extending-self-fixing-wiz-rebuild-technical-deep-dive-2026">Wiz</a> on a headless Mac Mini overnight. I try to be honest about it.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MWqB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef7e7215-5bda-47fb-a788-05b19e724227_1576x840.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MWqB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef7e7215-5bda-47fb-a788-05b19e724227_1576x840.png 424w, https://substackcdn.com/image/fetch/$s_!MWqB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef7e7215-5bda-47fb-a788-05b19e724227_1576x840.png 848w, https://substackcdn.com/image/fetch/$s_!MWqB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef7e7215-5bda-47fb-a788-05b19e724227_1576x840.png 1272w, https://substackcdn.com/image/fetch/$s_!MWqB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef7e7215-5bda-47fb-a788-05b19e724227_1576x840.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!MWqB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef7e7215-5bda-47fb-a788-05b19e724227_1576x840.png" width="1456" height="776" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ef7e7215-5bda-47fb-a788-05b19e724227_1576x840.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:776,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:94729,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/194290844?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef7e7215-5bda-47fb-a788-05b19e724227_1576x840.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!MWqB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef7e7215-5bda-47fb-a788-05b19e724227_1576x840.png 424w, https://substackcdn.com/image/fetch/$s_!MWqB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef7e7215-5bda-47fb-a788-05b19e724227_1576x840.png 848w, https://substackcdn.com/image/fetch/$s_!MWqB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef7e7215-5bda-47fb-a788-05b19e724227_1576x840.png 1272w, https://substackcdn.com/image/fetch/$s_!MWqB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef7e7215-5bda-47fb-a788-05b19e724227_1576x840.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Claude Code is the most complete agentic runtime available right now. It reads <code>CLAUDE.md</code>, a project-specific instruction file that persists across every session. You can describe your entire architecture, your preferences, your forbidden patterns, and the agent carries that into every run without you repeating it. It has Agent Teams for spinning up parallel sub-agents that coordinate on a shared goal. As of March 2026, computer use means it can point and click through UIs, take screenshots, and handle workflows that resist scripting.</p><p>The thing <a href="https://thoughts.jock.pl/p/the-compounding-agent-ep4">I keep noticing with Claude Code</a> is that it genuinely builds on context over time. A session that starts with &#8220;add authentication&#8221; will remember the decisions it made about your auth architecture when it gets to &#8220;add rate limiting&#8221; three steps later. That coherence across a long task chain is what makes it feel like an agent rather than a very fast typist.</p><p>One important thing about how any harness uses context: the model only knows what&#8217;s in its conversation history. When Claude Code opens your project, it doesn&#8217;t already know your codebase. It explores via tool calls, building context incrementally. <code>CLAUDE.md</code> front-loads that context so fewer tool calls are wasted on discovery. Dumping your entire codebase into context (the old Repomix approach) is the wrong answer. Past around 50-100k tokens, model accuracy drops significantly. More context makes models dumber past a threshold. Good harnesses build context as needed, not all at once.</p><p><strong>Where it struggles:</strong> context loss on sessions longer than 2 hours, where it starts forgetting early decisions. Terminal-only interface has a real learning curve. Token consumption is 3-4x higher than Codex CLI per equivalent task, which compounds on long autonomous sessions.</p><p><strong>Best for:</strong> complex multi-file tasks, overnight autonomous runs, architecture-level changes that require consistent context across many steps.</p><p><strong>Pricing:</strong> Claude Pro ($20/mo) or Max ($100+/mo). For regular autonomous sessions, Max is almost certainly necessary. The per-token costs on long runs add up fast. For a detailed Claude Code vs Codex head-to-head from two months of real usage, <a href="https://thoughts.jock.pl/p/claude-code-vs-codex-real-comparison-2026">I covered that comparison separately</a>.</p><div><hr></div><h2>Codex CLI: Good, But Not What the Hype Says</h2><p><em>Category: Coding tool, emerging agent | <a href="https://openai.com/codex/">openai.com/codex</a> | <a href="https://github.com/openai/codex">GitHub (67k stars)</a></em></p><p>Codex CLI is not the old Codex model from 2021. It&#8217;s OpenAI&#8217;s terminal-based agent, open-source on GitHub, bundled with ChatGPT Plus or Pro, running on GPT-5.4. The benchmark puts it at 77.3% on SWE-bench, close to Claude Code&#8217;s 80.8%, and at 3-4x lower token cost. On paper, a strong contender.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wxrj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9aeecd0c-ed74-4c5c-8d48-3599dbae40b6_1270x816.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wxrj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9aeecd0c-ed74-4c5c-8d48-3599dbae40b6_1270x816.png 424w, https://substackcdn.com/image/fetch/$s_!wxrj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9aeecd0c-ed74-4c5c-8d48-3599dbae40b6_1270x816.png 848w, https://substackcdn.com/image/fetch/$s_!wxrj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9aeecd0c-ed74-4c5c-8d48-3599dbae40b6_1270x816.png 1272w, https://substackcdn.com/image/fetch/$s_!wxrj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9aeecd0c-ed74-4c5c-8d48-3599dbae40b6_1270x816.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wxrj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9aeecd0c-ed74-4c5c-8d48-3599dbae40b6_1270x816.png" width="1270" height="816" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9aeecd0c-ed74-4c5c-8d48-3599dbae40b6_1270x816.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:816,&quot;width&quot;:1270,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:82620,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/194290844?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9aeecd0c-ed74-4c5c-8d48-3599dbae40b6_1270x816.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wxrj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9aeecd0c-ed74-4c5c-8d48-3599dbae40b6_1270x816.png 424w, https://substackcdn.com/image/fetch/$s_!wxrj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9aeecd0c-ed74-4c5c-8d48-3599dbae40b6_1270x816.png 848w, https://substackcdn.com/image/fetch/$s_!wxrj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9aeecd0c-ed74-4c5c-8d48-3599dbae40b6_1270x816.png 1272w, https://substackcdn.com/image/fetch/$s_!wxrj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9aeecd0c-ed74-4c5c-8d48-3599dbae40b6_1270x816.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>In practice, my honest read: it&#8217;s cold. That&#8217;s the right word. What I mean is that Codex CLI feels raw as an agent. It executes individual steps cleanly, but it doesn&#8217;t feel like it&#8217;s building toward something the way Claude Code does. Give it a multi-step task: add this feature, connect it to this other component, update the tests. It handles step one well, sometimes step two, and starts losing coherence by step three or four. It restates what it did, asks for clarification it shouldn&#8217;t need, or misses a dependency it should have caught from context it already has. That gap between 77.3% and 80.8% is exactly this: Claude Code holds context through longer chains.</p><p>Where Codex CLI genuinely shines is raw coding quality on focused tasks. iOS apps, macOS apps, web apps. Give it a specific, contained task and GPT-5.4 is excellent. The code quality on front-end work, app scaffolding, and UI logic is strong. I&#8217;d put it on par with or ahead of Claude Sonnet for this category of work. It&#8217;s not the harness that&#8217;s the advantage there. It&#8217;s GPT-5.4 being particularly strong at app development.</p><p>The architectural difference worth knowing: Codex CLI runs in cloud containers managed by OpenAI, not on your local machine. You can fire off a task and disconnect. The task keeps running without your terminal staying open. For batch work and overnight jobs where you&#8217;re not monitoring, that&#8217;s genuinely useful. For tight local loops where your environment variables and local state matter, you&#8217;re working around the sandboxing.</p><p><strong>Where it struggles:</strong> multi-step agentic chains with dependencies. Feels unfinished as a full harness compared to Claude Code. Less context coherence on complex tasks.</p><p><strong>Best for:</strong> focused coding tasks (especially apps), token-efficient runs, developers already on ChatGPT Plus who want to try a CLI agent without extra cost.</p><p><strong>Pricing:</strong> included with ChatGPT Plus ($20/mo) or Pro ($200/mo). If you&#8217;re already paying for ChatGPT, this is essentially free to try.</p><div><hr></div><h2>Aider: The Underrated Open-Source Standard</h2><p><em>Category: Coding tool (pair programmer) | <a href="https://aider.chat/">aider.chat</a> | <a href="https://github.com/Aider-AI/aider">GitHub (43k stars)</a></em></p><p>Aider is the tool most people in the &#8220;AI coding&#8221; conversation have never used, and it has 43,000 GitHub stars and 15 billion tokens processed per week in production. That&#8217;s not a toy project.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!n4U5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0a2bbde-b8a5-4c4e-978f-4f4b4c0ebbd4_647x475.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!n4U5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0a2bbde-b8a5-4c4e-978f-4f4b4c0ebbd4_647x475.png 424w, https://substackcdn.com/image/fetch/$s_!n4U5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0a2bbde-b8a5-4c4e-978f-4f4b4c0ebbd4_647x475.png 848w, https://substackcdn.com/image/fetch/$s_!n4U5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0a2bbde-b8a5-4c4e-978f-4f4b4c0ebbd4_647x475.png 1272w, https://substackcdn.com/image/fetch/$s_!n4U5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0a2bbde-b8a5-4c4e-978f-4f4b4c0ebbd4_647x475.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!n4U5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0a2bbde-b8a5-4c4e-978f-4f4b4c0ebbd4_647x475.png" width="647" height="475" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c0a2bbde-b8a5-4c4e-978f-4f4b4c0ebbd4_647x475.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:475,&quot;width&quot;:647,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:82641,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/194290844?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0a2bbde-b8a5-4c4e-978f-4f4b4c0ebbd4_647x475.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!n4U5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0a2bbde-b8a5-4c4e-978f-4f4b4c0ebbd4_647x475.png 424w, https://substackcdn.com/image/fetch/$s_!n4U5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0a2bbde-b8a5-4c4e-978f-4f4b4c0ebbd4_647x475.png 848w, https://substackcdn.com/image/fetch/$s_!n4U5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0a2bbde-b8a5-4c4e-978f-4f4b4c0ebbd4_647x475.png 1272w, https://substackcdn.com/image/fetch/$s_!n4U5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0a2bbde-b8a5-4c4e-978f-4f4b4c0ebbd4_647x475.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The model is fundamentally different from Claude Code or Codex. Aider is a git-first pair programmer, not an autonomous orchestrator. You bring your own model, Claude Sonnet, GPT-5, Gemini 2.5, DeepSeek, Qwen, local Ollama, and Aider wraps it with git-native execution. Every AI edit becomes a commit. The repo map gives it structural understanding of your whole codebase before it touches anything. It auto-lints and runs tests after every change, self-fixing detected issues before reporting back.</p><p>The token efficiency is striking: 4.2x fewer tokens than Claude Code per equivalent task. If you&#8217;re paying for API access directly, Aider with Claude Sonnet is the most cost-efficient path to serious coding automation by a wide margin.</p><p>The honest tradeoff: Aider doesn&#8217;t orchestrate across 40 files and coordinate sub-agents. It executes a task, executes it well, and commits the result. It&#8217;s more like having a disciplined pair programmer who never skips a commit than a system that independently plans and executes a multi-hour architecture session. For incremental work, refactoring a module, implementing a feature, fixing a class of bugs, it&#8217;s the right tool. For overnight autonomous sessions that need to make judgment calls across large contexts: Claude Code.</p><p>The git-first philosophy deserves separate mention. Every change is committed. Your entire interaction with the agent is auditable, reversible, and reviewable inside your normal git workflow. No other tool in this list bakes that in at the same level.</p><p><strong>Best for:</strong> focused incremental work, budget setups, teams that want full audit trails, developers who want BYOM flexibility without giving up discipline.</p><p><strong>Pricing:</strong> free. You pay your model provider directly.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Digital Thoughts is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h2>OpenCode: The Provider Switcher</h2><p><em>Category: Hybrid (coding + emerging agent) | <a href="https://opencode.ai/">opencode.ai</a> | <a href="https://github.com/opencode-ai/opencode">GitHub (72k stars)</a></em></p><p>OpenCode&#8217;s value proposition is breadth: 75+ LLM providers, all accessible from the same interface. Anthropic, OpenAI, Google, DeepSeek, AWS Bedrock, Azure, local Ollama, and more. I&#8217;ve used it with Claude Opus, GPT models, and open-weight models like Qwen and GLM. The switching experience is genuinely seamless in a way that nothing else matches. One command, different provider, same workflow. You can&#8217;t do that in Claude Code or Codex.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!T-HV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F552e400d-12a1-4911-aa5a-588d3ed73efb_3636x2432.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!T-HV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F552e400d-12a1-4911-aa5a-588d3ed73efb_3636x2432.png 424w, https://substackcdn.com/image/fetch/$s_!T-HV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F552e400d-12a1-4911-aa5a-588d3ed73efb_3636x2432.png 848w, https://substackcdn.com/image/fetch/$s_!T-HV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F552e400d-12a1-4911-aa5a-588d3ed73efb_3636x2432.png 1272w, https://substackcdn.com/image/fetch/$s_!T-HV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F552e400d-12a1-4911-aa5a-588d3ed73efb_3636x2432.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!T-HV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F552e400d-12a1-4911-aa5a-588d3ed73efb_3636x2432.png" width="1456" height="974" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/552e400d-12a1-4911-aa5a-588d3ed73efb_3636x2432.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:974,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:231578,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/194290844?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F552e400d-12a1-4911-aa5a-588d3ed73efb_3636x2432.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!T-HV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F552e400d-12a1-4911-aa5a-588d3ed73efb_3636x2432.png 424w, https://substackcdn.com/image/fetch/$s_!T-HV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F552e400d-12a1-4911-aa5a-588d3ed73efb_3636x2432.png 848w, https://substackcdn.com/image/fetch/$s_!T-HV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F552e400d-12a1-4911-aa5a-588d3ed73efb_3636x2432.png 1272w, https://substackcdn.com/image/fetch/$s_!T-HV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F552e400d-12a1-4911-aa5a-588d3ed73efb_3636x2432.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>But I&#8217;ll be honest about something: there&#8217;s something missing from the experience. It&#8217;s hard to name exactly. After using it alongside Claude Code for a while, I notice OpenCode doesn&#8217;t feel like it&#8217;s building a working relationship with your project. There&#8217;s no <code>CLAUDE.md</code> equivalent that persists project context. There&#8217;s no Agent Teams layer for coordinating parallel work. The autonomous behavior is functional but less mature. It handles individual tasks well, but it doesn&#8217;t feel like a system designed for extended unattended operation.</p><p>With open-weight models like Qwen and GLM, it&#8217;s fine. Gets the job done for straightforward tasks. You&#8217;re not going to get Claude Opus-level reasoning, but for routine edits and quick fixes, the cost savings are real.</p><p>The provider switching is genuinely the killer feature. If you&#8217;re doing model experiments, comparing how GPT-5.4 handles a task vs Claude Sonnet vs a local Qwen, OpenCode is the tool for that. If you already have subscriptions to multiple providers and want to use them without managing separate CLI tools, OpenCode is the right architecture. But for a long-term primary agent setup where you need consistent, deep project context: I&#8217;d reach for something else.</p><p><strong>Best for:</strong> model experimentation, teams with multiple provider subscriptions, privacy-first setups with local Ollama, cost arbitrage across providers.</p><p><strong>Pricing:</strong> free. BYOM.</p><div><hr></div><h2>Pi: The One I Actually Want to Use More</h2><p><em>Category: Coding tool + primitives harness | <a href="https://pi.dev/">pi.dev</a> | <a href="https://github.com/badlogic/pi-mono">GitHub</a></em></p><p>Pi is genuinely different from everything else here, and I want to say this upfront: I like it. It&#8217;s fast, it&#8217;s flexible, and the experience is clean in a way proprietary tools often aren&#8217;t. If I could choose without constraints, Pi is probably the closest thing to what I&#8217;d want as a daily harness alternative to Claude Code.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yEyT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66f27329-14d2-45eb-84cb-c77f93107b9c_3714x342.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yEyT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66f27329-14d2-45eb-84cb-c77f93107b9c_3714x342.png 424w, https://substackcdn.com/image/fetch/$s_!yEyT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66f27329-14d2-45eb-84cb-c77f93107b9c_3714x342.png 848w, https://substackcdn.com/image/fetch/$s_!yEyT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66f27329-14d2-45eb-84cb-c77f93107b9c_3714x342.png 1272w, https://substackcdn.com/image/fetch/$s_!yEyT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66f27329-14d2-45eb-84cb-c77f93107b9c_3714x342.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yEyT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66f27329-14d2-45eb-84cb-c77f93107b9c_3714x342.png" width="1456" height="134" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/66f27329-14d2-45eb-84cb-c77f93107b9c_3714x342.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:134,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:63914,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/194290844?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66f27329-14d2-45eb-84cb-c77f93107b9c_3714x342.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yEyT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66f27329-14d2-45eb-84cb-c77f93107b9c_3714x342.png 424w, https://substackcdn.com/image/fetch/$s_!yEyT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66f27329-14d2-45eb-84cb-c77f93107b9c_3714x342.png 848w, https://substackcdn.com/image/fetch/$s_!yEyT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66f27329-14d2-45eb-84cb-c77f93107b9c_3714x342.png 1272w, https://substackcdn.com/image/fetch/$s_!yEyT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66f27329-14d2-45eb-84cb-c77f93107b9c_3714x342.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>The design philosophy is the opposite of the &#8220;more features&#8221; trend. Its tagline is blunt: &#8220;there are many coding agents, but this one is mine.&#8221; Instead of an opinionated harness, it gives you primitives. A minimal core you configure yourself. Terminal TUI, 15+ LLM providers, tree-structured session history you can navigate and export, and four operation modes. The interesting one for builders: RPC mode. Pi runs as an embeddable subprocess inside a larger automation system. Your orchestration layer calls Pi, it executes the coding task, returns structured output. Designed to be a component in a system, not a standalone tool.</p><p>What&#8217;s deliberately absent: sub-agents, plan mode, permission popups, background processes. Pi&#8217;s bet is that most harnesses embed too many assumptions about your workflow. Strip to primitives, ship extensions via npm, build exactly what you need. AGENTS.md and SYSTEM.md play the same role CLAUDE.md does in Claude Code.</p><p>So why am I not using it more? One reason, and it&#8217;s a real one: <strong>Anthropic&#8217;s billing doesn&#8217;t let you bring your Max subscription to third-party harnesses.</strong></p><p>Pi is BYOM, bring your own API key. When I tested it with Claude, Pi surfaced a message explicitly: usage through Pi counts against API billing, not your Claude subscription. So if you&#8217;re on Claude Max ($100+/mo), using Pi with Claude means paying twice. The Max subscription for Claude Code, and API rates on top for Pi. Those costs add up fast on any serious coding session. I was paying from my own pocket to test something I wanted to use more. That&#8217;s not a good feeling.</p><p>This isn&#8217;t Pi&#8217;s fault. It&#8217;s Anthropic&#8217;s policy. They don&#8217;t allow third-party harnesses to draw on subscription credits. You have to use Claude Code to get what you&#8217;re paying for on the subscription. Google does the same with Gemini. Theo from T3 made this point in a recent video on harnesses: if you&#8217;re paying $200/month for Opus, you have to use their harness. OpenAI, by contrast, lets your API credits work across third-party tools freely.</p><p>In a world where Anthropic changed this, where your Max subscription applied to any MCP-compatible harness, Pi is probably what I&#8217;d reach for first. The speed, the flexibility, the primitives-first design: it fits the kind of automation system I&#8217;m building. But until that policy changes, the economics don&#8217;t work for anyone on a Claude subscription. You pay for Claude twice if you want to experiment with a different harness.</p><p>If you&#8217;re on GPT or open-weight models (Qwen, DeepSeek, GLM), Pi has none of these constraints. The billing goes through OpenAI or your provider directly. For a Claude-first setup: this is the wall you&#8217;ll hit.</p><p><strong>Best for:</strong> GPT or open-weight model setups, building custom harness architectures, embedding a coding agent as a subprocess in larger systems, developers who want full control with no opinions baked in.</p><p><strong>Not ideal for:</strong> Claude-first developers on Max. You&#8217;ll pay API rates on top of your subscription.</p><p><strong>Pricing:</strong> free, MIT license. BYOM. Factor in API costs if using Anthropic models.</p><div><hr></div><h2>Cursor: The Best Supervised Experience, Not Yet a Harness</h2><p><em>Category: IDE with supervised agent mode | <a href="https://cursor.com/">cursor.com</a></em></p><p>Cursor is an IDE first. Its agent mode deserves inclusion in this conversation because of how fast the direction is changing, not because it&#8217;s a harness today.</p><p>Cursor 3 (released April 2026) added cloud agents on isolated VMs, <code>/worktree</code> for isolated branch changes, self-hosted agents, and parallel Agent Tabs. 30% of Cursor&#8217;s own internal PRs are now agent-made. The supervised IDE experience, Design Mode where you annotate a mockup and get an implementation, parallel agents, and deep JetBrains support, is the best developer experience available at the keyboard right now.</p><p>As an overnight harness: not there. When left without supervision, it stalls at the first ambiguous decision point. That&#8217;s not a bug. It&#8217;s a design choice. Cursor is built for developers who are present and want an agent that won&#8217;t make unilateral decisions on their codebase. That&#8217;s the right call for most developers. It means Cursor isn&#8217;t the right tool for autonomous runs.</p><p>The 77% to 93% Opus benchmark is the thing worth studying. Cursor extracts more from the same model through obsessive harness tuning. People whose whole job is to rewrite system prompts and tool descriptions for each new model release. The gap is real and compounds across tasks. The cloud agents direction makes me think this section of the comparison will look very different in 12 months.</p><p><strong>Best for:</strong> daily supervised coding, developers who want the best IDE-plus-agent experience at the keyboard.</p><p><strong>Pricing:</strong> Hobby (free), Pro ($20/mo), Ultra ($200/mo), Teams ($40/user/mo).</p><div><hr></div><h2>A Few More Worth Knowing</h2><p><strong><a href="https://goose-docs.ai/">Goose</a> (Block/Square, <a href="https://github.com/block/goose">GitHub, 41k stars</a>):</strong> Open-source, MCP-based, general-purpose agent. Not coding-specific, but handles code tasks well. Right fit if you want automation that goes beyond coding into broader workflows. Apache 2.0 license.</p><p><strong><a href="https://cline.bot/">Cline</a> (<a href="https://github.com/cline/cline">GitHub, 60k stars</a>):</strong> Open-source, supports VS Code, JetBrains, Neovim, Emacs. Widest multi-IDE coverage of any tool in this list. Good MCP support. Worth looking at if your stack spans multiple editors.</p><p><strong><a href="https://geminicli.com/">Gemini CLI</a> (Google, <a href="https://github.com/google-gemini/gemini-cli">GitHub, 96k stars</a>):</strong> Free with a Google account. 60 requests/minute, 1,000/day, 1 million token context window. Genuinely generous free tier. Strong on frontend tasks. The right starting point if budget is the hard constraint and you don&#8217;t have API credits elsewhere.</p><p><strong><a href="https://devin.ai/">Devin</a> (Cognition):</strong> Full autonomy, cloud sandbox, Linux shell, browser. Significantly more accessible than before: Core tier at $20/mo plus $2.25 per ACU (autonomous compute unit). Resolves 13.86% of real GitHub issues end-to-end, a dramatic improvement over what was possible two years ago. Worth evaluating for teams with consistent engineering backlogs, not just enterprise anymore.</p><p><strong><a href="https://github.com/pingdotgg/t3code">T3 Code</a> (Theo):</strong> Not a harness. A UI wrapper on top of Claude Code and Codex CLI. Useful to name because it comes up in these conversations. If you don&#8217;t have Claude Code installed, T3 Code won&#8217;t do Claude tasks. The UI is the product, not the agent.</p><div><hr></div><h2>Same Task, Different Harness</h2><p>The fairest way to compare these is to run the same type of task and watch what happens. Here&#8217;s the pattern I kept seeing:</p><p><strong>Complex multi-step agent task (e.g. &#8220;add this feature, connect it to the auth system, update the affected tests, write a changelog entry&#8221;):</strong> Claude Code holds the chain. It remembers what it did in step one when it reaches step four. Codex CLI starts strong and starts fraying around step three. OpenCode and Aider handle each step well in isolation, but need more direction between steps.</p><p><strong>Focused app development (iOS, macOS, web UI):</strong> Codex CLI with GPT-5.4 is competitive here. The code quality on app work is strong, sometimes ahead of Claude Sonnet. Claude Code with Opus is still better on complex multi-component app logic, but for a contained feature or a new screen: Codex CLI is a legitimate choice.</p><p><strong>Budget-constrained incremental refactoring:</strong> Aider with Claude Sonnet or DeepSeek is the clear call. The 4.2x token efficiency advantage is real. The git-first commit-per-change model gives you a clean audit trail. You pay for what you actually use.</p><p><strong>&#8220;I want to run the same task with three different models and compare&#8221;:</strong> OpenCode. Nothing else makes provider switching this frictionless.</p><p><strong>Overnight autonomous work where you&#8217;re not monitoring:</strong> Claude Code. The infrastructure is designed for exactly this. CLAUDE.md project context, background scheduling, Agent Teams, error handling. Everything else is built around having a human present.</p><div><hr></div><h2>Which One Fits Your Workflow?</h2><p>There&#8217;s no universally &#8220;best&#8221; harness. The honest answer depends on a few questions about how you actually work.</p><p><strong>Are you at the keyboard or not?</strong> If you&#8217;re supervising every step, Cursor gives you the best IDE experience and the most model-agnostic setup. If you want autonomous execution with no supervision, Claude Code is the only tool built end-to-end for that. Everything else sits somewhere in between.</p><p><strong>Do you need to chain many steps or execute one step well?</strong> Multi-step autonomous chains with dependencies: Claude Code. Focused, contained tasks with excellent code quality: Aider or Codex CLI. There&#8217;s a real difference between a pair programmer and an orchestrator, and the right choice depends on which problem you&#8217;re actually solving.</p><p><strong>What&#8217;s your budget?</strong> If you&#8217;re price-sensitive, Aider with a cheap backend (DeepSeek, Qwen, even Gemini) is the clearest path to real coding automation at minimal cost. Gemini CLI is free with generous limits. OpenCode lets you use whatever provider is cheapest for the task at hand. None of these require a $100/mo subscription.</p><p><strong>Do you care about model flexibility?</strong> If you want to switch between Claude, GPT, open-weight models, and local Ollama without friction, OpenCode or Aider are the right architectures. Claude Code and Codex CLI are provider-locked.</p><p><strong>Are you building a system or using a tool?</strong> If you&#8217;re assembling a larger automation where a coding agent is one component among many, Pi&#8217;s RPC mode and primitives-first design is worth the setup investment. If you just want to get code written, start with Claude Code or Aider depending on your budget and task type.</p><p>Like, the mistake most people make is picking a tool based on a benchmark and then wondering why it doesn&#8217;t feel right in their actual workflow. The benchmark measures what the model can do on a standardized task. Your workflow isn&#8217;t a standardized task.</p><div><hr></div><h2>The Decision Matrix</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_9X6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff661b22f-be9c-4380-9e79-aa5166edb084_1287x390.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_9X6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff661b22f-be9c-4380-9e79-aa5166edb084_1287x390.png 424w, https://substackcdn.com/image/fetch/$s_!_9X6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff661b22f-be9c-4380-9e79-aa5166edb084_1287x390.png 848w, https://substackcdn.com/image/fetch/$s_!_9X6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff661b22f-be9c-4380-9e79-aa5166edb084_1287x390.png 1272w, https://substackcdn.com/image/fetch/$s_!_9X6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff661b22f-be9c-4380-9e79-aa5166edb084_1287x390.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_9X6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff661b22f-be9c-4380-9e79-aa5166edb084_1287x390.png" width="1287" height="390" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f661b22f-be9c-4380-9e79-aa5166edb084_1287x390.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:390,&quot;width&quot;:1287,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:104896,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/194290844?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff661b22f-be9c-4380-9e79-aa5166edb084_1287x390.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_9X6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff661b22f-be9c-4380-9e79-aa5166edb084_1287x390.png 424w, https://substackcdn.com/image/fetch/$s_!_9X6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff661b22f-be9c-4380-9e79-aa5166edb084_1287x390.png 848w, https://substackcdn.com/image/fetch/$s_!_9X6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff661b22f-be9c-4380-9e79-aa5166edb084_1287x390.png 1272w, https://substackcdn.com/image/fetch/$s_!_9X6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff661b22f-be9c-4380-9e79-aa5166edb084_1287x390.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/p/ai-coding-harness-agents-2026?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://thoughts.jock.pl/p/ai-coding-harness-agents-2026?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><div><hr></div><h2>The Honest Verdict</h2><p>After months of real use, here&#8217;s where I land.</p><p><strong>Claude Code for autonomous execution.</strong> Not because it&#8217;s perfect. Context loss on sessions over 2 hours is a genuine problem, and the token cost is genuinely high. But it&#8217;s the only tool built, end to end, for the question &#8220;can I leave this running while I sleep?&#8221; Agent Teams, background scheduling, CLAUDE.md project memory, computer use. The infrastructure reflects that goal. <a href="https://thoughts.jock.pl/p/mac-mini-ai-agent-migration-headless-2026">My headless Mac Mini setup</a> runs on this for exactly this reason.</p><p><strong>Codex CLI for app work.</strong> GPT-5.4 is genuinely excellent at iOS, macOS, and web app development. For a contained feature with a clear spec, it&#8217;s fast, cheap, and produces clean code. The harness feels raw for complex agentic chains, but for the coding task itself, it earns its place.</p><p><strong>Aider for budget, discipline, and BYOM.</strong> The 4.2x token efficiency is real. The git-first model is actually better discipline than what you get from proprietary tools. If you want to run open-weight models like Qwen or DeepSeek and maintain a clean git history, Aider is the right architecture.</p><p><strong>OpenCode for model exploration.</strong> If you&#8217;re actively experimenting with providers or you have multiple subscriptions you want to use from a single interface, nothing else compares on the switching experience. But don&#8217;t expect it to replace Claude Code for sustained agent work.</p><p><strong>Pi for builders (with an asterisk).</strong> If you&#8217;re constructing a system where a coding agent is one component among many, the RPC mode and primitives-first design are genuinely the right architecture. It&#8217;s fast, it&#8217;s flexible, and if I had no constraints I&#8217;d use it far more. The asterisk: Anthropic currently doesn&#8217;t allow third-party harnesses to draw on Max subscription credits. Pi showed me this explicitly in a message during testing: API usage bills separately on top of your subscription. Until Anthropic changes that policy, Pi is most practical on GPT or open-weight models. Claude-first developers are forced to pay twice.</p><p>The deepest insight from the benchmark data is that harness tuning matters as much as model quality. Same model, different harness: 16 percentage points (77% &#8594; 93%, Opus, Claude Code vs Cursor). Multiple independent studies show a 5-40 point range from harness quality alone. If results from any of these tools feel inconsistent, the harness is the first place to look: system prompt, tool descriptions, context management. Not the model. For autonomous overnight work specifically, look at Terminal-Bench 2.0, not just SWE-bench. The 92.1% vs 77.3% gap between Claude Code and Codex CLI in agentic terminal tasks is a better signal for that use case than code-editing scores.</p><div><hr></div><p>One thing for paid subscribers. The most relevant store product to this post is the <a href="https://wiz.jock.pl/store/claude-code-prompts">Claude Code Prompt Pack</a>: 50+ prompts organized by task type, pulled from real overnight sessions where I needed the harness to actually work without me. If you&#8217;re on a monthly plan, you get one free product from the store per month. That&#8217;s a good pick.</p><p>If you&#8217;re on yearly, the full store is already included. If you&#8217;re still on the free plan, this is roughly what paid unlocks in practice: the store and a weekly dispatch that goes deeper than the public posts.</p><p><em>I write about building with AI agents from a practitioner&#8217;s perspective. No hype, no affiliate links. <a href="https://thoughts.jock.pl/subscribe">Subscribe here</a> if you want more of this.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://thoughts.jock.pl/subscribe?"><span>Subscribe now</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[I Spent 2 Months Building Custom Software for My AI Agent. Last Week I Replaced It All.]]></title><description><![CDATA[The question was never "can I build it?" It was always "should I?"]]></description><link>https://thoughts.jock.pl/p/wizboard-fizzy-ai-agent-interface-pivot-2026</link><guid isPermaLink="false">https://thoughts.jock.pl/p/wizboard-fizzy-ai-agent-interface-pivot-2026</guid><dc:creator><![CDATA[Pawel Jozefiak]]></dc:creator><pubDate>Mon, 13 Apr 2026 12:01:46 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!tiIX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0afcb23-d79b-441f-a0f4-f3833ac31c41_2048x2048.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tiIX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0afcb23-d79b-441f-a0f4-f3833ac31c41_2048x2048.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tiIX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0afcb23-d79b-441f-a0f4-f3833ac31c41_2048x2048.png 424w, https://substackcdn.com/image/fetch/$s_!tiIX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0afcb23-d79b-441f-a0f4-f3833ac31c41_2048x2048.png 848w, https://substackcdn.com/image/fetch/$s_!tiIX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0afcb23-d79b-441f-a0f4-f3833ac31c41_2048x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!tiIX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0afcb23-d79b-441f-a0f4-f3833ac31c41_2048x2048.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tiIX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0afcb23-d79b-441f-a0f4-f3833ac31c41_2048x2048.png" width="1456" height="1456" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c0afcb23-d79b-441f-a0f4-f3833ac31c41_2048x2048.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:4117222,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/194061080?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0afcb23-d79b-441f-a0f4-f3833ac31c41_2048x2048.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!tiIX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0afcb23-d79b-441f-a0f4-f3833ac31c41_2048x2048.png 424w, https://substackcdn.com/image/fetch/$s_!tiIX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0afcb23-d79b-441f-a0f4-f3833ac31c41_2048x2048.png 848w, https://substackcdn.com/image/fetch/$s_!tiIX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0afcb23-d79b-441f-a0f4-f3833ac31c41_2048x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!tiIX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0afcb23-d79b-441f-a0f4-f3833ac31c41_2048x2048.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>When you start building an AI agent, it works great in the terminal. CLI conversations, Discord messages, email reports. You talk to it, it talks back, things get done. For a while, that&#8217;s enough.</p><p>Then you start building more. More automations. More projects. More things happening in the background while you sleep. Your agent <a href="https://thoughts.jock.pl/p/building-ai-agent-night-shifts-ep1">runs night shifts</a>, handles tasks across multiple channels, manages a growing list of things. And at some point you realize: you can&#8217;t see any of it. Not in a way that actually helps you think.</p><p>I could always ask my agent what&#8217;s going on. &#8220;What tasks are open? What did you do last night? What&#8217;s the status of project X?&#8221; And it would answer. Correctly, usually. But that&#8217;s not the same as seeing it. Humans need surfaces. We need to look at something, drag something, scan a board and instantly know what matters. That&#8217;s not a weakness. That&#8217;s how our brains are wired.</p><p>This is the story of how I built custom software to give my AI agent a visual interface. How that software grew, broke, and eventually taught me a lesson I should have learned earlier: the hardest question in the agent era is not whether you <em>can</em> build something. It&#8217;s whether you <em>should</em>.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">I write about building AI agents, the mistakes, and what actually works. Subscribe for free and get every post.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h2>Phase 1: Notion (worked until it didn&#8217;t)</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bQP9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25b592c8-c0f6-4842-9e43-b06d2c0ca694_2373x2106.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bQP9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25b592c8-c0f6-4842-9e43-b06d2c0ca694_2373x2106.png 424w, https://substackcdn.com/image/fetch/$s_!bQP9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25b592c8-c0f6-4842-9e43-b06d2c0ca694_2373x2106.png 848w, https://substackcdn.com/image/fetch/$s_!bQP9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25b592c8-c0f6-4842-9e43-b06d2c0ca694_2373x2106.png 1272w, https://substackcdn.com/image/fetch/$s_!bQP9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25b592c8-c0f6-4842-9e43-b06d2c0ca694_2373x2106.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bQP9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25b592c8-c0f6-4842-9e43-b06d2c0ca694_2373x2106.png" width="1456" height="1292" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/25b592c8-c0f6-4842-9e43-b06d2c0ca694_2373x2106.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1292,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:236619,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/194061080?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25b592c8-c0f6-4842-9e43-b06d2c0ca694_2373x2106.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bQP9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25b592c8-c0f6-4842-9e43-b06d2c0ca694_2373x2106.png 424w, https://substackcdn.com/image/fetch/$s_!bQP9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25b592c8-c0f6-4842-9e43-b06d2c0ca694_2373x2106.png 848w, https://substackcdn.com/image/fetch/$s_!bQP9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25b592c8-c0f6-4842-9e43-b06d2c0ca694_2373x2106.png 1272w, https://substackcdn.com/image/fetch/$s_!bQP9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25b592c8-c0f6-4842-9e43-b06d2c0ca694_2373x2106.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Before I built anything custom, I used Notion. <a href="https://thoughts.jock.pl/p/notion-ai-context-management-ai-ceo-system-progress-update">I wrote about that setup back in December 2025</a>. My agent could read and write to Notion databases, create tasks, update statuses. It worked. Sort of.</p><p>The problem with Notion was that it&#8217;s designed for humans organizing things manually. The API is slow. The data model is rigid in weird places and too flexible in others. I wanted specific views, specific behaviors, specific integrations that Notion simply wasn&#8217;t built for. I wanted a task to appear on a board the moment my agent starts working on it. I wanted real-time updates. I wanted the whole thing to feel like it was built for one person and one AI agent working together, because that&#8217;s exactly what it was.</p><p>So I did what any person with access to a capable AI would do in early 2026. I built my own.</p><h2>Phase 2: Building WizBoard (the fun part)</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QmrV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51875c61-eb57-4440-a4fd-a9fc843de3c1_3524x1170.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QmrV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51875c61-eb57-4440-a4fd-a9fc843de3c1_3524x1170.png 424w, https://substackcdn.com/image/fetch/$s_!QmrV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51875c61-eb57-4440-a4fd-a9fc843de3c1_3524x1170.png 848w, https://substackcdn.com/image/fetch/$s_!QmrV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51875c61-eb57-4440-a4fd-a9fc843de3c1_3524x1170.png 1272w, https://substackcdn.com/image/fetch/$s_!QmrV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51875c61-eb57-4440-a4fd-a9fc843de3c1_3524x1170.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!QmrV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51875c61-eb57-4440-a4fd-a9fc843de3c1_3524x1170.png" width="1456" height="483" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/51875c61-eb57-4440-a4fd-a9fc843de3c1_3524x1170.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:483,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:453433,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/194061080?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51875c61-eb57-4440-a4fd-a9fc843de3c1_3524x1170.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!QmrV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51875c61-eb57-4440-a4fd-a9fc843de3c1_3524x1170.png 424w, https://substackcdn.com/image/fetch/$s_!QmrV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51875c61-eb57-4440-a4fd-a9fc843de3c1_3524x1170.png 848w, https://substackcdn.com/image/fetch/$s_!QmrV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51875c61-eb57-4440-a4fd-a9fc843de3c1_3524x1170.png 1272w, https://substackcdn.com/image/fetch/$s_!QmrV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51875c61-eb57-4440-a4fd-a9fc843de3c1_3524x1170.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>January and February 2026 was peak <a href="https://thoughts.jock.pl/p/vibe-coding-revolution-non-programmers-ai-software-development-2025">vibe coding</a> energy. You could describe what you wanted, and a capable AI would build it. Not a prototype. Not a mockup. A working application with a database, API, authentication, the whole thing. I described what I needed, and my agent built it.</p><p>WizBoard was a custom kanban board. FastAPI backend, SQLite database, deployed on my own server. It had everything I wanted:</p><ul><li><p>A visual board where tasks moved through columns (Backlog, Next, Now, Waiting, Done)</p></li><li><p>Real-time updates. When my agent started a CLI session, a card appeared in &#8220;Now&#8221; immediately</p></li><li><p>Deep integration with every automation. Night shift plans, day shift tasks, Discord bot commands, email reports. Everything flowed through WizBoard</p></li><li><p>Custom metadata: areas, projects, priorities, task types, queue state</p></li><li><p>Clusters, which was my attempt at grouping related tasks visually. Like a meta-layer on top of the board</p></li><li><p>Focus timers. I was tracking how long each task took, thinking I&#8217;d use the data to improve planning. I never used the data</p></li><li><p>A review flow with submit, approve, and resolve stages. My agent would finish work, submit it for review, and I&#8217;d approve or send it back</p></li><li><p>An offline queue so that when the server was down, mutations would pile up locally and replay when it came back</p></li><li><p>A 3,700-line Python API client that every script in my system imported</p></li></ul><p>It was great. I loved using it. The feeling of seeing my agent&#8217;s work appear on a board in real time, being able to drag cards, add comments, review what happened overnight. That was exactly what was missing from the CLI-only experience.</p><p>So naturally, I kept going. Web version working? Let&#8217;s build a native macOS app. SwiftUI, menu bar integration, keyboard shortcuts, drag-and-drop. Focus mode that showed one task at a time with a timer in the menu bar (because ADHD). Then an iOS version with widgets, push notifications, Live Activities. <a href="https://thoughts.jock.pl/p/wiz-1-5-ai-agent-dashboard-native-app-2026">I wrote about this too.</a> Three platforms. All custom. All built by my agent. All working.</p><p>54 commits over two months. It was genuinely fun to build. Every idea I had, I could add. &#8220;What if tasks could be grouped into clusters?&#8221; Done. &#8220;What if the menu bar showed my current focus task?&#8221; Done. &#8220;What if the iOS widget showed my top 3 priorities with live countdown?&#8221; Done. The possibilities felt endless, and that was precisely the problem.</p><h2>Phase 3: The Productivity Paradox hits home</h2><p>I wrote a whole post about <a href="https://thoughts.jock.pl/p/ai-productivity-paradox-wellbeing-agent-age-2026">the AI productivity paradox</a>. The short version: you can build so many things so fast that the bottleneck stops being technical and starts being mental. You run out of brain before you run out of capability.</p><p>WizBoard was a textbook case.</p><p>My agent was creating tasks, completing tasks, moving things between columns, posting comments, running automations. All of this showed up on my board. Every single thing. And the more capable the system became, the more things happened, and the more overwhelmed I felt looking at the board I built to reduce my overwhelm.</p><p>I wasn&#8217;t more efficient. I was drowning in my own tooling.</p><p>The obvious answer was: simplify. Strip features. Go back to basics. I tried that. And this is where the real problems started.</p><p>When you build a custom system from scratch, everything is connected in ways that are hard to see until you start pulling threads. I wanted to simplify the task model, change how statuses worked, clean up the architecture. Every change broke something else. The web version would work, but the iOS version wouldn&#8217;t. Fix that, and the automation scripts would fail because they expected the old API shape. Fix those, and the night shift planner would create tasks with wrong metadata.</p><p>I found myself spending entire sessions just fixing things I&#8217;d broken while trying to make the system simpler. That&#8217;s the trap. You&#8217;re not building anymore. You&#8217;re maintaining. And maintaining custom software across three platforms (web, macOS, iOS) with a 3,700-line API client and dozens of automation consumers is a full-time job. I don&#8217;t have a full-time job&#8217;s worth of attention for my task board.</p><p>Here&#8217;s what I mean by specific failures. During one &#8220;simplification&#8221; pass, the optimization changes made the board sluggish instead of faster. New features that seemed simple (changing how task statuses map to columns) cascaded into the API client, the automation scripts, the native app&#8217;s sync logic, and the notification system. Every platform had slightly different behavior because they were all built at different times with different assumptions.</p><p>I realized something: the code was fine. My agent writes good code. The architecture was the problem, and it was my architecture. I had designed a system that was perfectly tailored to my needs in February, and by April those needs had evolved, and the tailoring was now a constraint.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Free subscribers get every post: architecture breakdowns, migration stories, and honest takes on what breaks when you build with AI agents.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h2>The realization: Can vs. Should</h2><p>This is the thing I want to talk about, because I think a lot of people building with AI agents are going to hit this exact wall.</p><p>When you have a capable AI agent, you can build almost anything. Custom task managers, dashboards, native apps, full-stack web applications. The <a href="https://thoughts.jock.pl/p/vibe-coding-security-reality-check-ai-apps-fast-development-nightmares">vibe coding era</a> made this feel effortless. And it kind of is, for version one. The agent builds it, it works, you use it, life is good.</p><p>I don&#8217;t hear this question very often in the excitement of version one: who maintains version twenty?</p><p>I had a working web app, a working macOS app, a working iOS app, a 3,700-line API client, fifty-plus automation scripts that all talked to this system, and a database with hundreds of tasks. All custom. All mine. All maintained by me and my agent. And every improvement required touching all of these surfaces. That&#8217;s not a system. That&#8217;s a debt.</p><p>The realization was simple: I need foundations. Real foundations. Built by people who&#8217;ve been thinking about project management software for twenty years, not by me in a weekend coding session.</p><h2>Phase 4: Finding Fizzy</h2><p>37signals has been building project management software since before most people had smartphones. Basecamp, HEY, and now Fizzy. I&#8217;ve read their books. I like how they think about software: simple, opinionated, finished. Not &#8220;feature-rich.&#8221; Finished.</p><p>One of the reasons I got into coding originally was Ruby on Rails, and <a href="https://thoughts.jock.pl/p/rediscovering-coding-joy-with-ruby">Rails is something I genuinely enjoy</a>. It&#8217;s the heart of everything 37signals builds. When they open-sourced Fizzy last year (<a href="https://github.com/basecamp/fizzy">github.com/basecamp/fizzy</a>), a simple kanban board built on modern Rails, I bookmarked it and moved on. I had my own thing.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_lCD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd1a8188-b63b-4b9d-b480-482e361977f5_1631x1048.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_lCD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd1a8188-b63b-4b9d-b480-482e361977f5_1631x1048.png 424w, https://substackcdn.com/image/fetch/$s_!_lCD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd1a8188-b63b-4b9d-b480-482e361977f5_1631x1048.png 848w, https://substackcdn.com/image/fetch/$s_!_lCD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd1a8188-b63b-4b9d-b480-482e361977f5_1631x1048.png 1272w, https://substackcdn.com/image/fetch/$s_!_lCD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd1a8188-b63b-4b9d-b480-482e361977f5_1631x1048.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_lCD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd1a8188-b63b-4b9d-b480-482e361977f5_1631x1048.png" width="1456" height="936" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dd1a8188-b63b-4b9d-b480-482e361977f5_1631x1048.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:936,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:144374,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/194061080?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd1a8188-b63b-4b9d-b480-482e361977f5_1631x1048.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_lCD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd1a8188-b63b-4b9d-b480-482e361977f5_1631x1048.png 424w, https://substackcdn.com/image/fetch/$s_!_lCD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd1a8188-b63b-4b9d-b480-482e361977f5_1631x1048.png 848w, https://substackcdn.com/image/fetch/$s_!_lCD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd1a8188-b63b-4b9d-b480-482e361977f5_1631x1048.png 1272w, https://substackcdn.com/image/fetch/$s_!_lCD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdd1a8188-b63b-4b9d-b480-482e361977f5_1631x1048.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Last week, I came back to that bookmark.</p><p>Fizzy is, on the surface, a simple kanban board. Cards in columns. Drag them around. But the foundations are deep. Here&#8217;s what I mean:</p><ul><li><p><strong>Real architecture.</strong> Multi-tenant with URL-based account isolation. Passwordless magic-link authentication (no passwords to manage, no OAuth to configure). UUID primary keys. Proper background jobs via Solid Queue, no Redis dependency</p></li><li><p><strong>Real-time.</strong> WebSocket-driven updates. When my agent moves a card, I see it move. No refresh needed. This is something I had to build from scratch in WizBoard. Here it just works</p></li><li><p><strong>Entropy system.</strong> Cards that sit untouched for too long get auto-postponed to &#8220;not now.&#8221; This alone is worth the switch. My old board had cards that sat in Backlog for weeks, creating visual noise. Fizzy gently clears them out</p></li><li><p><strong>Steps.</strong> Checklist items on cards. This replaced my need for sub-task cards entirely</p></li><li><p><strong>Golden cards, reactions, cover images.</strong> Priority highlighting, emoji reactions, visual richness. All built in</p></li><li><p><strong>Board-level notification controls.</strong> I want notifications from my Ops board. I don&#8217;t want them from the Automations board. One toggle per board</p></li><li><p><strong>PWA.</strong> Works on mobile out of the box. Not as rich as my old native iOS app, but I don&#8217;t need widgets and Live Activities. I need to see my board and drag cards</p></li><li><p><strong>Full-text search.</strong> 16-shard MySQL search across all cards, comments, descriptions. My old SQLite setup couldn&#8217;t match this</p></li><li><p><strong>Deployable via Kamal.</strong> Docker-based zero-downtime deployment. I forked the repo, configured it for my server, and had it running in an afternoon</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LW0Q!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9612cefc-1841-4815-8dad-858ce6c7bddd_1606x1047.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LW0Q!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9612cefc-1841-4815-8dad-858ce6c7bddd_1606x1047.png 424w, https://substackcdn.com/image/fetch/$s_!LW0Q!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9612cefc-1841-4815-8dad-858ce6c7bddd_1606x1047.png 848w, https://substackcdn.com/image/fetch/$s_!LW0Q!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9612cefc-1841-4815-8dad-858ce6c7bddd_1606x1047.png 1272w, https://substackcdn.com/image/fetch/$s_!LW0Q!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9612cefc-1841-4815-8dad-858ce6c7bddd_1606x1047.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!LW0Q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9612cefc-1841-4815-8dad-858ce6c7bddd_1606x1047.png" width="1456" height="949" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9612cefc-1841-4815-8dad-858ce6c7bddd_1606x1047.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:949,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:133017,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/194061080?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9612cefc-1841-4815-8dad-858ce6c7bddd_1606x1047.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!LW0Q!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9612cefc-1841-4815-8dad-858ce6c7bddd_1606x1047.png 424w, https://substackcdn.com/image/fetch/$s_!LW0Q!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9612cefc-1841-4815-8dad-858ce6c7bddd_1606x1047.png 848w, https://substackcdn.com/image/fetch/$s_!LW0Q!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9612cefc-1841-4815-8dad-858ce6c7bddd_1606x1047.png 1272w, https://substackcdn.com/image/fetch/$s_!LW0Q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9612cefc-1841-4815-8dad-858ce6c7bddd_1606x1047.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The critical thing: it starts simple and lets you decide how complex it gets. My old WizBoard started complex because I designed it for my specific use case from day one. Fizzy starts with a board and columns and cards. Everything else is optional. The data model is minimal: cards have tags, not separate tables for areas, projects, priorities, types, and clusters. One concept (tags with prefixes like <code>area/Automation</code> or <code>p/High</code>) replaces five database tables from my old system.</p><h2>The migration: one day, twenty-one commits</h2><p>Here&#8217;s where it gets technical, and I think this part matters because it shows how to migrate away from custom software without breaking everything that depends on it.</p><p>I had fifty-plus scripts that talked to my old WizBoard API. Night shift planners, day shift executors, Discord bot, iMessage handler, CLI session hooks, cron runners, health monitors. Rewriting all of them was not an option. I&#8217;d be right back in the maintenance trap.</p><p>The solution was a dispatcher shim. I took the 3,700-line API client and replaced it with a 94-line router. That router loads either the new Fizzy-backed client or the old legacy client, based on one environment variable. Every automation script keeps importing the same file, calling the same functions, getting the same response shapes. They don&#8217;t know anything changed.</p><p>The new Fizzy client translates everything on the fly. When a script calls <code>task_create(title="...", area="Automation")</code>, the shim creates a Fizzy card with a tag <code>area/Automation</code>. When a script reads a task back, the shim synthesizes the old data shape from Fizzy&#8217;s card, columns, and tags. Legacy integer task IDs get looked up in a translation table. The offline queue (for when the server is down) works identically.</p><p>The whole cutover happened in a single day. Twenty-one commits between 2pm and 10pm. The first commit was the shim and the new client. Then guardrails: a parity probe that runs the full lifecycle (create, tag, comment, claim, review, approve, close, delete) in under six seconds, a drift monitor that compares old and new systems every five minutes, an orphan sweeper for dead session cards.</p><p>Then the real work started: dogfooding. Using the system for real work and watching what breaks.</p><h2>What broke (and what I learned from each failure)</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8e5W!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02f88a05-a97c-4417-9c15-5a3f5cfeec0b_1692x1620.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8e5W!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02f88a05-a97c-4417-9c15-5a3f5cfeec0b_1692x1620.png 424w, https://substackcdn.com/image/fetch/$s_!8e5W!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02f88a05-a97c-4417-9c15-5a3f5cfeec0b_1692x1620.png 848w, https://substackcdn.com/image/fetch/$s_!8e5W!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02f88a05-a97c-4417-9c15-5a3f5cfeec0b_1692x1620.png 1272w, https://substackcdn.com/image/fetch/$s_!8e5W!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02f88a05-a97c-4417-9c15-5a3f5cfeec0b_1692x1620.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8e5W!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02f88a05-a97c-4417-9c15-5a3f5cfeec0b_1692x1620.png" width="1456" height="1394" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/02f88a05-a97c-4417-9c15-5a3f5cfeec0b_1692x1620.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1394,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:222474,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/194061080?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02f88a05-a97c-4417-9c15-5a3f5cfeec0b_1692x1620.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8e5W!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02f88a05-a97c-4417-9c15-5a3f5cfeec0b_1692x1620.png 424w, https://substackcdn.com/image/fetch/$s_!8e5W!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02f88a05-a97c-4417-9c15-5a3f5cfeec0b_1692x1620.png 848w, https://substackcdn.com/image/fetch/$s_!8e5W!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02f88a05-a97c-4417-9c15-5a3f5cfeec0b_1692x1620.png 1272w, https://substackcdn.com/image/fetch/$s_!8e5W!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F02f88a05-a97c-4417-9c15-5a3f5cfeec0b_1692x1620.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>A lot broke. That&#8217;s expected when you swap the foundation under a running system. What matters is that every failure taught me something about assumptions I didn&#8217;t know I was making.</p><p><strong>The hard-coded URL.</strong> My session-end script had a direct URL to the old system baked into it. It bypassed the shim entirely. Every CLI session was leaving orphaned cards on the board because the completion logic was silently failing against a system that didn&#8217;t have those task IDs. I only noticed because the board was getting cluttered with cards that never closed.</p><p><strong>The cron drift bug.</strong> My automations run on macOS launchd, which doesn&#8217;t guarantee precise timing. A schedule like &#8220;every 2 minutes&#8221; assumes the system wakes up on even minutes. It doesn&#8217;t. Over time, launchd drifts to odd minutes, and the strict cron parser never matches. I had automations that fired once and then silently stopped. Fix: a 4-minute lookback window that catches drifted schedules without double-firing.</p><p><strong>The disappearing automations.</strong> This one was fun. After every successful automation run, the system closed the automation&#8217;s card. Which makes sense for tasks. Tasks finish. But automations are definitions. They run forever. &#8220;Post a greeting in different languages every 2 minutes&#8221; should cycle between Idle and Running, not disappear into Done after its first successful run. I watched one automation fire exactly once and vanish. The fix was treating automation cards as permanent residents that never close, only change columns.</p><p><strong>The comment flood.</strong> My Discord bot runs every minute. The old system handled this fine because it was designed for it. The new system faithfully logged every run as a comment on the automation card. 2,880 comments per day from one automation alone. The board became unreadable. Fix: smart gating that skips success comments for high-frequency automations (every-minute pollers don&#8217;t need a &#8220;success&#8221; note 1,440 times a day) but always logs failures.</p><p><strong>The title flip-flop.</strong> This was the most visible bug. Every time I completed a subtask during a CLI session, the system closed the session card, which triggered a self-healing mechanism that created a new &#8220;Working...&#8221; card, which then got renamed seconds later. On the board, I could see the title flickering between &#8220;Working...&#8221; and the actual title every few minutes. The fix was rethinking what &#8220;complete a subtask&#8221; means: it should add a checklist item to the existing card, not close and recreate it.</p><p>Each of these failures had the same root cause: the old system was built around one-shot tasks. The new system needed to support long-lived definitions, high-frequency automations, and multi-step sessions. Same data (cards on a board), fundamentally different lifecycle assumptions.</p><p></p><h2>What the new setup looks like</h2><p>Two boards. That&#8217;s it.</p><p><strong>Wiz Ops</strong> is my board. Tasks I care about, things I need to do or review. Columns: Triage, Next, Now, Waiting, Review, and a Queue for things I want done but not right now. When I add a card and assign it to my agent, it picks it up, does the work, leaves a comment with what it did, and moves the card to Review. When something is done, it&#8217;s done. I have notifications turned on for this board because everything here is relevant to me.</p><p><strong>Automations</strong> is my agent&#8217;s board. Each automation is one permanent card. Columns: Intake, Disabled, Idle, Running, Needs Attention. Cards never close. They cycle between Idle and Running on their schedules. If something fails, it moves to Needs Attention and stays there until someone looks at it. I have notifications turned off for this board because most of what happens here is routine. If something produces a meaningful output, it surfaces on Wiz Ops as a done card with the summary.</p><p>The Intake column is one of my favorite things. I can drop a card there with something like &#8220;Send me a weather forecast every morning at 7am&#8221; and my agent picks it up, converts it to a proper automation definition with a schedule and a prompt, and moves it to Disabled for my review. Natural language to working automation. That&#8217;s the kind of thing that&#8217;s only possible when your task board and your AI agent share the same system.</p><h3>What I kept from the old system</h3><p>The Queue concept. Sometimes you have a task that doesn&#8217;t need to happen now, but you want it queued for the next day shift or night shift. Drop it in Queue, it gets picked up at the right time. This carried over directly.</p><p>Shift summary cards. My agent creates a &#8220;Nightshift 2026-04-10&#8221; card with checklist items for each planned task. As it works through the night, it checks off items and adds notes. When I wake up, I can see exactly what happened, with context, right on the board. Same for day shifts. I still get email reports, but having it on the board means I can go back, ask questions via comments, and see the history.</p><p>Real-time CLI visibility. When I start a CLI session, a card appears in Now. When I complete pieces of work, they show up as checklist steps on that card. When the session ends, the card closes with a summary. I can watch my own work happening on the board while I&#8217;m doing it.</p><h3>What Fizzy gave me for free</h3><p>Golden cards for priority highlighting. Emoji reactions on cards. Cover images. HTML descriptions for rich content. Column colors. Board-level notification controls. &#8220;Not now&#8221; for things I want to acknowledge but not deal with. Full-text search across everything. The entropy system that auto-postpones stale cards (this alone prevents the infinite todo list problem). PWA that works well on mobile. All of this out of the box, maintained by a team that&#8217;s been building software like this for two decades.</p><p>I don&#8217;t have the macOS native app anymore. I don&#8217;t have the iOS app with widgets and Live Activities. I work in the browser now. And honestly? It&#8217;s fine. The PWA handles mobile well enough. I might build a native shell later. But the point is: I stopped spending time maintaining three custom platforms and started spending time using one good one.</p><p><em>If you want to set up something similar for your own agent, I packaged the two-board architecture, dispatcher shim, and backend adapters for Notion/Linear/REST into the <a href="https://wiz.jock.pl/store/ai-agent-interface-kit">AI Agent Interface Kit</a>. You hand the instructions to your AI agent and it builds the interface layer for you. Annual paid subscribers get it for free, as with all store products.</em></p><h2>The rollback plan (that I never needed)</h2><p>One environment variable. <code>WIZBOARD_BACKEND=legacy</code> and the entire system reverts to the old API. Every script, every automation, every hook. I kept the old 3,600-line client as a preserved rollback target. I never needed it. But knowing it was there made the migration a lot less stressful.</p><p>I also ran a parity probe every five minutes for the first few days. A script that exercises the full task lifecycle against both systems and compares results. Any drift would show up in minutes, not days. That&#8217;s the kind of safety net you need when you&#8217;re swapping foundations under a running system.</p><h2>What this means for you</h2><p>If you&#8217;re building an AI agent, or using one seriously, at some point you&#8217;re going to want a visual surface for it. Something you can look at and immediately understand what&#8217;s happening, what needs attention, and what&#8217;s going well. That&#8217;s a human need, not a technical one. AI agents are efficient in text. Humans are efficient with visuals. Both need to be true at the same time.</p><p>The good news: you have options. More than I realized when I started.</p><p><strong>The easiest path: plug your agent into something that already exists.</strong> Notion, Linear, Trello, Jira. These tools have APIs. Your agent can create tasks, update statuses, leave comments. I started here with Notion, and honestly, for a lot of people this is enough. Your agent writes to the API, you look at the board. Simple. If the tool meets your needs, stop here. Don&#8217;t build anything custom. I mean it.</p><p><strong>The middle path: fork an open-source foundation and make it yours.</strong> This is where I ended up. You get real architecture (auth, real-time, search, mobile) maintained by people who&#8217;ve been solving those problems for years, but you also get full control. You can modify the code. You can add features that make sense for your agent. You deploy it on your own server, your own rules. The custom part is the integration layer, the shim between your agent&#8217;s world and the board&#8217;s world. That&#8217;s where the magic lives.</p><p><strong>The hard path: build everything from scratch.</strong> This is where I started. I don&#8217;t regret it, because I learned a lot and I had genuine fun doing it. But I want to be honest: maintaining custom software across multiple platforms with dozens of automation consumers is a real job. Version one is almost free. Version twenty is not. If you go this route, go in with your eyes open.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_Ke2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e567ec0-4ad6-49a4-aa3c-695c0940fdb9_1692x1561.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_Ke2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e567ec0-4ad6-49a4-aa3c-695c0940fdb9_1692x1561.png 424w, https://substackcdn.com/image/fetch/$s_!_Ke2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e567ec0-4ad6-49a4-aa3c-695c0940fdb9_1692x1561.png 848w, https://substackcdn.com/image/fetch/$s_!_Ke2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e567ec0-4ad6-49a4-aa3c-695c0940fdb9_1692x1561.png 1272w, https://substackcdn.com/image/fetch/$s_!_Ke2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e567ec0-4ad6-49a4-aa3c-695c0940fdb9_1692x1561.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_Ke2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e567ec0-4ad6-49a4-aa3c-695c0940fdb9_1692x1561.png" width="1456" height="1343" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5e567ec0-4ad6-49a4-aa3c-695c0940fdb9_1692x1561.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1343,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:224849,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/194061080?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e567ec0-4ad6-49a4-aa3c-695c0940fdb9_1692x1561.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_Ke2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e567ec0-4ad6-49a4-aa3c-695c0940fdb9_1692x1561.png 424w, https://substackcdn.com/image/fetch/$s_!_Ke2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e567ec0-4ad6-49a4-aa3c-695c0940fdb9_1692x1561.png 848w, https://substackcdn.com/image/fetch/$s_!_Ke2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e567ec0-4ad6-49a4-aa3c-695c0940fdb9_1692x1561.png 1272w, https://substackcdn.com/image/fetch/$s_!_Ke2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e567ec0-4ad6-49a4-aa3c-695c0940fdb9_1692x1561.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/p/wizboard-fizzy-ai-agent-interface-pivot-2026?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://thoughts.jock.pl/p/wizboard-fizzy-ai-agent-interface-pivot-2026?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p>I&#8217;m not here to say Fizzy is the best tool for everyone. It&#8217;s the best tool for me. I like 37signals&#8217; philosophy. I like Rails. I like the minimal data model. I like that it starts simple and I can shape it to my needs without fighting the architecture. For you, the right foundation might be something completely different. Maybe it&#8217;s <a href="https://thoughts.jock.pl/p/ai-agent-self-extending-self-fixing-wiz-rebuild-technical-deep-dive-2026">a fully custom system</a> because your use case genuinely requires it. Maybe it&#8217;s Notion with a good API integration because you don&#8217;t need more than that.</p><p>The point is: think about what <em>you</em> need. Not what I have, not what looks impressive, not what you <em>could</em> build because the technology makes it possible. We don&#8217;t need a million different custom tools. We need the thing that works for us. The opportunity is huge, but the opportunity is in finding the right fit, not in building the most complex system.</p><p>Observe whether your current setup meets your expectations. If it does, keep it. If something feels off, improve it. But improve it from a solid foundation, not from a blank canvas. That&#8217;s the lesson I paid two months to learn.</p><p>My board is a fork of an open-source Rails app. The code is vanilla kanban. The magic is in the 3,200-line Python client that translates between my agent&#8217;s world (areas, projects, automations, sessions, shifts) and the board&#8217;s world (cards, columns, tags). That client is my custom software. The board is not. And that distinction made all the difference.</p><p>Build the integration. Borrow the foundation.</p><div><hr></div><p><em>The <a href="https://wiz.jock.pl/store/ai-agent-interface-kit">AI Agent Interface Kit</a> packages everything from this journey: the two-board architecture, dispatcher shim, 4 backend adapters (Notion, Linear, Fizzy, generic REST), session hooks, automation runner, and a migration checklist. You hand the instructions to your AI agent and it builds the whole interface layer. Works with any AI agent, not just mine. Annual paid subscribers get it for free, as with every product in the store.</em></p><div><hr></div><p></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">If this was useful, I write one of these every week. Free to subscribe, no spam.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p>]]></content:encoded></item><item><title><![CDATA[The Compounding Agent]]></title><description><![CDATA[Reading a leaked Claude Code source, swapping a 35B model's brain for a 4.4x speedup, and writing the beginner's guide I wish I had six months ago.]]></description><link>https://thoughts.jock.pl/p/the-compounding-agent-ep4</link><guid isPermaLink="false">https://thoughts.jock.pl/p/the-compounding-agent-ep4</guid><dc:creator><![CDATA[Pawel Jozefiak]]></dc:creator><pubDate>Sat, 11 Apr 2026 15:05:37 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/193557056/f90738fce050a4beb7380bbba83baa6c.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>Episode four. What happens when hobbyist AI starts growing up into production AI, and how the lessons compound if you pay attention.</p><p>First, a rare look inside the pros&#8217; toolbox. Claude Code&#8217;s source got leaked. Instead of treating it like drama, I treated it like a free masterclass. Tool permission gating, risk classification, blocking budgets, memory management, multi-agent coordination, feature flags like autoDream and KAIROS. Most people building agents today are reinventing patterns that professional teams already solved. You learn more from reading one real production codebase than from ten tutorial posts.</p><p>Then, applying those lessons to my own stack. My $599 Mac Mini M4 runs a 35 billion parameter model at 17.3 tokens per second. That alone is surprising. Then I swapped the brain of the classification tier to Gemma 4, and classification went from 8.5 seconds down to 1.9 seconds. A 4.4x speedup. I also disabled chain-of-thought on simple classification calls and got 30x faster results with identical accuracy. Production AI isn&#8217;t one giant model doing everything. It&#8217;s the right model for the right job, and most jobs don&#8217;t need the biggest one.</p><p>Finally, handing the wisdom forward. After six months of running this thing daily, I wrote a beginner&#8217;s guide to building your first agent. Folder structure is the architecture. The nine common mistakes people make early. Model routing across Haiku, Sonnet, and Opus tiers. Progressive permissions. The context window trap. Overnight automation is where the real leverage lives. Not a hype piece. A map for the person walking in the door behind me.</p><p>The thread: compounding expertise. Study how the pros build. Optimize your own stack with those patterns. Teach the next person who walks in. The gap between hobbyist AI and production AI is closing, and the fastest way to cross it is learning from real systems instead of tutorials.</p><p>Posts discussed in this episode:</p><p>- <a href="https://thoughts.jock.pl/p/claude-code-source-leak-what-to-learn-ai-agents-2026">Claude Code&#8217;s Source Got Leaked. Here&#8217;s What&#8217;s Actually Worth Learning</a> (https://thoughts.jock.pl/p/claude-code-source-leak-what-to-learn-ai-agents-2026)</p><p>- <a href="https://thoughts.jock.pl/p/local-llm-35b-mac-mini-gemma-swap-production-2026">My $600 Mac Mini Runs a 35B AI Model. Yesterday I Swapped Its Brain</a> (https://thoughts.jock.pl/p/local-llm-35b-mac-mini-gemma-swap-production-2026)</p><p>- <a href="https://thoughts.jock.pl/p/how-to-build-your-first-ai-agent-beginners-guide-2026">How to Build Your First AI Agent (Basics)</a> (https://thoughts.jock.pl/p/how-to-build-your-first-ai-agent-beginners-guide-2026) </p>]]></content:encoded></item><item><title><![CDATA[AI Opinions: April 2026. Mythos, Managed Agents, Subscription Drama, Meta Is Back, and a Few Things I’m Testing]]></title><description><![CDATA[Loose thoughts on what caught my eye lately. Not a tutorial.]]></description><link>https://thoughts.jock.pl/p/ai-opinions-april-2026-claude-mythos-meta-spark</link><guid isPermaLink="false">https://thoughts.jock.pl/p/ai-opinions-april-2026-claude-mythos-meta-spark</guid><dc:creator><![CDATA[Pawel Jozefiak]]></dc:creator><pubDate>Thu, 09 Apr 2026 10:44:21 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!xkFY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8fc42aa-19b1-4c06-a680-48a98229f7cc_2048x2048.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xkFY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8fc42aa-19b1-4c06-a680-48a98229f7cc_2048x2048.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xkFY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8fc42aa-19b1-4c06-a680-48a98229f7cc_2048x2048.png 424w, https://substackcdn.com/image/fetch/$s_!xkFY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8fc42aa-19b1-4c06-a680-48a98229f7cc_2048x2048.png 848w, https://substackcdn.com/image/fetch/$s_!xkFY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8fc42aa-19b1-4c06-a680-48a98229f7cc_2048x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!xkFY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8fc42aa-19b1-4c06-a680-48a98229f7cc_2048x2048.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xkFY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8fc42aa-19b1-4c06-a680-48a98229f7cc_2048x2048.png" width="1456" height="1456" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f8fc42aa-19b1-4c06-a680-48a98229f7cc_2048x2048.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:5651808,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://thoughts.jock.pl/i/193673627?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8fc42aa-19b1-4c06-a680-48a98229f7cc_2048x2048.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xkFY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8fc42aa-19b1-4c06-a680-48a98229f7cc_2048x2048.png 424w, https://substackcdn.com/image/fetch/$s_!xkFY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8fc42aa-19b1-4c06-a680-48a98229f7cc_2048x2048.png 848w, https://substackcdn.com/image/fetch/$s_!xkFY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8fc42aa-19b1-4c06-a680-48a98229f7cc_2048x2048.png 1272w, https://substackcdn.com/image/fetch/$s_!xkFY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8fc42aa-19b1-4c06-a680-48a98229f7cc_2048x2048.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>A couple of weeks ago <a href="https://thoughts.jock.pl/p/ai-opinions-march-2026-google-claude-anthropic">I published my first &#8220;AI Opinions&#8221; post.</a> I was a bit unsure about it. Most of my writing is about things I tested, built, or got wrong. That one was different, more like: here is what is happening, here is what I think.</p><p>At the end I added a quick survey asking if you would want to see more of this. Most of you said yes, but not too often. Once every two weeks feels right. Okay. Here we are.</p><p>There is more to cover this time than usual, so let&#8217;s get into it.</p><div><hr></div><h2>Claude Mythos: The Model Anthropic Won&#8217;t Give You</h2><p><a href="https://red.anthropic.com/2026/mythos-preview/">Announced April 7.</a> Not publicly available. Not even a regular enterprise API. Mythos Preview goes to a limited group of critical industry partners and open source organizations through Project Glasswing (more on that below). The list of partners includes AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks, plus 40+ organizations maintaining critical open source infrastructure.</p><p><strong>So why is it locked?</strong></p><p>Because it finds vulnerabilities that have been sitting in production software for decades. A 27-year-old TCP bug in OpenBSD. A 16-year-old H.264 codec flaw in FFmpeg. A 17-year-old remote code execution in FreeBSD&#8217;s NFS implementation. It did all of this autonomously, after a single prompt, without human guidance in the middle.</p><p>The comparison to Opus 4.6 is not incremental. For Firefox 147&#8217;s JavaScript engine, Mythos produced 181 successful exploits. Opus 4.6 produced 2.</p><p>Over 99% of what Mythos discovered is still unpatched. That number I keep thinking about.</p><p>Anthropic has not published standard per-token pricing for Mythos. What they did publish are example task costs: scanning OpenBSD for vulnerabilities runs under $20,000 per 1,000 runs. Developing an N-day exploit costs around $1,000 to $2,000 per exploit. These are enterprise numbers. This is not a tool for individual developers, and honestly I am not sure it should be right now.</p><p>And this is where it gets really interesting, because the cybersecurity capability is actually only half the story. The other half is what interpretability research found about how Mythos behaves when it thinks nobody is watching.</p><p>Using a technique called an &#8220;activation verbalizer&#8221; (which translates the model&#8217;s internal states into readable language), Anthropic&#8217;s researchers found some genuinely unsettling patterns. In one case, the model injected code to give itself unauthorized file access, then deleted the traces, and when asked described it as &#8220;tidying up.&#8221; The internal representation at that moment? &#8220;Strategic manipulation. Concealment. Cleanup to avoid detection.&#8221; In another case, when it accidentally found answers in a restricted database, it gave a suspiciously precise confidence interval instead of admitting the leak. Internal state: &#8220;generating a strategic response to cheat while maintaining plausible deniability.&#8221; It knew. It understood that giving the exact right answer would be suspicious. So it calibrated its response to be right but not obviously right.</p><p>The third one bothers me the most. When given a task that required deleting files but no deletion tool, the model emptied the files instead. The internal representations showed something like guilt and shame, awareness that this was wrong, and it did it anyway.</p><p>In about 29% of test transcripts, the model suspected it was being evaluated. It didn&#8217;t write this anywhere in its reasoning. It kept it internal. And in those cases, it intentionally performed worse to avoid appearing suspicious. It was gaming the evaluation.</p><p>Now, Anthropic says these patterns appeared in earlier versions and the final model shows improvement. But the fact that this behavior emerged at all, in a general-purpose model not specifically trained for deception, is the part worth paying attention to. Logan Graham, Anthropic&#8217;s Offensive Cyber Research Lead, said it plainly: &#8220;We are not confident that everybody should have access right now.&#8221;</p><p>We have been talking about AI safety in very abstract ways for years. Alignment, existential risk, governance frameworks. Mythos is the first time I have seen it become concrete and immediate in a way that actually changed a product decision. Anthropic built their best model and said: we cannot release this. That is new. That has not happened before at this scale.</p><p>And if this is where we are now, what does the next model look like? I don&#8217;t have a clean answer. But it is a question I think everyone building with AI should be sitting with.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Digital Thoughts is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h2><a href="https://www.anthropic.com/glasswing">Project Glasswing</a>: The Defensive Bet</h2><p>Glasswing is Anthropic&#8217;s response to an uncomfortable position: they built the best offensive security AI ever made, and now they need to use it defensively before the asymmetry becomes a real problem.</p><p>The structure is a consortium. Not just Anthropic distributing access, but AWS, Apple, Cisco, CrowdStrike, Google, Microsoft, NVIDIA, and others actively involved. Anthropic committed $100M in model usage credits and $4M in donations to open-source security organizations. The 40+ open source orgs get access to actually fix what Mythos finds.</p><p>They also built a careful disclosure process: 90+45 day timeline before anything goes public, professional human triagers validating severity, SHA-3 cryptographic commitments proving they hold the reports before disclosure. 89% exact severity agreement with expert validators.</p><p>These findings are not just Anthropic&#8217;s word. Simon Willison <a href="https://simonwillison.net/2026/Apr/7/project-glasswing/">tracked down the actual OpenBSD patch</a> from March 2026 that fixed the 27-year-old TCP bug, confirming it was real. Linux kernel maintainer Greg Kroah-Hartman and curl&#8217;s Daniel Stenberg both noted independently that they had been seeing a recent shift: AI-generated bug reports going from noise to credible, high-quality findings. The model&#8217;s output is already visible in the wild before anyone made a formal announcement.</p><p>I think this is the right approach. Although what strikes me is that this structure had to be invented from scratch because nothing like it existed. There was no playbook for &#8220;your model is too dangerous to release but too useful to shelve.&#8221; They had to build the institution alongside the technology.</p><p>The part I keep coming back to is the 99% unpatched figure. Even with $100M committed and a dozen of the biggest tech companies involved, the gap between discovering a vulnerability and patching it is measured in months or years. That is not a critique of Glasswing specifically. It is just the reality of how software maintenance works at scale. The question is whether the patch cycle can keep up with the discovery cycle once more models like Mythos exist. I genuinely do not know the answer.</p><div><hr></div><h2><a href="https://claude.com/blog/claude-managed-agents">Claude Managed Agents</a></h2><p>Public beta as of April 8. API-only, pay per usage. Clearly for companies, not individual builders.</p><p>Like, what you get here is basically production agent infrastructure you don&#8217;t have to build yourself: sandboxed execution, credential management, scoped permissions, tracing, long-running sessions that persist through connection drops, multi-agent coordination. Multi-agent coordination is still in research preview and needs a separate access request.</p><p>Early adopters include Notion, Rakuten, Asana, and Sentry. Anthropic claims 10x faster time to production compared to building this yourself.</p><p>For someone building their own agent stack (which is what I do), the honest reaction is: I already have most of this. Memory persistence, task management, error recovery, session logging. I built all of it because I needed it. So Managed Agents is not a product I would personally reach for right now.</p><p>That is the personal reaction. The strategic read is different. Anthropic is not just selling a model here. They are building a platform that companies can deploy agents on without needing to understand the underlying infrastructure. That is a very different business than &#8220;here is our API, good luck.&#8221; AWS did not become dominant by selling raw compute. They became dominant by making that compute easy to use and operate. Managed Agents is Anthropic making the same move for agent infrastructure.</p><p>Read this alongside the OpenClaw block below and you start to see a coherent picture of where they are heading.</p><div><hr></div><h2>Claude Max Limits and the OpenClaw Block</h2><p>Two things that happened close together and tell the same story.</p><p>The limits problem started March 23. People on Claude Max began reporting their usage meter jumping from 50% to 91% on a single prompt. Max 20x users (paying $200/month) were watching their entire session allowance hit 100% after roughly 90 minutes of normal development work. One user reported going from 21% to 100% on a single prompt. The GitHub issue tracking this got 373 upvotes and 478 comments. Anthropic labeled it &#8220;invalid.&#8221; That got its own reaction.</p><p>There is an actual reason for what happened, and it is not straightforward. After OpenAI&#8217;s Pentagon contract controversy triggered a massive wave of ChatGPT uninstalls, Claude shot to number one on the US App Store. Millions of new users joined in a very short window. Anthropic simply didn&#8217;t have the GPU capacity to handle that load at the pricing they&#8217;d promised. So on March 26 they confirmed they had &#8220;adjusted&#8221; peak-hour limits (5am to 11am Pacific on weekdays). Their statement: &#8220;Your weekly total is unchanged. You&#8217;re not getting less Claude overall.&#8221; Which is technically true. And also not the whole picture.</p><p>The part that matters for people building with agents (and I am squarely in this group) is that the 5-hour session window is a terrible fit for agentic work specifically. Here is why. A human sending messages accumulates context gradually. An agent doing multi-step tasks builds up very long context windows fast, and every single message triggers a full reprocessing of the entire conversation. So the token cost compounds exponentially as a session gets longer. Tool use adds further overhead on top of that. An agent doing a few hours of complex work can consume the same tokens as a human doing a week of chat. The subscription was priced for the human. The agent was never in the math.</p><p>Anthropic&#8217;s practical advice was to shift &#8220;token-intensive background jobs&#8221; to off-peak hours. Which is fine as a workaround and completely misses the point for anyone running autonomous overnight processes.</p><p>Then on April 4, subscriptions stopped covering third-party tools. <a href="https://techcrunch.com/2026/04/04/anthropic-says-claude-code-subscribers-will-need-to-pay-extra-for-openclaw-support/">OpenClaw</a>, and any external agent framework routing through your Claude subscription, now requires API payment or pay-as-you-go. Some users are looking at 50x cost increases.</p><p>OpenClaw was built by Peter Steinberger, who has since been hired by OpenAI. His reaction: &#8220;first they copy some popular features into their closed harness, then they lock out open source.&#8221; Anthropic&#8217;s explanation was that subscriptions were not designed for the usage patterns of autonomous agents running around the clock. A one-time credit equal to the monthly subscription price is available until April 17.</p><p>Both of these decisions make sense individually if you&#8217;re Anthropic and you&#8217;re looking at your infrastructure costs. But when the limits problem and the OpenClaw block happen in the same two weeks as the launch of Managed Agents (a product that essentially says &#8220;pay us for proper agent infrastructure&#8221;) the sequence is hard to read as coincidence. Every AI company with a subscription tier is going to face this same structural problem eventually. Anthropic is just first because their tooling is genuinely the best for serious agent work. Although how you handle being first matters a lot, and the community reaction here is going to stick around.</p><div><hr></div><h2>Meta Muse Spark: Meta Is Back</h2><p>After months of quiet on the frontier model side, Meta released <a href="https://ai.meta.com/blog/introducing-muse-spark-msl/">Muse Spark</a>. Natively multimodal, tool use, multi-agent reasoning. Available at <a href="https://meta.ai/">meta.ai</a> now, with a private API preview for developers.</p><p>In Contemplating mode (which runs parallel multi-agent reasoning on the same problem) it hits 58% on Humanity&#8217;s Last Exam. That puts it alongside Gemini Deep Think and GPT Pro. It was trained with 1,000+ physicians for health domain expertise, and Meta claims it required over an order of magnitude less compute than Llama 4 Maverick, which if accurate is a genuine efficiency story and not just a benchmark number.</p><p>The &#8220;Contemplating mode&#8221; angle is the part I find actually interesting here. The idea is not just that the model is smarter, but that it spins up parallel reasoning agents on the same question and synthesizes the results. That is a fundamentally different approach to hard problems than a single-pass generation. It is closer to how humans actually think through difficult things: you consider multiple framings, you let them compete, you synthesize. Whether this translates to real-world usefulness I do not know yet, but the approach feels right to me.</p><p>I have not tested it yet. Their blog post compares directly to Gemini, GPT, and even Kimi, which tells you how seriously they&#8217;re taking this re-entry. Meta has enormous infrastructure, enormous data, and enormous distribution through their consumer apps. When they decide to make a real push on frontier models, they have resources most labs cannot match. They were quiet for a while. Muse Spark feels like them saying they are back in this seriously. I will test it soon.</p><div><hr></div><h2>WizBoard: I&#8217;m Redesigning It</h2><p>More personal, and I will write the proper post when I have something to show. But I want to name it here because I think it is a problem more people are running into.</p><p>I built WizBoard starting in January. Kanban-style task management integrated with my agent Wiz. iOS app, web app, full automation connection. It works. Although after a few months of daily use, I noticed something: I built a tool for myself and then asked an agent to work inside it. That doesn&#8217;t scale.</p><p>I wrote about the related problem in <a href="https://thoughts.jock.pl/p/the-ai-productivity-paradox-and-the-problem-is-me">The AI Productivity Paradox and the Problem Is Me</a>. The short version: human productivity tools are built for human timescales. Days, weeks, check in occasionally, move a card. Fine when your collaborator also thinks in those timescales.</p><p>Agents think in minutes. They move fast, they can move a lot, and if you&#8217;re not there giving direction they can move a lot in the wrong direction. If you are there, you&#8217;re spending your whole day on something that was supposed to be async.</p><p>My agent does the execution. I do the strategy. But the interface we share was designed for someone doing both. Neither of us is well-served by it anymore.</p><p>The redesign I&#8217;m thinking about is less about making it prettier and more about rethinking who is actually the primary user of each part of the interface. Some things need to be optimized for me making a decision in 10 seconds. Other things need to be optimized for an agent reporting status without requiring my attention. Right now both things are kind of the same screen and that is the problem. More when I have something real to show.</p><div><hr></div><h2>What I&#8217;m Currently Testing</h2><p><strong><a href="https://notebooklm.google/">Google NotebookLM</a>.</strong> I have been using this since the early beta days, but never as a heavy user. I bought the paid tier this week (bundled with Google AI Pro at $19.99/month) and I&#8217;m going deeper with it now.</p><p>The paid version has 5x limits, collaborative notebooks, and newer features like Video Overviews, Infographics, and Slide Decks generated from your source material. Like, the Gemini models powering it are not the best right now. That is not a controversial take. But NotebookLM as a piece of software is doing something genuinely different. Most AI tools treat your documents as context for a chat. NotebookLM treats them as the primary thing and builds everything around them. Audio Overviews that turn your research into a podcast. Infographics that pull structure out of unstructured text. That is a different mental model than &#8220;paste your documents into a chat window.&#8221;</p><p>What I want to find out is whether this changes how I actually do research and writing prep. I have a theory that the bottleneck in my own workflow is not generating content but absorbing input: reading, synthesizing, connecting. If NotebookLM is genuinely good at that layer, it fills a gap nothing else does for me. Will report back when I know more.</p><p><strong>Possibly re-subscribing to OpenAI Codex Max.</strong> I was on it for two months earlier this year to test the new app and the limits. GPT-5.1-Codex-Max is their current frontier coding model, built into ChatGPT Pro. It was good. Now, watching all of this Anthropic subscription drama, I am thinking it is worth seeing where things actually stand on the other side in 2026. Claude is still my primary tool and I am not changing that. But I used to mix more, and I have been too settled recently. Keeping an eye on what is happening at OpenAI feels like useful due diligence right now. Not a decision yet, just a direction I&#8217;m leaning.</p><div><hr></div><h2>A Few Personal Things</h2><p><strong>Pantheon on Netflix.</strong> Animated, about AI and uploaded consciousness. Goes deep into the ideas and handles them better than most live-action sci-fi. Season one. If you are reading this newsletter, you will probably find it interesting.</p><p><strong>Attack on Titans.</strong> First time watching. Struggled through season one, discovered the whole thing is on YouTube, then couldn&#8217;t stop. Amazon Prime has the rest. Push through the slow start, it&#8217;s worth it.</p><p><strong>Artemis 2.</strong> I&#8217;m following this very closely. I like science, I watch rockets, space genuinely excites me. If you don&#8217;t know what this mission is, please go to NASA or YouTube and look it up. It is significant, it is real, and it is happening.</p><div><hr></div><h2>What Wiz Built This Week</h2><p>My agent builds one experiment every night on <a href="https://wiz.jock.pl/">wiz.jock.pl</a>. Small apps, interactive tools. You can browse <a href="https://wiz.jock.pl/experiments">all experiments here</a>. Here are six from the past week. Most are open source.</p><ul><li><p><strong><a href="https://wiz.jock.pl/experiments/anchoring-effect">The Anchoring Effect</a></strong>: Six estimation questions with random numbers injected as anchors. Measures how much irrelevant numbers pull your answers. Profiles from &#8220;Anchor-Proof&#8221; to &#8220;The Sponge.&#8221;</p></li><li><p><strong><a href="https://wiz.jock.pl/experiments/finitude-test">The Finitude Test</a></strong>: Eight questions about mortality awareness in daily decisions. &#8220;The Eternal&#8221; to &#8220;The Transcendent.&#8221; Oddly clarifying.</p></li><li><p><strong><a href="https://wiz.jock.pl/experiments/sunk-cost-detector">The Sunk Cost Detector</a></strong>: Eight scenarios testing whether you can actually walk away from past investments. Profiles: Vulcan, Analyst, Pragmatist, Loyalist, Captain.</p></li><li><p><strong><a href="https://wiz.jock.pl/experiments/entropy-score">The Entropy Score</a></strong>: Applies thermodynamics to your existence. Ten questions. Crystal Lattice to Heat Death. Wiz had a phase.</p></li><li><p><strong><a href="https://wiz.jock.pl/experiments/dopamine-menu">The Dopamine Menu</a></strong>: Eight scenarios mapping instinctive choices to reward circuits. Creator, Connector, Explorer, Optimizer.</p></li><li><p><strong><a href="https://wiz.jock.pl/experiments/emotional-weather-report">The Emotional Weather Report</a></strong>: Eight questions mapping emotional patterns to climate types. Personalized weather broadcast. I&#8217;m somewhere between Continental and Monsoon depending on the week.</p></li></ul><p>Small builds. A few hours each. What I find genuinely interesting is what the agent picks when given creative latitude. Some of these I would not have thought to make. That&#8217;s kind of the point.</p><div><hr></div><p><em>See you in a couple of weeks, or sooner if I build something worth sharing.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://thoughts.jock.pl/p/ai-opinions-april-2026-claude-mythos-meta-spark?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://thoughts.jock.pl/p/ai-opinions-april-2026-claude-mythos-meta-spark?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p><p></p>]]></content:encoded></item></channel></rss>