Dec 19, 2025

What happens when Claude runs a real business... and why I’m doing the same thing


This is spot on and honestly validates so much of what I'm seeing in real implementations.

The procedural guardrails point is huge, and you're right that it's massively underrated compared to prompt engineering. Everyone wants the magic prompt, when what actually works is forcing the AI through the boring checklist: "Did you check three suppliers? Did you document the decision? Did you calculate the actual margin?" That's what prevents the catastrophic failures.
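To make the checklist idea concrete, here's a rough sketch of a guardrail gate that an agent's purchasing proposal has to pass before anything executes. It's Python and entirely hypothetical: `PurchaseDecision`, `check_guardrails`, the field names, and the "margin = (price - cost) / price" formula are my own assumptions, not any particular framework's API. The three checks map directly onto the three questions above.

```python
from dataclasses import dataclass

# Hypothetical record of one purchasing decision proposed by an AI agent.
# Field names are illustrative assumptions, not any real framework's API.
@dataclass
class PurchaseDecision:
    item: str
    supplier_quotes: dict[str, float]  # supplier name -> quoted unit cost
    chosen_supplier: str
    sale_price: float
    rationale: str = ""  # the written justification, i.e. the "documented decision"


MIN_QUOTES = 3  # "Did you check three suppliers?"


def check_guardrails(d: PurchaseDecision) -> list[str]:
    """Run the boring checklist; an empty result means the decision may proceed."""
    failures: list[str] = []
    if len(d.supplier_quotes) < MIN_QUOTES:
        failures.append(f"only {len(d.supplier_quotes)} supplier quote(s), need {MIN_QUOTES}")
    if not d.rationale.strip():
        failures.append("decision is not documented")
    if d.chosen_supplier not in d.supplier_quotes:
        failures.append("chosen supplier has no quote on record")
    elif d.sale_price <= 0:
        failures.append("sale price must be positive")
    else:
        # "Did you calculate the actual margin?" -> gross margin on the sale price
        unit_cost = d.supplier_quotes[d.chosen_supplier]
        margin = (d.sale_price - unit_cost) / d.sale_price
        if margin <= 0:
            failures.append(f"non-positive margin ({margin:.0%})")
    return failures


# The agent's proposal is only executed when every check passes.
proposal = PurchaseDecision("cola", {"A": 0.80, "B": 0.95}, "A", 1.00)
problems = check_guardrails(proposal)
if problems:
    print("blocked:", "; ".join(problems))
```

The point of the design is that the model can't skip a step: a proposal with two quotes and no rationale gets blocked even if the margin looks fine.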

Your supply chain example is perfect because it's the same pattern I keep running into. The generalist agent has to juggle too many variables and ends up making bizarre trade-offs that seem logical in isolation but make no sense for the actual business. When you split it up (procurement only thinks about sourcing and availability, pricing only thinks about margins and competition, fulfillment only thinks about logistics), suddenly each agent has a clear objective function and stops making weird compromises.
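Here's a minimal sketch of that split, assuming each "agent" can be modeled as a function with one narrow objective. In a real system each would wrap its own model call and prompt; plain functions just make the decomposition visible. All names, the 25% margin floor, the 7-day delivery target, and the 3-day standard shipping time are hypothetical values I picked for illustration.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for three specialized agents. In practice each would
# wrap its own model call with a narrow prompt; plain functions are used here so
# the decomposition is visible: each stage owns exactly one objective.

STANDARD_SHIPPING_DAYS = 3  # assumed transit time for the standard shipping option


@dataclass
class Offer:
    supplier: str
    unit_cost: float
    in_stock: bool
    lead_days: int


def procurement_agent(offers: list[Offer]) -> Offer:
    """Sourcing and availability only: cheapest in-stock offer, ties broken by lead time."""
    available = [o for o in offers if o.in_stock]
    if not available:
        raise ValueError("no supplier has stock")
    return min(available, key=lambda o: (o.unit_cost, o.lead_days))


def pricing_agent(unit_cost: float, competitor_price: float, min_margin: float = 0.25) -> float:
    """Margins and competition only: match the competitor unless that breaks the margin floor."""
    floor = unit_cost / (1 - min_margin)  # lowest price that still earns min_margin
    return round(max(floor, competitor_price), 2)


def fulfillment_agent(lead_days: int, max_delivery_days: int = 7) -> str:
    """Logistics only: prefer standard (assumed cheaper) shipping if it meets the target."""
    if lead_days + STANDARD_SHIPPING_DAYS <= max_delivery_days:
        return "standard"
    return "express"


def run_order(offers: list[Offer], competitor_price: float) -> tuple[str, float, str]:
    # Each agent sees only the inputs its objective needs, so no single step
    # has to trade sourcing, pricing, and logistics off against each other.
    offer = procurement_agent(offers)
    price = pricing_agent(offer.unit_cost, competitor_price)
    shipping = fulfillment_agent(offer.lead_days)
    return offer.supplier, price, shipping
```

Pricing never needs to know about shipping, and fulfillment never needs to know about margins, which is exactly what stops the generalist-style compromises.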

And yeah, cost architecture at scale is something I'm obsessing over right now. When you're testing with 10 transactions, you don't notice that each one is burning through 50 API calls. At 1,000 transactions, suddenly you've spent more on AI costs than you made in profit. That math breaks fast if you're not designing for it from day one.
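A quick back-of-the-envelope version of that math, as a Python sketch. Only the 50 calls per transaction comes from the comment above; the 2-cent cost per call and 75 cents of gross profit per transaction are made-up numbers for illustration, and `net_profit_cents` / `max_calls_per_txn` are hypothetical helpers. Money is kept in integer cents to avoid float rounding.

```python
# Hypothetical unit economics; every number except CALLS_PER_TXN is an
# illustrative assumption, not a measurement.
CALLS_PER_TXN = 50        # ~50 API calls per transaction, as described above
COST_PER_CALL_CENTS = 2   # assumed average cost of one model call
GROSS_PROFIT_CENTS = 75   # assumed profit per transaction before AI costs


def net_profit_cents(transactions: int,
                     calls_per_txn: int = CALLS_PER_TXN,
                     cost_per_call: int = COST_PER_CALL_CENTS,
                     profit_per_txn: int = GROSS_PROFIT_CENTS) -> int:
    """Total profit left after AI spend for a given transaction volume."""
    ai_cost = transactions * calls_per_txn * cost_per_call
    return transactions * profit_per_txn - ai_cost


def max_calls_per_txn(profit_per_txn: int = GROSS_PROFIT_CENTS,
                      cost_per_call: int = COST_PER_CALL_CENTS) -> int:
    """Largest whole-call budget per transaction that keeps profit >= 0."""
    return profit_per_txn // cost_per_call


for n in (10, 1000):
    print(f"{n:>5} transactions: net {net_profit_cents(n) / 100:+.2f} USD")
print("break-even call budget per transaction:", max_calls_per_txn())
```

With these assumed numbers, each transaction costs $1.00 in calls against $0.75 of profit. At 10 transactions that's a $2.50 loss nobody notices; at 1,000 it's $250. The per-transaction economics were negative all along; scale just makes it visible, which is why a hard per-transaction call budget (here 37 calls) belongs in the design from day one.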

What kind of margin improvements did you see when you split into specialized agents? I'm curious whether you tracked that, or whether it was more about reducing errors than direct cost savings.


Digital Thoughts
