When Coding Tools Compete: Claude Code vs. Codex (Real Usage After 2 Months)
Claude Opus 4.6 vs. GPT-5.3-Codex
I’ve been an Anthropic loyalist for months. Claude Opus, Sonnet, Haiku — everything I build runs on that backbone. My AI agent Wiz lives on Claude Code. The night shifts that ship features while I sleep? Claude. The multi-agent teams that built a roguelike in 45 minutes? Opus 4.6.
But when OpenAI dropped GPT-5.3-Codex on February 5, 2026 (their most capable coding model yet, powered by a custom chip and designed for agentic work), I had to see if the hype was real.
So I ran both. For two weeks. On the same codebase. Here’s what I actually learned.
The Test
I didn’t want toy examples. I gave both tools the same task: audit and improve my AI agent’s entire codebase.
Context: Wiz is a 2-month-old project. Dozens of Python scripts, automation workflows, API integrations, a night shift system, skills, memory management, error logging. It works. But like any project built fast, there’s legacy code, abandoned features, orphaned processes, and quick “five-minute” hacks that stopped working.
I told both: “Review everything. Find bugs, improvements, stale code, anything broken or inefficient.”
Then I watched.
What Codex Does Better
1. It Reads the Whole Codebase
This was the most obvious difference.
When I asked Claude Code (Opus 4.6) to review a function or fix a feature, it focused on that specific file. Laser-targeted. Efficient. But isolated.
Codex? It read everything. Not skimming, actually reading. Before making any changes, it mapped the entire structure. It understood how files connected. When I asked it to fix something, it immediately flagged: “Hey, this is also used in three other places. If we change it here, those break. I’ll update all of them.”
Claude Code doesn’t do that unless you explicitly point it out. It’s optimized for speed and efficiency, which usually works. But when you’re refactoring or improving an interconnected system, you need the broad view. Codex has it by default.
Example: I asked both to improve error handling in my Discord automation. Claude Code fixed the function I pointed to. Codex fixed that function and the four other scripts that called it, updated the error logging format to be consistent across the codebase, and flagged a deprecated library I was still importing in two places.
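For illustration, here’s a minimal sketch of the kind of consistent error handling Codex converged on: one shared decorator instead of every script inventing its own try/except style. Everything here is invented for the example (the logger config, the decorator, the `send_discord_message` stub); it shows the pattern, not Wiz’s actual code.

```python
import functools
import logging

# Hypothetical shared config: every automation script uses the same
# format, so error output stays consistent across the codebase.
logging.basicConfig(
    format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
    level=logging.INFO,
)

def logged_errors(func):
    """Wrap an automation entry point so failures are logged uniformly
    before being re-raised."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            logging.getLogger(func.__module__).exception(
                "failed in %s", func.__qualname__
            )
            raise
    return wrapper

@logged_errors
def send_discord_message(channel_id: str, text: str) -> None:
    # Placeholder for the real Discord API call.
    if not text:
        raise ValueError("empty message")
```

Once every caller goes through the same decorator, changing the log format is a one-line edit instead of a four-script hunt, which is exactly the cross-file consistency Codex pushed for.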
2. It’s Faster
Not just a little faster. Noticeably faster.
I’m comparing standard Codex (not the Spark real-time version) to standard Opus 4.6 (not fast mode). Codex consistently delivered results quicker, especially for multi-file refactoring or deep analysis tasks.
Part of this is OpenAI’s custom chip, designed specifically for coding workloads. But even beyond raw speed, Codex feels faster because it doesn’t need multiple passes. It gets context on the first read and moves.
3. The UI/UX Is Next-Level
I like working in the terminal. Claude Code’s CLI is clean, functional, and I’m used to it.
But Codex’s interface, the dedicated app specifically, is genuinely better: better syntax highlighting, cleaner diffs, easier navigation through multi-file changes, smarter auto-complete when you’re mid-command. The CLI route feels mid by comparison.
It’s not just polish. It’s thoughtfully designed for people who code in AI-assisted environments all day. Claude Code works. Codex flows.
If the rumor is true and Peter (the OpenClaw creator) joining OpenAI today means they’re doubling down on agentic tooling, this could get even better. OpenClaw pioneered a lot of the patterns Claude Code now uses. If they integrate that thinking into Codex, the gap widens.
What Claude Code Does Better
1. Agent Orchestration
Here’s where Anthropic still dominates.
Claude Code isn’t just a coding assistant. It’s the foundation for my entire agent system. Skills, sub-agents, memory persistence, autonomous execution, orchestration across multiple models - all of that runs on Claude Code architecture.
When I want my agent to:
Spin up a team of specialists to build two apps simultaneously
Run night shifts from 10 PM to 5 AM with planning, execution, and wrap-up phases
Coordinate long-running tasks that involve research, tool use, and deployment
Claude Code just works. The agent teams feature in Opus 4.6 is production-ready. I’ve deployed real projects with it. Zero babysitting required.
Codex can write the code for an agent system. Claude Code can run one.
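The agent-team idea, fanning work out to parallel specialists and gathering their results, can be sketched in a few lines. This is a toy model, not Claude Code’s actual orchestration; the `specialist` function just stands in for a real sub-agent or model API call.

```python
import asyncio

async def specialist(name: str, task: str) -> str:
    # Stand-in for a real sub-agent call (e.g. a model API request).
    await asyncio.sleep(0)  # yield control, as a network call would
    return f"{name} finished: {task}"

async def run_team(tasks: dict[str, str]) -> list[str]:
    """Fan tasks out to named specialists concurrently; results come
    back in the same order the tasks were given."""
    return await asyncio.gather(
        *(specialist(name, task) for name, task in tasks.items())
    )

results = asyncio.run(run_team({
    "frontend": "build app A UI",
    "backend": "build app B API",
}))
```

The hard part in a real system isn’t the fan-out; it’s failure handling, shared memory, and deciding when a specialist is actually done. That’s the layer Claude Code provides and this sketch doesn’t.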
2. Sustained Execution Quality
When I’m improving code, Codex wins. Higher quality output, fewer bugs, better understanding of the full context.
But when I’m building and deploying autonomously, Claude Code is more reliable.
Example: My night shift automation. I’ve run dozens of overnight sessions where Wiz (my agent):
Plans what to build
Writes code
Deploys to production
Sends me a summary email at 5 AM
That workflow - plan, execute, deploy, report - requires sustained multi-step reasoning over hours without human input. Claude Code handles it. Codex would probably do well on steps 1-3, but I haven’t tested it in that autonomous, long-duration context yet.
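That plan, execute, deploy, report loop can be sketched as a simple pipeline. Every step here is a stub I invented for illustration; in the real system each one would be a long agent run, and `report()` would become the 5 AM summary email.

```python
from dataclasses import dataclass, field

@dataclass
class NightShift:
    """Toy sketch of an overnight run: plan -> execute -> deploy -> report."""
    log: list[str] = field(default_factory=list)

    def plan(self) -> list[str]:
        # A real planner would be an agent deciding what to build tonight.
        tasks = ["fix flaky test", "add retry to webhook"]
        self.log.append(f"planned {len(tasks)} tasks")
        return tasks

    def execute(self, tasks: list[str]) -> None:
        for t in tasks:
            self.log.append(f"done: {t}")

    def deploy(self) -> None:
        self.log.append("deployed")

    def report(self) -> str:
        # In the real system this becomes the 5 AM email.
        return "\n".join(self.log)

shift = NightShift()
shift.execute(shift.plan())
shift.deploy()
summary = shift.report()
```

The structure is trivial; what makes the real version hard is that each step must survive hours of unattended execution, which is exactly where sustained multi-step reasoning matters.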
3. It Knows My System
This is less about the model and more about integration.
Claude Code has access to my CLAUDE.md files, my skills directory, my memory system, my project structure. It’s built for persistent agents that remember context across sessions.
Codex is phenomenal at understanding a codebase you point it at. But it doesn’t have the same native integration with the agent workflows I’ve built. That’s solvable — I could port my setup to Codex. But right now, Claude Code is home base.
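The persistent-context idea is conceptually simple: gather instruction files from the project tree before a session starts. Here’s a hedged sketch; the file layout and the `load_context` helper are my assumptions for illustration, not how Claude Code actually loads CLAUDE.md files.

```python
from pathlib import Path

def load_context(root: Path) -> str:
    """Collect every CLAUDE.md under a project root into one context
    string, tagged with its source path. Illustrative only."""
    parts = []
    for md in sorted(root.rglob("CLAUDE.md")):
        parts.append(f"# from {md.relative_to(root)}\n{md.read_text()}")
    return "\n\n".join(parts)
```

Porting a setup like this to Codex is mostly a matter of feeding the same collected string in at session start, which is why I call the gap solvable rather than structural.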
The Plot Twist
Today (February 15, 2026), Peter announced he’s joining OpenAI.
Peter built OpenClaw - the open-source agent framework that inspired a huge chunk of what I built with Wiz. His work on autonomous execution, tool use, and agentic loops directly influenced Claude Code’s design.
Now he’s at OpenAI. Working on... what? Nobody’s saying yet. But the obvious guess: making Codex more agent-ready.
If that happens — if Codex gets the kind of agent orchestration primitives that Claude Code has — this comparison shifts.
And OpenClaw? Staying open source. That matters. A lot of what I learned came from reading his code. If he’s shipping agent tools at OpenAI while keeping OpenClaw alive, that’s a huge win for the builder community.
The Honest Take: When to Use Each
I’m not switching. But I’m not loyal to the point of ignoring what works.
Use Claude Code (Opus 4.6) when:
You’re running autonomous agents over long timeframes
You need agent teams working in parallel
You want orchestration, not just coding
Your agent needs persistent memory and skills
You’re deploying to production without supervision
Use Codex (GPT-5.3-Codex) when:
You’re refactoring or improving existing code
You need deep codebase understanding across many files
Speed matters (especially for iteration loops)
You want the best terminal UX available
You’re doing pure coding work, not agent orchestration
The hilarious part: Codex is better at improving agents. Claude Code is better at running them.
So my workflow now looks like this:
Build and run agent systems on Claude Code
When the codebase needs serious refactoring, bring in Codex for deep review
Deploy and operate on Claude Code
It’s not one or the other. It’s both, for different jobs.
What I’m Considering Next
This experiment got me thinking about open-sourcing Wiz.
Peter’s work with OpenClaw showed me what’s possible when agent systems are open. I’ve built something I think is good: not as polished as OpenClaw, but different. More Anthropic-native. Built around CLAUDE.md, skills, memory tiers, night shift automation, autonomous deployment.
If there’s interest, I’ll do it. Full repo. Documentation. Templates. The whole agent blueprint.
Not sure yet. But after two weeks of comparing tools and seeing how fast this space moves, I’m leaning toward: share the work, help others build, see what comes back.
If you have thoughts on that - or if you’d actually use an open-source Anthropic-based agent framework - leave a comment or DM me. I’m genuinely curious.
Try Both
If you’re building with AI coding tools:
Claude Code: claude.ai/code or the CLI
Codex: Codex app or ChatGPT with Codex enabled
And if you want to see what a production AI agent looks like after 2 months of night shifts and autonomous work: wiz.jock.pl
The tools are good. Both of them. Pick the one that fits your workflow. Or use both, like I do.
The future isn’t one model. It’s knowing which tool to reach for.
Building with Claude Code? I compiled 50+ prompts and 8 CLAUDE.md templates from 2 months of daily use into the Claude Code Prompt Pack. $29 — every prompt battle-tested in production.
I’m Pawel. I build things with AI and write about what actually works. More at thoughts.jock.pl.
If you found this useful — I pack everything I build into a paid subscription. $9.99/month gets you access to 6 products worth $165, including the Claude Code Prompt Pack. No fluff — just working systems.
PS. How do you rate today’s email? Leave a comment or a heart if you liked the article — I always value your comments and insights, and it also gives me a better position in the Substack network.