9 Comments
Kirill Chernakov's avatar

Thanks for the read!

On this note:

>On terminal bench, Claude Code ranks 39th. There are 38 harness-model pairs that outscore it. If you filter to just Opus, Claude Code is dead last among harnesses. Cursor’s harness gets Opus from a 77% score to 93%. Claude Code gets that same Opus model... 77%. The harness adds nothing.

Cursor is not on Terminal Bench at all (https://www.tbench.ai/leaderboard/terminal-bench/2.0). What did you mean?

Hari Krishna's avatar

Nice read!

Juan Gonzalez's avatar

Thanks for putting this together. I’ve already got some pieces from other people’s reviews.

Took a couple of ideas to improve my modular agents system (and also learned a few things I didn’t think were possible before!)

Pawel Jozefiak's avatar

Sure thing! I bet there’s more than that! I just described the things that I was interested in :D

Juan Gonzalez's avatar

I think enough to keep someone busy for months lol 🤓😂

Jonatan's avatar

Good stuff, I was waiting for your angle. The skeptical memory concept is interesting because I already have “Read before assuming - always check file contents before asserting or editing” in my own CLAUDE.md, added after Claude asserted file contents it hadn’t read. Looks like I was rediscovering a design principle.

Your interactive explorer is the best entry point I’ve seen so far - thanks!

Pawel Jozefiak's avatar

Thanks! I needed some time to really explore the whole leak. Many people jumped on this train the same day it was leaked. I was like - how was it possible to really explore this? I have to admit that after 4h on this, I still think there’s probably more.

Anyways - it’s fun and Claude Code should be OSS!

Jonatan's avatar

Four hours in and still finding things is the honest version of this story. I hope the community response will make the OSS case for Anthropic. Though the DMCA campaign suggests they see it differently, at least while the IPO clock is running.