Discussion about this post

Jonatan

Good stuff, I was waiting for your angle. The skeptical memory concept is interesting because I already have “Read before assuming - always check file contents before asserting or editing” in my own CLAUDE.md, added after Claude asserted file contents it hadn’t read. Looks like I was rediscovering a design principle.

Your interactive explorer is the best entry point I’ve seen so far - thanks!

Kirill Chernakov

Thanks for the read!

On this note:

>On terminal bench, Claude Code ranks 39th. There are 38 harness-model pairs that outscore it. If you filter to just Opus, Claude Code is dead last among harnesses. Cursor’s harness gets Opus from a 77% score to 93%. Claude Code gets that same Opus model... 77%. The harness adds nothing.

Cursor is not on Terminal Bench at all (https://www.tbench.ai/leaderboard/terminal-bench/2.0), what did you mean?

