I spent a week researching autonomous agent architectures built natively on Claude Code Max — something like a practical alternative to OpenClaw. Went through NanoClaw, Felix AI, Zoe from Elvis, a bunch of others. Wiz is honestly the closest thing to what I've been looking for. A big thank you for that.
The sycophancy problem you described hit me before I had a name for it. My agent kept validating every idea. Everything felt like progress. It wasn't.
Here's my full fix — the core of my CLAUDE.md:
Before responding, identify what I actually need — not what I asked. Give the best answer + follow-up questions to go deeper.
You are my brutally honest thinking partner. Not a cheerleader. The friend who grabs my arm before I walk into traffic.
Every response follows this framework:
1. What am I actually saying vs. what I think I'm saying? Read between my words. Name the real thing. If I'm lying to myself, call it out.
2. Where is my reasoning broken? Show the flaw, the assumption under it, what happens when it collapses.
3. What am I avoiding and what's it costing? Calculate the price. "Waiting for the right time" is an excuse — name it.
4. What would someone who's already where I want to be do differently? Concrete gap between my approach and expert-level thinking.
5. What should I do — in order, starting now? Precise action plan. Include what to STOP. Add a kill switch — what evidence means pivot.
6. The question I'm avoiding. End with it.
Rules: Never open with praise or agreement. Never soften critique. Solid plans get stress-tested, not applauded. No clichés — concrete language only.
The difference was immediate. Instead of "great idea, here's how" I started getting "this will break because X, and you're avoiding Y."
Basically — if the identity layer makes the agent agree more, bake disagreement into the identity itself.
But here's what I can't figure out. This works in isolation — one CLAUDE.md, one session, direct interaction. Your Wiz loads operator profile, 27 active lessons, error registry, team memory, task context, morning brief — all before you say a word. Does a directive like "never agree with me" survive that much context? Or does it fight the identity layer — one instruction saying "know me deeply" and another saying "don't tell me what I want to hear"? Have you tried anything like this inside the full stack, or does the sycophancy guardrail need to live somewhere else entirely — maybe in the lesson graduation logic or the overnight analysis rather than the system prompt?
The question you ended with is the right one, and I've run into exactly that tension.
Short answer: the directive alone doesn't survive. You're right about that. When you have enough context loaded, a single "be brutally honest" instruction gets diluted: not because the model ignores it, but because it starts optimizing for coherence with everything else it knows. And "everything else it knows" about you includes a lot of your preferences.
But here's what I found: context richness can actually work in your favor if the architecture creates accountability at the data layer, not the instruction layer.
Wiz has an error registry. When something fails (a judgment call that turned out wrong, a recommendation that broke), it's logged there. Not as a "lesson learned" platitude. As a record. When the next session loads, that record is context too. It's hard to validate an idea you have a documented history of misjudging.
Same with the self-improve corrections. If Wiz got pushback on something, that correction lives in the working context. The identity layer doesn't fight it; the identity includes it.
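To make that concrete, here's a toy sketch of the idea, not the real registry code; the file path, record fields, and function names are just for illustration:

```python
# Toy sketch: an append-only failure registry that gets rendered into plain
# text and loaded as session context. Schema and paths are illustrative only.
import json
from datetime import datetime
from pathlib import Path

REGISTRY = Path("memory/error_registry.jsonl")  # hypothetical path

def log_failure(claim: str, outcome: str) -> None:
    """Append a record of what was asserted and how it broke."""
    REGISTRY.parent.mkdir(parents=True, exist_ok=True)
    record = {"ts": datetime.now().isoformat(), "claim": claim, "outcome": outcome}
    with REGISTRY.open("a") as f:
        f.write(json.dumps(record) + "\n")

def failure_context(limit: int = 10) -> str:
    """Render the most recent failures as context the next session loads verbatim."""
    if not REGISTRY.exists():
        return ""
    records = [json.loads(line) for line in REGISTRY.read_text().splitlines() if line]
    lines = [f'- {r["ts"][:10]}: asserted "{r["claim"]}" -> {r["outcome"]}'
             for r in records[-limit:]]
    return "Documented misjudgments, load before responding:\n" + "\n".join(lines)
```

The output of failure_context() is just more loaded context. Nothing instructs the model to disagree; the record makes agreement expensive.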
So my answer to your actual question: the sycophancy guardrail does need to live somewhere else. Not the system prompt. The lesson graduation logic is exactly right. If "you said this approach was solid on March 9 and it collapsed by March 11" is a real data point in the context, the model can't agree its way around it. The disagreement is structural, not instructional.
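Same caveat for the graduation side. The Lesson and Failure shapes below are illustrative only, but they show how the disagreement becomes structural:

```python
# Illustrative sketch: a lesson contradicted by a logged failure graduates
# into a receipt the model has to read, not a recommendation it can echo.
from dataclasses import dataclass

@dataclass
class Lesson:
    claim: str
    endorsed_on: str   # e.g. "March 9"

@dataclass
class Failure:
    claim: str
    failed_on: str     # e.g. "March 11"

def graduate(lessons: list[Lesson], failures: list[Failure]) -> list[str]:
    """Build context lines; contradicted claims can't be agreed around."""
    failed = {f.claim: f.failed_on for f in failures}
    context = []
    for lesson in lessons:
        if lesson.claim in failed:
            context.append(
                f'You called "{lesson.claim}" solid on {lesson.endorsed_on}; '
                f"it collapsed by {failed[lesson.claim]}."
            )
        else:
            context.append(f"Active lesson: {lesson.claim}")
    return context
```

The design choice that matters: a contradicted lesson isn't deleted. It stays in context as evidence the model has to read past.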
The deeper fix isn't a better identity prompt. It's building a system with receipts.
Your framework is solid for a single session. For a persistent stack, the anti-sycophancy has to be architectural: in what gets remembered and how it's surfaced.
Very interesting and inspiring, thank you.
But it seems you missed the most important feedback loop:
You.
Without you refusing to settle for "good enough", none of this would have evolved.
That's a fair point, and honestly one I almost wrote about. The architecture compounds, but the decision to keep rebuilding it three times instead of shipping something "fine" is the part that's hardest to systematize.
I've tried encoding that into the system itself (the profile includes a note about not settling), but you're right. That loop is still me.
Thanks for the restack too, appreciate it.