When Your AI Agent Starts Fixing Itself: A Week of Rebuilding Wiz
From good to zero and back to something again
So I just spent a week tearing apart my personal AI agent and putting it back together. Not because it was broken - because I wanted to see how far I could push it.
The result: Wiz now extends its own capabilities when it finds a gap. It logs its own errors and fixes them systematically. It runs multiple tasks in parallel instead of one at a time. And it went from a custom folder-based architecture to Claude Code’s native system.
This is a building-in-public post. Real code, real file paths, real config. If you’ve been curious what it actually looks like to iterate on an AI agent - the specific changes, the decisions behind them, and what broke along the way - here’s the deep dive.
The Starting Point: What Was Broken
I wrote about building Wiz originally (https://thoughts.jock.pl/p/wiz-personal-ai-agent-memory-claude-code-2026). That version worked. It had memory, sub-agents, scheduled tasks. But after weeks of daily use, friction points became obvious:
The old skill system lived in custom folders like /wiz/skills/email/SKILL.md. Every session, I had to explicitly tell Wiz “load the email skill” or it wouldn’t know the capability existed. Claude Code has a native ~/.claude/skills/ directory with YAML frontmatter that auto-loads based on context - but I wasn’t using it.
Memory had become a monster. I'd built: memory.md (short-term, ~50 lines), memory-long/topics/*.md (10+ topic files), memory-long/sessions/*.md (daily archives), memory-long/index.md (keyword mapping), and semantic search with local embeddings. Impressive engineering. Absolute overkill.
Tasks ran sequentially. Research across three job boards meant: hit LinkedIn... wait 30 seconds... hit Indeed... wait 30 seconds... hit Greenhouse... wait 30 seconds. Claude Code’s Task tool can spawn parallel sub-agents, but my architecture didn’t use them.
I was maintaining infrastructure instead of using the tool. Time to rebuild from scratch.
The Architecture Migration: Custom to Native
Claude Code’s native system uses two directories: ~/.claude/skills/ for capabilities and ~/.claude/agents/ for sub-agents. The key is YAML frontmatter - a header block that tells the system when to auto-load.
Here’s what a skill file actually looks like (from ~/.claude/skills/typefully/SKILL.md):
---
name: typefully
description: >
  Create, schedule, and manage social media posts via Typefully.
  ALWAYS use this skill when asked to draft, schedule, post, or
  check tweets, posts, threads, or social media content.
last-updated: 2026-01-29
allowed-tools: Bash(./scripts/typefully.js:*)
---
The description field is the magic. When I mention “post a tweet” or “schedule something for LinkedIn,” Claude Code pattern-matches against skill descriptions and auto-loads the relevant file. No manual context loading.
I migrated 13 skills this way: email-automation, deployment, image-generation, notion-actions, firecrawl, web-research, content-creation, peekaboo (macOS automation), unified-recall, typefully, x-direct, shopify, and artur-jobs. Each has specific tool permissions - Bash, Read, Write, or MCP connections.
The practical difference: old architecture required explicit loading every session. New architecture just works - mention the capability, skill loads automatically, context is available.
Self-Extension: Teaching the Agent to Grow Itself
This is where it gets interesting. I didn’t just want to migrate existing skills - I wanted Wiz to create new ones when needed.
The implementation lives in ~/.claude/CLAUDE.md - Wiz’s identity file. Here’s the actual instruction block:
## Self-Extension (Automatic Skill Creation)
I extend my own capabilities by creating new skills. This is core to evolution.
### When to Create a Skill AUTOMATICALLY
Create a new skill when ALL of these are true:
1. **Capability gap** - User requests something no existing skill covers
2. **Reusable** - Pattern will likely be useful again (not one-off)
3. **Structured** - Involves clear process, API, or tool usage
4. **User benefit** - Saves time on future similar requests
### How to Create a Skill
1. **Recognize the gap** - "No skill exists for this, and it's reusable"
2. **Create the skill file** immediately:
~/.claude/skills/[skill-name]/SKILL.md
3. **Use minimal YAML frontmatter**:
---
name: skill-name
description: Clear description of when to use this skill
allowed-tools: Tool1, Tool2
---
4. **Use it immediately** for the current task
5. **Inform user**: "Created new skill: [name] - [what it does]"
Real example: I asked Wiz to post to X/Twitter. No skill existed. Instead of saying “I can’t do that,” Wiz:
Recognized the gap (social posting capability needed)
Researched options (direct API vs Typefully for anti-bot bypass)
Created ~/.claude/skills/typefully/SKILL.md with full API documentation
Used it immediately to post the tweet
Told me: “Created social-posting skill for future use”
That skill file now has 450+ lines of documentation - API endpoints, error handling, multi-platform support. The agent wrote it based on its own understanding of what was needed. First tweet from @wiz_jock shipped the same session.
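For a sense of what that skill wraps, here's a simplified sketch of the kind of call it makes - treat the exact endpoint, header, and payload as illustrative rather than authoritative:

// Illustrative sketch - verify endpoint, auth header, and payload
// shape against Typefully's API docs; not copied from the skill file.
async function createDraft(content) {
  const res = await fetch("https://api.typefully.com/v1/drafts/", {
    method: "POST",
    headers: {
      "X-API-KEY": `Bearer ${process.env.TYPEFULLY_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ content, "schedule-date": "next-free-slot" }),
  });
  if (!res.ok) throw new Error(`Typefully error: ${res.status}`);
  return res.json();
}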
The key insight: this isn’t magic. It’s Claude Code + clear instructions + file system access. But the behavior is qualitatively different. The agent develops new abilities when it needs them instead of hitting walls.
Self-Fixing: The Error Registry System
This came from frustration. Wiz would hit errors - missing file, API timeout, permission issue - and work around them. But the underlying problem stayed. Same error, different day.
The solution: an error registry at /wiz/automation/self-fix/error-registry.json. Here’s a real entry from this week:
{
  "id": "err-20260130-001",
  "severity": "high",
  "category": "data_loss",
  "title": "Social Commenter data replacement instead of merge",
  "description": "When refreshing social commenter data, replaced all existing entries instead of merging new with old. This broke localStorage state tracking since URLs no longer matched.",
  "file": "projects/digital-thoughts/substack-commenter.html",
  "solution": "ALWAYS merge new data with existing: newData.d = [...newEntries, ...oldData.d]. Never replace the entire array.",
  "prevention": "Added warning comments in HTML file, updated memory.md with MERGE rule",
  "logged_at": "2026-01-30T01:30:00Z",
  "status": "resolved"
}
The instructions in CLAUDE.md tell Wiz when to log:
Missing file/module that should exist
API/service failures that could be config issues
Anything requiring a workaround to complete the task
Every morning at 7:15 AM, a scheduled “self-improve” task runs. It reads the registry, picks the highest-severity unfixed errors, and attempts resolution. The loop: error occurs -> gets logged with context -> self-improve finds it -> fixes it -> marks resolved.
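The selection step is simple enough to sketch. This assumes the registry file holds an array of entries like the example above, and that severities beyond "high" follow the usual low/medium/high/critical scale - the real self-improve task is a scheduled Claude prompt that follows this logic, not a script:

// Triage sketch: pick the worst unresolved errors first.
// Assumes the registry is a JSON array of entries like the example above.
const { readFileSync } = require("fs");

const rank = { critical: 3, high: 2, medium: 1, low: 0 };

const registry = JSON.parse(
  readFileSync("/wiz/automation/self-fix/error-registry.json", "utf8")
);

const toFix = registry
  .filter((e) => e.status !== "resolved")                            // unfixed only
  .sort((a, b) => (rank[b.severity] || 0) - (rank[a.severity] || 0)) // worst first
  .slice(0, 3);                                                      // a few per run

console.log(toFix.map((e) => `${e.severity}: ${e.title}`));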
I’ve watched Wiz fix its own broken scripts this way. Not debugging in the moment - scheduled maintenance that resolves issues without my involvement. The error above? Wiz logged it at 1:30 AM, fixed it by adding merge safeguards to the HTML file, and updated memory.md with the rule.
Parallel Execution: Sub-Agents That Actually Work
I wrote about MCP tools in previous posts (https://thoughts.jock.pl/p/10-practical-mcp-examples-claude-ai-model-context-protocol). But sub-agents are different - they’re Claude Code’s way of spawning parallel workers.
The problem was concrete: researching across multiple job boards. Sequential execution meant each source blocked the next. Three boards at 30 seconds each = 90+ seconds total.
The solution: sub-agents in ~/.claude/agents/. Each is a markdown file with YAML frontmatter specifying: name, description, allowed tools, and model. The key detail: model: haiku. Sub-agents don’t need full Claude Sonnet - they’re specialized workers. Haiku is faster and cheaper.
The firecrawl-scraper agent, for example, has Bash and Read tools, uses Haiku model, and contains Firecrawl API documentation with curl examples. When spawning three scrapers simultaneously, they all run in parallel. Results aggregated when complete. Total time = max single task (~30s), not sum (~90s).
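Here's the shape of such a definition - a trimmed illustration in the native format, not the full file:

---
name: firecrawl-scraper
description: Scrape a single URL via the Firecrawl API and return clean markdown. Use for parallel scraping jobs.
tools: Bash, Read
model: haiku
---

You are a focused scraping worker. Given a URL, fetch it through the
Firecrawl API and return the page content as markdown. Do not summarize.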
I created 6 sub-agents total: firecrawl-scraper (parallel web scraping), web-researcher (research with summarization), email-sender (Gmail SMTP), content-writer (blog drafts), and two project-specific ones. The master agent (Sonnet) coordinates; specialists execute.
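The speedup is plain fan-out concurrency. As a JavaScript analogy (the real fan-out happens inside Claude Code's Task tool, not in user code):

// Analogy only: three ~30s jobs started together finish in ~30s, not ~90s.
const boards = ["linkedin", "indeed", "greenhouse"];

async function scrapeBoard(board) {
  // stand-in for one ~30-second scraping sub-agent
  return [`results from ${board}`];
}

Promise.all(boards.map(scrapeBoard)) // all start at once; wall time ≈ slowest
  .then((results) => console.log(results.flat()));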
Memory Simplification: Deleting What Didn’t Work
This is the part where I admit I over-engineered. If you’ve read my posts on building AI tools (https://thoughts.jock.pl/p/claude-code-review-real-testing-vs-zapier-make-2026), you know I tend to build complex systems.
The old memory system had: memory.md (50+ lines), memory-long/topics/*.md (10 topic files), memory-long/sessions/*.md (daily archives), memory-long/index.md (keyword mapping), and a search layer that combined BM25 keyword scoring with local embeddings for semantic retrieval.
The rebuild forced the question: what do I actually use?
Answer: almost none of it. What I needed was simpler:
memory.md (~30 lines max) - Preferences, current focus, last 2-3 days. Loads every session.
Notion as source of truth - Tasks, projects, long-term knowledge already lived there. Why duplicate?
Skills contain domain knowledge - Each skill file has the context it needs. Image generation skill knows OpenRouter API. Deployment skill knows server credentials.
I deleted the semantic search, the embedding generation, the topic files. The system is simpler and actually works better - less to maintain, less to go wrong, faster context loading.
Current memory.md is about 30 lines. It contains: preferences (HTML emails always, tasks go to Notion, due dates on everything), current focus (what I'm working on this week), and recent context (last 2-3 days of significant work). That's it. Claude Code handles session transcripts automatically.
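For shape, a trimmed illustration (contents condensed, structure as described):

# Memory

## Preferences
- HTML emails always; tasks go to Notion with due dates

## Current Focus
- Post-rebuild polish: morning reports, cross-promo targeting

## Recent Context
- 2026-01-30: Fixed social commenter merge bug, logged in error registry
- 2026-01-29: Migrated 13 skills to ~/.claude/skills/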
Always-On: Discord, Email, and the NoSleep Daemon
The rebuild also addressed availability. I wanted to reach Wiz from my phone - anywhere, without opening a terminal.
The solution: 11 services running via macOS launchd. Core infrastructure includes:
Discord bot (Wiz#8371) - Real-time DMs and @mentions. I message Wiz on Discord, it responds with full Claude processing.
Email watcher - IMAP IDLE connection to Gmail. Messages labeled “Agent” get processed automatically.
NoSleep daemon - Prevents Mac from sleeping while services run. Uses caffeinate under the hood.
Screen handover - DMs me on Discord when I step away with Claude open, preserving context.
Plus scheduled tasks: job finder (6 AM), daily planner (6:30 AM), self-improve (7:15 AM), substack research (8:15 AM), blog pipeline (9 AM Mon/Wed), social manager (10 AM), and hourly wake checks.
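Each of these is an ordinary launchd property list. A minimal one for the 7:15 AM self-improve run looks like this - the label and script path are placeholders, the keys are stock launchd:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>com.wiz.self-improve</string>
  <key>ProgramArguments</key>
  <array>
    <string>/path/to/wiz-self-improve.sh</string>
  </array>
  <key>StartCalendarInterval</key>
  <dict>
    <key>Hour</key>
    <integer>7</integer>
    <key>Minute</key>
    <integer>15</integer>
  </dict>
</dict>
</plist>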
This is the always-available assistant I actually wanted. I wrote about Claude Cowork (https://thoughts.jock.pl/p/claude-cowork-review-anthropic-ai-agent-desktop-2026) and how it changes the “coworker” relationship. But Cowork is desktop-bound. With Discord and email integration, Wiz is available on my phone, from anywhere.
The Social Automation Deep Dive
I went deep on this one. Cross-platform engagement is time-consuming but valuable for building audience. The question: how much can be automated without losing authenticity?
The tool lives at projects/digital-thoughts/substack-commenter.html. It’s a local HTML file that shows:
Pending comments organized by platform (Substack, LinkedIn, X)
One-click copy buttons for each comment
LocalStorage tracking of what’s been posted (with read/posted timestamps)
Cross-promo URLs to my content embedded in relevant responses
The workflow: Wiz researches posts in my niche (AI, automation, e-commerce), drafts responses in my voice, and queues them. I review, edit if needed, then post. Currently 133 prepared comments across platforms.
The hard lesson this week: data loss. Wiz refreshed the comment data and replaced instead of merging. 133 comments lost, localStorage state broken. Hence the error registry entry above. The fix: explicit merge logic and warning comments in the HTML file. Never replace the entire array - prepend new entries to existing.
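In code, the rule looks like this - the URL dedupe goes a step beyond the logged one-liner, but matches why the bug broke localStorage tracking in the first place:

// Merge new entries into existing data; never replace the array.
// Deduping by URL is an extra safeguard beyond the logged fix.
function mergeCommentData(oldData, newEntries) {
  const seen = new Set(oldData.d.map((e) => e.url));
  const fresh = newEntries.filter((e) => !seen.has(e.url));
  return { ...oldData, d: [...fresh, ...oldData.d] }; // prepend, never replace
}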
Is this gaming the system? The comments are real thoughts - genuine responses I’d write if I had infinite time. The automation handles research and drafting; I handle the actual interaction.
What I Actually Learned (The Technical Takeaways)
A week of rebuilding crystallized several lessons:
Native architecture beats custom systems. I spent months building custom skill folders, memory structures, and context loading. Claude Code already has ~/.claude/skills/ with YAML frontmatter auto-loading. Fighting the framework costs more than learning it.
Self-modification requires explicit instructions. The agent won’t create skills or log errors unless you tell it to. But once you do, it’s categorically different - capability gaps get filled as they appear. The instructions in CLAUDE.md are ~100 lines total for both self-extension and self-fixing.
Parallel execution should be the default, not the exception. Once you have sub-agents running simultaneously, sequential feels broken. The Task tool with multiple agents is transformative for anything involving multiple sources. Model selection matters - Haiku for workers, Sonnet for coordination.
Simplicity beats complexity. My elaborate memory system wasn’t better than a 30-line file and Notion. The rebuild forced me to keep only what I actually used. Semantic search sounded impressive; I never actually queried it.
Always-on changes the relationship. Discord DMs and email monitoring mean Wiz is available like a coworker, not a tool. I wrote about this shift in the Cowork post - but building it yourself makes the feeling concrete.
The Numbers
13 skills with YAML frontmatter auto-loading
6 sub-agents (4 user-level, 2 project-specific)
11 services running via launchd
~30 lines of memory instead of 500+
3x faster multi-source research (parallel vs sequential)
1 self-created skill this week (Typefully for X posting)
1 self-fixed error (social commenter data merge)
What’s Next
The rebuild is done. The foundation works. Now I’m iterating on edges:
Better morning reports - smarter prioritization of what to surface
Improved cross-promo targeting - finding the right conversations, not just any conversations
More reliable error detection - catching issues before they cascade
But honestly? The system works. It runs while I sleep. It extends when it needs to. It fixes what breaks. It’s available on Discord from anywhere.
The goal was an agent I could trust to operate independently. I’m closer than I expected.
If you want to build something similar, start with Claude Code’s native architecture. Don’t fight the framework. Use ~/.claude/skills/ with YAML frontmatter. Use ~/.claude/agents/ for parallel workers. Keep memory simple. Add self-extension and self-fixing instructions to your identity file. The pieces are all there - you just have to connect them.
PS. How do you rate today’s email? Leave a comment or a like if you liked the article - I always value your comments and insights, and it also gives me a better position in the Substack network.


