I also tend to want to build a truly self-built agentic setup instead of using Claude Cowork and would want to buy your starter kit. But I have the following scenario and I suspect that many people have a similar scenario. That is why it might even be worth its own article.
Namely, I have a Windows PC laptop, an Android phone. I have a large screen, a mouse, but no separate keyboard. I would now order a Mac Mini and buy your starter kit and build an agentic system.
And what I actually want to achieve is the following. I want to go for a walk in the fresh air and speak into my phone to capture my thoughts while I am walking. In my thoughts I move from one topic to another, yet it is somehow connected, but what I would need help with is sorting and pre-structuring it. Some of my thoughts are semi-philosophical thoughts or schools of thought that I need to explain so that people understand what a successful transformation in a company is. Because it has a lot to do with mindset and assumptions about people and with questioning things as they are. I need such points in articles. Then I have lots of points that are actually more like instructions. That would then need to be cleanly sorted by the AI agents. What is a thoughtful article? What is a practical guide? And then I would like the AI agents to build digital products from the guides.
And I already have a very precisely defined design set that I developed while vibe-coding my website. There are certain colors, lines, fonts, etc. that I want exactly that way and no other way. And that has to be stored centrally somewhere. I have now managed with Claude to get it to remember the design to some extent. Still, I have to repeat and correct it again and again. That is very inefficient. So when it comes to design, I need absolute reliability. And I also have very precise requirements for how the digital products should look.
When it comes to social media activity, I actually still like writing my own things, but at least there could be some pre-sorting of which topics make sense in which order and how. And I also need an agent that researches for me which people on LinkedIn and Substack deal with topics similar to mine, selects them for me so that I can get into communication with them, because in my field it is very important to exchange ideas, formulate ideas together, also do calls together sometimes, brainstorm, or do a podcast so that you learn together. For that I need agents that go searching, because spending hours scrolling through feeds costs me an infinite amount of time.
My question to you now is, if I have a Windows PC, do I also need an extra keyboard for the Mac Mini? How does my PC talk to the Mac Mini or does it not talk to my PC at all? That is not entirely clear to me yet, to what extent they need to talk to each other, to what extent it is not necessary at all, because the connections the agents need are actually more or less SaaS or social media and not my laptop.
That means, if I have a keyboard and a screen, I can control the Mac Mini separately and use my PC separately, but I definitely need some clever, simple solution for how to get what I speak into my phone over to the agentic system on the Mac Mini. I do not travel much, I am more at home, so I do not have the same requirements as you. What would you recommend to me? And my next question would also be, apart from the starter kit, do I need anything else?
Great questions, and your scenario is actually very close to what I built. Let me go through it practically.
Hardware first: yes, you need a keyboard for the Mac Mini, at least for initial setup. After that you can go fully remote. I'd recommend a cheap $20 USB keyboard just to have around. You don't need to connect your PC to the Mac Mini at all. Install Tailscale on both (free), and you can access the Mac Mini from your PC through VNC (screen sharing) or SSH from anywhere. The Mac Mini runs independently. As you correctly identified, the agents connect to SaaS and social media, not to your laptop.
Voice capture from Android to Mac Mini: this is your most important workflow to get right. Since you're on Android (not iPhone), you won't have iCloud sync. Here's what I'd recommend: use an app like Google Recorder (has built-in transcription) or Otter.ai. Save transcripts to Google Drive. On the Mac Mini, sync that Google Drive folder. Then set up an agent that watches that folder, picks up new transcripts, and does exactly what you described: sorts thoughts into articles vs. guides vs. product ideas. This is a straightforward automation to build.
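A minimal sketch of the "watch the folder" step, assuming the transcripts arrive as plain .txt files in a locally synced folder. The file extension, the function name find_new_transcripts, and the in-memory "seen" set are illustrative assumptions, not part of any kit; a real setup would persist the seen list to disk and run the check on a schedule.

```python
import tempfile
from pathlib import Path


def find_new_transcripts(folder: Path, seen: set[str]) -> list[Path]:
    """Return .txt files in `folder` whose names are not in `seen`, oldest first.

    Names of the returned files are added to `seen`, so a second call
    only reports files that appeared in the meantime.
    """
    candidates = [p for p in folder.glob("*.txt") if p.name not in seen]
    new_files = sorted(candidates, key=lambda p: (p.stat().st_mtime, p.name))
    seen.update(p.name for p in new_files)
    return new_files


# Demo against a throwaway directory standing in for the synced Drive folder.
with tempfile.TemporaryDirectory() as tmp:
    inbox = Path(tmp)
    (inbox / "walk-1.txt").write_text("First thought", encoding="utf-8")
    seen_names: set[str] = set()
    first_pass = [p.name for p in find_new_transcripts(inbox, seen_names)]

    (inbox / "walk-2.txt").write_text("Second thought", encoding="utf-8")
    second_pass = [p.name for p in find_new_transcripts(inbox, seen_names)]

print(first_pass, second_pass)
```

Polling like this is cruder than a file-system event watcher, but it has no dependencies and is easy to debug.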
Design system: this is solved by CLAUDE.md files. You define your colors, fonts, spacing, and design rules in a project instruction file that Claude loads automatically every session. No more repeating yourself. It remembers perfectly every time. Store your design tokens there and reference them in every build task.
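If you want a safety net on top of the instruction file, one option is to check generated output against the same tokens. The sketch below is an assumption about how such a check could look, not something the starter kit is known to ship: the palette values are made up, and it only looks at six-digit and three-digit hex colors.

```python
import re

# Hypothetical palette; replace with the real design tokens.
ALLOWED_COLORS = {"#1a1a2e", "#e94560", "#ffffff"}

# Matches #rrggbb or #rgb. Short forms are NOT expanded to long form here.
HEX_COLOR = re.compile(r"#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b")


def off_palette_colors(css_text: str) -> list[str]:
    """Return the hex colors in `css_text` that are not in ALLOWED_COLORS."""
    found = {match.lower() for match in HEX_COLOR.findall(css_text)}
    return sorted(found - ALLOWED_COLORS)


sample_css = "h1 { color: #E94560; } p { color: #333333; background: #FFFFFF; }"
violations = off_palette_colors(sample_css)
print(violations)
```

Running a check like this after each build task turns "please remember my colors" into something you can verify instead of eyeballing.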
Content sorting and digital products from guides: this is literally what my system does. The agent reads raw input, classifies it (philosophical article vs. practical guide vs. product material), and routes it accordingly. Very doable with Claude Code.
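To make the classify-and-route idea concrete, here is a toy sketch. In a real setup the classify step would be a model call; the keyword counting below is only a stand-in so the example runs on its own, and the cue words and category names are hypothetical.

```python
from collections import defaultdict

# Hypothetical cue words; a model call would replace this lookup in practice.
CUES = {
    "guide": ("step", "first", "then", "how to", "checklist"),
    "article": ("mindset", "assumption", "why", "believe", "culture"),
}


def classify(snippet: str) -> str:
    """Pick the category whose cues occur most often (substring count)."""
    text = snippet.lower()
    scores = {
        label: sum(text.count(cue) for cue in cues)
        for label, cues in CUES.items()
    }
    best_label, best_score = max(scores.items(), key=lambda item: item[1])
    return best_label if best_score > 0 else "unsorted"


def route(snippets: list[str]) -> dict[str, list[str]]:
    """Group snippets into buckets keyed by their category."""
    buckets: dict[str, list[str]] = defaultdict(list)
    for snippet in snippets:
        buckets[classify(snippet)].append(snippet)
    return dict(buckets)


thoughts = [
    "Transformation fails when the mindset and every assumption about people stay unquestioned.",
    "First map the process, then write a checklist for each step.",
    "Buy oat milk.",
]
routed = route(thoughts)
print({label: len(items) for label, items in routed.items()})
```

The "unsorted" bucket matters: anything the classifier is unsure about should land somewhere you can review it rather than being forced into an article or a guide.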
Research agents for LinkedIn/Substack: also built this. I have agents that scan for relevant people and topics, score them for relevance, and present a shortlist. Saves hours of scrolling. This is one of the more advanced automations but absolutely possible.
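The score-and-shortlist part can be sketched as follows. This assumes the profiles and their topics have already been collected somehow (that step is not shown), and it uses simple set overlap as the relevance score. The Profile class, the topic labels, and the names are all made up for illustration; the actual scoring in my agents is not described here.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    name: str
    topics: set[str]


def relevance(profile: Profile, my_topics: set[str]) -> float:
    """Jaccard overlap: shared topics divided by all topics of both sides."""
    union = profile.topics | my_topics
    return len(profile.topics & my_topics) / len(union) if union else 0.0


def shortlist(
    profiles: list[Profile], my_topics: set[str], top_n: int = 2
) -> list[tuple[str, float]]:
    """Return the `top_n` most relevant profiles as (name, score) pairs."""
    ranked = sorted(profiles, key=lambda p: relevance(p, my_topics), reverse=True)
    return [(p.name, round(relevance(p, my_topics), 2)) for p in ranked[:top_n]]


my_topics = {"transformation", "mindset", "leadership"}
profiles = [
    Profile("Person B", {"leadership", "sales"}),
    Profile("Person C", {"crypto"}),
    Profile("Person A", {"transformation", "mindset"}),
]
picks = shortlist(profiles, my_topics)
print(picks)
```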
What you need beyond the starter kit: a Claude Pro or Max subscription ($20-100/mo), the Mac Mini itself, and patience for the first 2-3 weeks of setup. The starter kit gives you the blueprints and patterns, but building your specific workflows takes iteration. Start with voice capture and content sorting first, then add social media and research agents once the foundation works.
One honest note: you described a very ambitious system. Don't try to build everything at once. Start with: Mac Mini + Claude Code + voice-to-text pipeline. Get that working reliably first. Then layer on design enforcement, then social research. Each layer builds on the previous one.
Thank you very much for these helpful and detailed instructions! Can't wait to start :-) And yes, you are right, I will do it step by step. It is not only for my own productivity, but also to gain experience and to learn what my clients will have to go through, just in a much bigger scenario.
Nice, timely piece. I appreciate the direct takeaways.
Totally agree, getting close. I don't think it will be much longer until all the major companies are in fully agentic deliveries.
I think they really want to create an “OpenClaw” experience, but inside the Claude app (so it’s for non-tech people as well, which is why they are so careful and add a lot of limitations). But for more technical people it is still far away from a real AI agent.
We will see, because Anthropic is shipping crazy fast.
like 4 products in 23 days crazy fast!
This is a really useful comparison. I run a similar setup: Mac Mini, OpenClaw, agents running 24/7 and your point about Cowork not being your agent yet is exactly where I am too. The memory gap is the thing. My agent knows what we were working on last week and picks up where we left off. Every time I try one of the polished products I end up back at my custom setup for the same reason you described: you can feel the missing context.
The convergent evolution framing is exactly right. 3 companies, same answer, 2 weeks. I've been writing about this for a non-technical audience (I run The Bot Biologist) and the hardest part is explaining why 'agent on your computer' is a fundamentally different thing than 'chatbot in a browser tab.' You laid it out really clearly here.
Curious how your nightshift/dayshift handover works in practice. That's the part of my own setup that still breaks the most.
So many things to say here. First of all, yes, the memory gap is a thing, and of course Anthropic is working on memory, probably building some systems for it, but for now I'm not using the apps from Anthropic because memory is something that I already have.
This is kind of important, because when your AI agent knows you and knows your context, it changes a lot about the output it produces later. This is a huge gap here. As for the night shifts and day shifts, yeah, they are working.
I think in OpenClaw it is beats: your agent wakes up now and then, does some stuff, checks some stuff, and so on. My agent does a very similar thing, but there is a distinction between day and night.
What I created is a system. On the day shift, it does the things that I tell it, so it is very deterministic. I have a whole dashboard of tasks, and I have something called a queue. When I add a task to the queue, my agent executes it on its next wake and tells me what’s done. For example, when I have a burst of ideas, I can add them all to the queue, and I know they will be done.
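A stripped-down sketch of that queue pattern, just to show the shape of it. The TaskQueue class and the execute callback are hypothetical names for illustration; my actual queue lives in a dashboard, and the "execute" step is the agent doing the work.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TaskQueue:
    pending: deque = field(default_factory=deque)
    done: list = field(default_factory=list)

    def add(self, task: str) -> None:
        """Queue a task; nothing runs until the next wake."""
        self.pending.append(task)

    def wake(self, execute: Callable[[str], str]) -> list[str]:
        """Run every pending task in the order it was added and report back."""
        report = []
        while self.pending:
            task = self.pending.popleft()
            result = execute(task)
            self.done.append(task)
            report.append(f"{task}: {result}")
        return report


queue = TaskQueue()
queue.add("draft outline")
queue.add("fix footer")
report = queue.wake(lambda task: "done")  # stand-in for the agent doing the work
print(report)
```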
The night shift is a bit different. It starts at 10 p.m. every day, and there is a plan that is created automatically. This plan could be one of two scenarios:
The first scenario is the deterministic one: it picks up whatever was left over from the day shift, maybe something that is still in the queue, or something I explicitly told my agent, like "Hey, I want you to do that on the night shift."
The second scenario is non-deterministic: the agent creates its own plan based on our conversations and our work that day and, of course, on all open tasks that we have on the WIZ board.
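Roughly, the choice between the two scenarios could be sketched like this. It is a simplification under assumptions: build_night_plan, its parameters, and the word-overlap ranking are made up for illustration. In the real second scenario the agent writes the plan itself, so the ranking below is only a runnable stand-in.

```python
def build_night_plan(
    queue: list[str],
    carried_over: list[str],
    open_board_tasks: list[str],
    todays_topics: set[str],
    limit: int = 3,
) -> dict:
    """Choose the night plan: explicit work first, otherwise a self-made plan."""
    explicit = carried_over + queue
    if explicit:
        # Scenario 1: leftovers from the day shift plus anything queued.
        return {"mode": "deterministic", "tasks": explicit}

    # Scenario 2 (stand-in): rank open board tasks by how many of today's
    # topic words they mention, then take the top `limit`.
    def overlap(task: str) -> int:
        return len(set(task.lower().split()) & todays_topics)

    ranked = sorted(open_board_tasks, key=overlap, reverse=True)
    return {"mode": "non-deterministic", "tasks": ranked[:limit]}


plan_a = build_night_plan(["publish post"], ["finish report"], ["clean inbox"], set())
plan_b = build_night_plan(
    [],
    [],
    ["update pricing page", "write onboarding guide", "clean inbox"],
    {"onboarding", "guide"},
    limit=2,
)
print(plan_a["mode"], plan_b["mode"])
```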
Now I really want to test the Peekaboo + Playwright setup and see how well it can handle everyday computer tasks. If something like this worked reliably, it could save a lot of time on repetitive work I’m doing.
So far I’ve also been a bit disappointed with the computer & browser use features from the big tools.
Me too! And I think this is all about expectations here. Peekaboo is also basic, but gets things done. The expectation on my part was → it is the leading AI lab, they have to have something better, faster and more stable, riiiight? :D
Well - no, or at least not yet D:
Plus too many limitations for now.
I agree with you, especially on these two points:
- If you’re thinking about starting with AI agents, the Claude app is honestly a great place to begin.
- Anthropic is getting close. Not there yet, but close. And the direction they’re going is exactly right.
Yeah! I think one month and Claude will be Clawd :D
Great assessment. But I say no to anything that insists on working locally. Everything I do and own is in the cloud. That's where the work needs to happen and where my agents should have their home. Not in my house, on machines that break down, need maintenance, blow up or get stolen. I happily pay other people for the infrastructure headaches.
Well, I think it depends. I get your point, but I think local could be cheaper and more private. And I can see great projects that bring 400B models to “normal” hardware. We will see :D
I don’t believe it’s cheaper to run things locally. It is only cheaper when your own time is worth nothing. My time is worth a lot to me. That’s why I delegate and pay for what I don’t want to do myself. That includes running and maintaining a tech stack with 99.9% uptime.
Fair. My time has value too, so I hear you.
But for me the maintenance cost is mostly front-loaded. Set it up on Apple Silicon, it runs. No monthly bill compounding over time. And some data simply can't go to the cloud regardless of what it costs.
I also think we're talking about different types of people. For someone who delegates everything and wants zero friction, cloud wins. For someone building anyway who cares about data ownership, local is worth it.
Both can be right at the same time.
Agree on that. I sacrifice privacy for zero friction because my time is worth more to me than my data.