19 Comments
EOONLabs's avatar

Stoked to know there are actually other folks thinking about this and how much we are missing in terms of what’s possible. Multimodal, multi-model, multi-agent. Solid. So why do we need all these data centers and frontier models again?

Pawel Jozefiak's avatar

I mean - local LLMs are galaxies away from frontier models. We will need to wait some time before we get something really good on our computers!

Fabrice Talbot's avatar

Very interesting! There's probably a market for startups that can build this kind of offering for businesses and help them save $$$. It addresses a key shortcoming of Claude and big LLMs: they're optimized for heavy-duty stuff but charge the same price for easy tasks. The delegation model is elegant. Ideally, the app on your PC would decide which model to use (local vs cloud) and do all the fancy optimization for you (similar to your approach).

EOONLabs's avatar

Nice to meet you! 😂 Substack is the first place many of us even see a reason to share what we are building. Private, secure, sovereign, personal AI is the right way to approach this.

12345's avatar

Hi guys, I am new to this and I have a Mac Studio. I want to run a local LLM to quickly sort my official Outlook mail. I want another LLM to act as my notebook LLM. Any suggestions on which resources I could use?

Pawel Jozefiak's avatar

This is something you can potentially do with a bigger local LLM, and I would suggest using the right harness for it. For example, you can reach official Outlook emails via the API, or via AppleScript if you use Mail instead of Outlook. Models can work with both, but for a local LLM I would go with the API because it gives the model more structured data rather than something it has to figure out. Maybe use a harness like Pi.dev.

12345's avatar

Hi Pawel,

I am a total newcomer to this field. What do you suggest as the simplest project I could start with as a use case for Gemma 4? I want to get my hands dirty. I have downloaded OpenClaw, Llama, and Gemma 4 as well.

You seem to be someone who is knowledgeable. Can you help me level up?

Pawel Jozefiak's avatar

Here's the simplest project that actually teaches you something useful: build a script that summarizes a local file.

You already have Ollama running. Open Python, read a .txt file (your notes, an email you exported), and POST it to localhost:11434/api/generate with a prompt like "Summarize this in 3 bullet points." Print the output. That's it - maybe 15 lines of code.
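A minimal sketch of that script, using only the standard library. The model tag `gemma4` and the helper names (`build_payload`, `summarize`, `summarize_file`) are assumptions for illustration; swap in whatever tag `ollama list` shows on your machine. It sends one non-streaming request and reads the `response` field from the JSON reply.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # local Ollama endpoint from the text
MODEL = "gemma4"  # assumption: replace with the tag shown by `ollama list`


def build_payload(text, model=MODEL):
    """Build the JSON body for a single, non-streaming generate call."""
    prompt = "Summarize this in 3 bullet points.\n\n" + text
    return {"model": model, "prompt": prompt, "stream": False}


def summarize(text, model=MODEL, urlopen=urllib.request.urlopen):
    """POST the text to Ollama and return the generated summary."""
    body = json.dumps(build_payload(text, model)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urlopen(request) as response:
        # With "stream": False the reply is one JSON object; the text is in "response".
        return json.loads(response.read())["response"]


def summarize_file(path, model=MODEL):
    """Read a local text file and summarize its contents."""
    with open(path, encoding="utf-8") as f:
        return summarize(f.read(), model)
```

To try it, call `print(summarize_file("notes.txt"))` with Ollama running, where `notes.txt` stands in for any text file you have. The `urlopen` parameter is only there so the network call can be swapped out when testing.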

Why this is the right starting point: it maps directly to both things you mentioned (email sorting + notebook LLM), it teaches you the API layer that all the fancier tools are built on, and when it breaks you'll know exactly where.

Once that's working, you can layer in real email access (Outlook REST API or AppleScript depending on your setup) or multi-doc retrieval with something like ChromaDB. But don't start there.

Gemma 4 is solid for classification and summarization. Keep your context small - one document at a time until you understand the limits. Then you'll know why tools like LangChain or Pi.dev exist.
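For the classification side, a rough sketch of the "one document at a time, small context" idea. The label set, the character budget, and the function names are all hypothetical; the budget is a crude stand-in for a real token limit, not something the model enforces. The prompt string is what you would send as the `prompt` to the local model.

```python
LABELS = ("urgent", "reply-later", "newsletter", "other")  # hypothetical label set
MAX_CHARS = 4000  # assumption: rough character budget, not a real token count


def build_classify_prompt(document, labels=LABELS, max_chars=MAX_CHARS):
    """Build a prompt for exactly one document, clipped to keep the context small."""
    clipped = document[:max_chars]
    return (
        "Classify the message into exactly one of: " + ", ".join(labels) + ".\n"
        "Answer with the label only.\n\n" + clipped
    )


def parse_label(reply, labels=LABELS, fallback="other"):
    """Map the model's free-text reply onto a known label, or fall back."""
    cleaned = reply.strip().lower()
    for label in labels:
        if label in cleaned:
            return label
    return fallback
```

Small local models do not always answer with the bare label, which is why `parse_label` searches the reply and falls back to a default rather than trusting the output as-is.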

12345's avatar

Oh thanks Pawel,

Your guidance is much appreciated. I will work on it and get back to you.

Gavin's avatar

Do you think investing in a Mac mini is (still) worth it today? I revived my 10-year-old Linux box a month ago to run OpenClaw and firewalled it within my home Wi-Fi, but it's not doing much aside from email aggregation for me.

Then I went on the road for 3 weeks and have more traveling to do, so my laptop has become my primary device and it's infeasible to keep it always on. Now I am debating whether I should get a Mac mini or a similarly compact machine to travel with or station at home as a server 🤔

Pawel Jozefiak's avatar

I think it depends on what you really want to do. I didn't have a Mac Mini; I only had my personal MacBook Pro, which also meant that I didn't have any other machine at hand. The reason I chose a Mac Mini is that the processor architecture is very good with local LLMs, and I also like Apple in terms of infrastructure and everything else I'm using.

I would say that if you already have something, it may be fine. If you only want to run OpenClaw or any custom AI agent, it's probably fine. If you want to add something like local LLMs, you would probably need something more powerful. I'm not saying super powerful, like a Mac Studio or even a maxed-out Mac Mini. I think that even the basics could give you some kind of independence, but yeah, it is up to what you really want.

I also posted a new piece on local LLMs that use an SSD as an extra buffer for bigger models, so that might also be interesting for you.

EOONLabs's avatar

Huge fan of including the prismML 8B 1-bit model in the cluster.

Gavin's avatar

Yeah, eventually I will want to run a local LLM to take care of lower-priority tasks instead of using Claude for everything, but I also don't want to break the bank on that, given I don't have many workflows that need to be automated yet.

I will check out your post on local LLMs w/ SSD.

Louis Mai's avatar

I like this post a lot. The message triage concept is not new, but you really make it stick. And it goes further with the local-cloud combination.

Just curious, will mmap affect the life cycle of the SSD?

Pawel Jozefiak's avatar

Thanks!

As for the SSD life cycle - I don’t know :D

I would say - it might, but I don’t know if that kind of usage is even meaningful here. No data on that one!

Tom Parish's avatar

Another super useful post. Thank you, Pawel.

Jonas Braadbaart's avatar

For running AI automations and classification I use make.com rather than a local setup since it's - in theory at least - always-on. The tasks and pipelines I run there are lightweight enough that in most cases no extensive context is needed to complete them: https://metacircuits.substack.com/p/how-i-built-my-second-brain-in-3

I'm paying roughly USD 10 per month in API credits for classification, analysis, and summarization, but I don't expect to be running this setup indefinitely, so I don't see the need to switch for the moment.

Thomas R's avatar

Yes then it isn't offline – so to speak – which is probably not what people are looking for.