cross-posted from: https://sh.itjust.works/post/61139432
I seriously can’t believe how much progress he’s made for the FOSS community. He might actually take a bite out of the big 3’s profits with this.
What the actual fuck is this timeline that we are living in?
I kinda loved his “you should self-host to decentralize from big tech” and “run Graphene and Linux to avoid data collection” content, but idk what the local AI stuff is any good for.
It’s good for the same things machine learning has always been good for: language synthesis and analysis. Take self-hosting something like Paperless for document management. It has actually had a very rudimentary learning engine for document classification for a long time, but feeding document content to a local AI model for organizational tagging is very useful.
It’s great for solo roleplaying.
I mean. Not great. But it’s something you can interact with in a way that’s not possible without other people. So that’s something.
If you use AI for a lot of small things, then you can offload the tasks to a locally run server.
Or if it’s a feature you plan on using for a long time, you don’t want to keep paying big tech for the privilege of using AI, and hell, you already have a nice graphics card, then it’s perfect.
Even Jensen calls it LLM
How many GPUs do you even need to have a usable, self-hosted AI? It looks like he has 6 on his rig, and each probably costs 2k or something. That’s not peanuts. I have a 12GB VRAM card, and it probably can’t generate anything in any meaningful amount of time. Which brings me to the question: who is this for?
Regardless, impressive what he vibe-coded there.
My MacBook Air with 24GB of unified RAM is enough to run something simple and useful.
That’s like what, 5 or 6k?
Like 1k
Reasonable price!
The price is comparable to a used RTX 3090 with 24GB VRAM, which is probably more attractive to someone who is also interested in Linux/Windows gaming (and already owns a PC, I mean). I would also guess that the RTX would be faster than the MacBook. IMO unified RAM is more interesting when you can get a lot of it.
The problem with that is you still have to buy the rest of the computer to put that 3090 in.
I think in one video it looked like 16 cards. I think he did multiple bifurcations of the PCIe lanes. I think he is/was using it for protein folding as well.
That’s definitely not my level of disposable wealth/income. I can barely afford one card.
I have an RX 5600 XT (6GB), 32GB RAM, and a Ryzen 3600. The system hasn’t been updated since I built it during COVID. QwenV3-vl35B is the heftiest thing I can run; it gets around 2 tokens/sec in LM Studio. It’s easier than most people seem to think.
How do you not run out of VRAM? Does it offload to system RAM?
Yes, it offloads into system RAM. Oh, and I forgot to mention that’s with the context set to around 25k. That can vary greatly per model, though; it’s taken some experimentation to figure out.
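A rough back-of-the-envelope sketch of why offloading is needed on a 6GB card. The numbers are assumptions, not from the thread: about 4.5 bits per weight for a typical 4-bit quant, a flat ~1.5GB of VRAM reserved for the KV cache and runtime overhead, and decimal gigabytes. The function names are made up for illustration. Real runners like LM Studio split the model layer by layer, and a 25k context can eat a lot more than the reserve assumed here, so treat this as a ballpark only.

```python
# Hypothetical helper functions, for a ballpark estimate only.
# Assumptions: bits_per_weight ~4.5 for a common 4-bit quant,
# a flat VRAM reserve for KV cache/overhead, 1 GB = 1e9 bytes.

def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB."""
    # params (in billions) * bits per weight / 8 bits per byte -> GB
    return params_billion * bits_per_weight / 8


def split_gb(params_billion: float, bits_per_weight: float,
             vram_gb: float, reserve_gb: float = 1.5) -> tuple[float, float]:
    """Return (GB of weights on the GPU, GB offloaded to system RAM)."""
    total = weight_gb(params_billion, bits_per_weight)
    usable = max(vram_gb - reserve_gb, 0.0)  # VRAM left for weights
    on_gpu = min(total, usable)              # fill the GPU first
    return on_gpu, total - on_gpu            # the rest spills to system RAM


if __name__ == "__main__":
    gpu, ram = split_gb(35, 4.5, vram_gb=6)
    print(f"weights: {weight_gb(35, 4.5):.1f} GB, "
          f"GPU: {gpu:.1f} GB, system RAM: {ram:.1f} GB")
    # prints "weights: 19.7 GB, GPU: 4.5 GB, system RAM: 15.2 GB"
```

Under these assumptions a 35B model is roughly 19.7GB of weights, so only about 4.5GB fits on a 6GB card and the other ~15.2GB lands in system RAM, which 32GB can hold. The offloaded part runs much slower than the part on the GPU, which is likely a big reason generation drops to a couple of tokens/sec.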
Thank you. That’s good to know.
I can tell you from personal experience that 8GB is not enough for a snappy experience, except maybe if you had it set up to churn through data overnight. My RTX 3060 Ti was not happy.
First one-click RCE is in: https://www.reddit.com/r/LocalLLaMA/comments/1ttls1y/just_found_a_1click_rce_in_pewdiepies_odysseus/ … smh …
So 1) that sucks, but 2) why the fuck would you ever run this exposed to the internet?

He’s done the main quest. Now he’s doing the side quests.
One more harness, bro.
Man, this is an Ouroboros feedback loop.
I love that guy. I remember hating him back in the day when he got popular by sitting and yelling while playing games. But damn, the guy matured and has put out epic content over the past 10 years or so.
I’m a tiny bit confused as to what this actually is. I don’t use the Codex/ClaudeCode/Cursor stuff, but it seems like this is just an interface for connecting those services, right? It doesn’t seem like that actually protects your data at all. Can anyone help explain it a bit?
Edit: I realized I kinda glossed over all the stuff that seemed to be included in this; I mostly meant the start, where he talked about this being privacy-centric. Is he just trying to make self-hosting less painful?
Yeah, essentially, with the other tools you had to use API keys, and none of them were FOSS; they were mostly paid-only tools.
This lets you both self-host the application interface itself (which can also be an IDE) and use a self-hosted LLM.