Local LLM agents

Kkk2237pl@lemmy.world · 4 days ago

Local LLM agents

PeeOnYou [he/him]@lemmygrad.ml · 2 days ago

our CEO has been buying new hires desktop gaming machines for this reason… currently they don’t have squat for graphics cards but once the rug is pulled from the cloud model pricing he said he’ll spend the $10k per machine to put a 96gb vram card in peoples’ machines to run shit locally

Eager Eagle@lemmy.world · 4 days ago

Qwen 3.6 and gemma4 models are the only ones usable for agentic prog sessions that I and my employer run locally. It’s less stable and slower than third-party services, even on much better hardware (as it’s with my employer). The best way is to go with a provider hosting deepseek flash/pro if your privacy policy allows though. It’s going to be hard to beat their price.

adhdsergio@lemmy.world · 3 days ago

How many concurrent users and what hardware if i may ask?

Eager Eagle@lemmy.world · 3 days ago

it’s an h100, I think, no idea about how many users

in my personal setup i use quantized versions on a 3080, which is not great, so I still lean a lot on APIs

onlinepersona@programming.dev · 4 days ago

I thought those didn’t support tool calling. Has that changed?

Eager Eagle@lemmy.world · 3 days ago

they do