Has anyone tried in organization to use self hosted llm models for agentic programming?
Im curious if it makes any sense. My organization spends fortune on tokens from us companies. I want to recommend something…
our CEO has been buying new hires desktop gaming machines for this reason… currently they don’t have squat for graphics cards but once the rug is pulled from the cloud model pricing he said he’ll spend the $10k per machine to put a 96gb vram card in peoples’ machines to run shit locally
Qwen 3.6 and gemma4 models are the only ones usable for agentic prog sessions that I and my employer run locally. It’s less stable and slower than third-party services, even on much better hardware (as it’s with my employer). The best way is to go with a provider hosting deepseek flash/pro if your privacy policy allows though. It’s going to be hard to beat their price.
How many concurrent users and what hardware if i may ask?
it’s an h100, I think, no idea about how many users
in my personal setup i use quantized versions on a 3080, which is not great, so I still lean a lot on APIs
I thought those didn’t support tool calling. Has that changed?
they do



