What local, small models are you all using?

devxyn@sh.itjust.works · 5 months ago

What local, small models are you all using?

FrankLaskey@lemmy.ml · 5 months ago

In my opinion, Qwen3-30B-A3B-2507 would be the best here. Thinking version likely best for most things as long as you don’t mind a slight penalty to speed for more accuracy. I use the quantized IQ4_XS models from Bartowski or Unsloth on HuggingFace.

I’ve seen the new OSS-20B models from OpenAI ranked well in benchmarks but I have not liked the output at all. Typically seems lazy and not very comprehensive. And makes obvious errors.

If you want even smaller and faster the Qwen3 Distill of DeepSeek R1 0528 8B is great for its size (esp if you’re trying to free up some VRAM to use larger context lengths)

devxyn@sh.itjust.works · 5 months ago

That’s what I’m using, and it’s pretty nice. Thanks for your input!

𞋴𝛂𝛋𝛆@lemmy.world · 5 months ago

Qwen 2.5 VL and Code. I have a VL doing image captions for LoRA training running now. A 14B is okay for basic code. A quantized 32B 6KL gguf of the same Qwen 2.5 code model runs on 16GB but at a third of the speed of the 14B in bits and bytes 4b. The latter is reasonably fast enough for a couple layers of agentic stuff in emacs with gptel and hits thinking or function calling out of a llama.cpp server better than 50% of the time.

I still haven’t tried the new 20B out of Open AI yet.

SmokeyDope@lemmy.world · 5 months ago

I’m a big fan of NousResearch their deephermes release was awesome and now I’m trying out Hermes 4. I have an 8gb 1070ti GPU was able to fully offload a medium quant of hermes 4 14b with an okay amount of context.

I’m a big fan of the hybrid reasoning models I like being able to turn thinking on or of depending on scenario.

I had a vision model document scanner + TTS going on with a finetune of qwen 2.5 vl and outetts.

If you care more about character emulation for writing and creativity then mistral 2407 and mistral NeMo are other models to check out.