My favorite local model right now is a bit of a surprise to me: I'm really enjoying the relatively tiny Qwen3-8B, running the 4-bit quantized version on my Mac using MLX. It's surprisingly capable given that it's a 4.3GB download and uses just 4-5GB of RAM while it's running.
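If you want to try this yourself, a minimal sketch using the `mlx-lm` package looks like this. The `mlx-community/Qwen3-8B-4bit` model name is my assumption for the 4-bit quantized build on Hugging Face; adjust it to whichever quantization you downloaded.

```shell
# Install the MLX LLM tooling (Apple Silicon Macs only)
pip install mlx-lm

# Download (on first run) and prompt the 4-bit quantized Qwen3-8B.
# The model identifier below is an assumed mlx-community upload name.
mlx_lm.generate \
  --model mlx-community/Qwen3-8B-4bit \
  --prompt "Write a limerick about local language models"
```

The first invocation fetches the ~4.3GB of weights; subsequent runs load from the local Hugging Face cache.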