Best models to run with Ollama
Open sourceRunning models locally with Ollama means no API keys, no rate limits, and no data leaving your machine. These are the best open-weight models to run locally in 2026, grouped by the hardware you actually have, from a 16GB laptop to a multi-GPU server.
We also have more in-depth rankings:Best open-source AI models
The ranking
Updated June 2026
#ModelContextInput
- 11gpt-oss-20bOpenAIThe best local default: OpenAI's small open model, comfortable on 16GB machines.Context131KInput$0.029/M
- 22Qwen3 Coder 30B A3B InstructQwenThe local coding pick: a fast MoE coder that fits prosumer hardware.Context160KInput$0.07/M
- 33Gemma 4 26B A4BGoogle DeepMindGoogle's efficient MoE: only 4B active parameters, so it runs fast locally.Context262KInput$0.06/M
- 44Qwen3 30B A3B Instruct 2507QwenAll-round MoE with 3B active parameters, a favorite for local agents.Context131KInput$0.048/M
- 55Gemma 4 31BGoogle DeepMindGoogle's open vision model: reads images locally on a 24GB card.Context262KInput$0.12/M
- 66Nemotron 3 Nano 30B A3BNvidiaNvidia's nano MoE, tuned for efficient local inference.Context262KInput$0.05/M
- 77gpt-oss-120bOpenAIThe high-end local pick: flagship-class quality on a 64GB+ workstation.Context131KInput$0.039/M
- 88Qwen3 32BQwenA solid dense all-rounder for 24GB GPUs.Context131KInput$0.08/M
- 99Llama 3.3 70B InstructMetaMeta's proven 70B, still a dependable choice for 48GB setups.Context131KInput$0.1/M
- 1010Devstral 2 2512Mistral AIMistral's open coding model, built with self-hosting in mind.Context262KInput$0.4/M
- 1111LFM2.5-1.2B-InstructLiquid AILiquid AI's tiny model for edge devices and instant responses.Context33KInputFree
Details:
Models
11Filter
Open sourceUpdated
June 2026
Ad