Best models to run with Ollama

Open source

Running models locally with Ollama means no API keys, no rate limits, and no data leaving your machine. These are the best open-weight models to run locally in 2026, grouped by the hardware you actually have, from a 16GB laptop to a multi-GPU server.
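
To make the "no API keys" point concrete, here is a minimal sketch of calling a locally running Ollama server over HTTP, using only the Python standard library. It assumes Ollama is serving on its default address (`http://localhost:11434`) and that the model has already been pulled. The helper names `build_request` and `generate` are illustrative and not part of any library.

```python
import json
import urllib.request

# Assumption: Ollama is serving on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # note: no API key header
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to the local server and return the generated text."""
    request = build_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["response"]
```

With the server running, something like `generate("gpt-oss:20b", "Explain mixture-of-experts in one sentence.")` should return the model's reply as a string. The tag `gpt-oss:20b` is an assumption here; check `ollama list` for the exact tags on your machine.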

We also have more in-depth rankings: Best open-source AI models

The ranking

Updated June 2026

| # | Model | Developer | Context | Input | Notes |
|---|-------|-----------|---------|-------|-------|
| 1 | gpt-oss-20b | OpenAI | 131K | $0.029/M | The best local default: OpenAI's small open model, comfortable on 16GB machines. |
| 2 | Qwen3 Coder 30B A3B Instruct | Qwen | 160K | $0.07/M | The local coding pick: a fast MoE coder that fits prosumer hardware. |
| 3 | Gemma 4 26B A4B | Google DeepMind | 262K | $0.06/M | Google's efficient MoE: only 4B active parameters, so it runs fast locally. |
| 4 | Qwen3 30B A3B Instruct 2507 | Qwen | 131K | $0.048/M | All-round MoE with 3B active parameters, a favorite for local agents. |
| 5 | Gemma 4 31B | Google DeepMind | 262K | $0.12/M | Google's open vision model: reads images locally on a 24GB card. |
| 6 | Nemotron 3 Nano 30B A3B | Nvidia | 262K | $0.05/M | Nvidia's nano MoE, tuned for efficient local inference. |
| 7 | gpt-oss-120b | OpenAI | 131K | $0.039/M | The high-end local pick: flagship-class quality on a 64GB+ workstation. |
| 8 | Qwen3 32B | Qwen | 131K | $0.08/M | A solid dense all-rounder for 24GB GPUs. |
| 9 | Llama 3.3 70B Instruct | Meta | 131K | $0.1/M | Meta's proven 70B, still a dependable choice for 48GB setups. |
| 10 | Devstral 2 2512 | Mistral AI | 262K | $0.4/M | Mistral's open coding model, built with self-hosting in mind. |
| 11 | LFM2.5-1.2B-Instruct | Liquid AI | 33K | Free | Liquid AI's tiny model for edge devices and instant responses. |
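
The hardware pairings in the ranking (16GB, 24GB, 48GB) can be sanity-checked with a rough rule of thumb: memory needed is roughly parameter count times bits per weight, plus some headroom for the KV cache and runtime. The sketch below assumes 4-bit quantization and a 20% overhead factor; both numbers are illustrative assumptions, not measurements, and the function names are hypothetical. For MoE models, use the total parameter count, since all experts must sit in memory even though only a few are active per token.

```python
def estimated_memory_gb(params_billions: float,
                        bits_per_weight: float = 4.0,
                        overhead: float = 1.2) -> float:
    """Rough memory estimate in GB: weight size times an assumed overhead factor."""
    # 1 billion parameters at 8 bits per weight is about 1 GB.
    weight_gb = params_billions * bits_per_weight / 8
    return weight_gb * overhead


def smallest_tier_gb(params_billions: float,
                     tiers=(16, 24, 48, 64),
                     **kwargs):
    """Return the smallest memory tier (in GB) that fits the estimate, or None."""
    need = estimated_memory_gb(params_billions, **kwargs)
    for tier in sorted(tiers):
        if need <= tier:
            return tier
    return None
```

Under these assumptions, `smallest_tier_gb(20)` gives 16, `smallest_tier_gb(32)` gives 24, and `smallest_tier_gb(70)` gives 48, which lines up with the pairings listed for gpt-oss-20b, Qwen3 32B, and Llama 3.3 70B. Longer contexts and higher-precision quantizations push the real requirement up, so treat the result as a lower-bound estimate.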

Details: