Compare · Models
Frontier AI models, side by side.
12 models from closed labs and open-weight releases. For each: pricing per 1M tokens (input / output), context window, native reasoning, tool use, multimodal capability, and license, plus a one-line "best for" and a one-line "watch out".
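Since every card below prices input and output tokens separately, per-request cost is a two-term sum. A minimal sketch (the helper name and the example token counts are illustrative, not from any vendor SDK):

```python
# Hypothetical helper: estimate per-request cost from per-1M-token rates.
# Token counts and rates below are illustrative examples.

def request_cost(input_tokens: int, output_tokens: int,
                 in_per_m: float, out_per_m: float) -> float:
    """Dollar cost of one request at the given per-1M-token rates."""
    return (input_tokens / 1_000_000) * in_per_m \
         + (output_tokens / 1_000_000) * out_per_m

# A 2,000-token prompt with a 500-token reply at $3.00 in / $15.00 out:
cost = request_cost(2_000, 500, 3.00, 15.00)
print(f"${cost:.4f}")  # → $0.0135
```

Note that output tokens usually cost 3–5x input tokens on the cards below, so verbose completions dominate the bill faster than long prompts do.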
Gemini 2.5 Flash
Google
1M ctx · $0.10 in / $0.30 out per 1M tokens
Best for: Cheapest frontier-adjacent option. Production cheap-tier slot for many workloads.
Watch out: Quality varies more than tier-down siblings at other labs. Run an eval before swapping it in.
DeepSeek V4 (open)
DeepSeek
128k ctx · $0.27 in / $1.10 out per 1M tokens
Best for: Cheap reasoning. Open weights with frontier-tier reasoning quality at small sizes.
Watch out: Tool-use ergonomics are improving but still trail Anthropic. Verify on your own eval.
Qwen 3 (open)
Alibaba
128k ctx · $0.40 in / $1.20 out per 1M tokens
Best for: Strong multilingual coverage (esp. CJK), good math, permissive license; easy to self-host.
Watch out: English-language quality slightly below Llama 4 on most tasks.
Llama 4 70B (open)
Meta
256k ctx · $0.50 in / $1.50 out per 1M tokens
Best for: Self-hosting at scale. Quality competitive with mid-tier API models, much cheaper at >30k req/day.
Watch out: License restricts use above 700M MAU. Ops cost is real if you self-host.
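A break-even threshold like ">30k req/day" falls out of simple arithmetic: fixed GPU cost per day divided by the API cost of one request. A sketch with deliberately hypothetical numbers (GPU rate, token counts, and API prices are placeholders; plug in your own):

```python
# Back-of-envelope break-even: self-hosted fixed cost vs. per-token API cost.
# Every number here is a hypothetical placeholder, not a quoted price.

def breakeven_requests_per_day(gpu_cost_per_day: float,
                               api_cost_per_request: float) -> float:
    """Requests/day above which a fixed-cost deployment beats API pricing."""
    return gpu_cost_per_day / api_cost_per_request

# Say a request averages 1,500 input + 500 output tokens at a mid-tier
# API rate of $0.80 in / $4.00 out per 1M tokens:
api_per_request = (1_500 / 1e6) * 0.80 + (500 / 1e6) * 4.00  # $0.0032
# ...and a self-hosted node costs $96/day (e.g. two GPUs at $2/hr):
print(round(breakeven_requests_per_day(96.0, api_per_request)))  # → 30000
```

The threshold moves linearly with both inputs, so a cheaper API tier or pricier GPUs pushes break-even well above 30k.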
Claude Haiku 4.5
Anthropic
200k ctx · $0.80 in / $4.00 out per 1M tokens
Best for: High-volume cheap-tier work: classification, simple extraction, fast autocomplete.
Watch out: Quality drops on multi-step reasoning. Tier up to Sonnet when the task is non-trivial.
Gemini 2.5 Pro
Google
2M ctx · $0.85 in / $3.50 out per 1M tokens
Best for: Massive context (2M tokens), native video and audio in. Good cost/quality.
Watch out: Tool-use ergonomics still trail Anthropic / OpenAI in some patterns.
Mistral Large 3
Mistral AI
128k ctx · $2.00 in / $6.00 out per 1M tokens
Best for: European data residency; strong multilingual, esp. EU languages.
Watch out: Smaller context window, no native multimodal. Quality lags Anthropic / OpenAI on the hardest tasks.
Llama 4 405B (open)
Meta
1M ctx · $2.50 in / $5.00 out per 1M tokens
Best for: Frontier-tier open model. Self-host if you have the GPUs; otherwise run it via Together / Anyscale.
Watch out: Heavy. Won't fit on a single H100 at FP16; multi-GPU or aggressive quantization required.
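The FP16 claim above is quick arithmetic: weight memory is roughly parameter count times bytes per parameter, before counting KV cache or activations. A sketch:

```python
# Why a 405B model won't fit on one 80 GB H100 at FP16: weights alone need
# params x bytes-per-param, and that ignores KV cache and activations.

def weight_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * bytes_per_param / 1e9

fp16 = weight_gb(405, 2.0)   # FP16: 2 bytes/param -> 810.0 GB, ~11 x 80 GB GPUs
int4 = weight_gb(405, 0.5)   # 4-bit: 0.5 bytes/param -> 202.5 GB, still multi-GPU
print(fp16, int4)  # → 810.0 202.5
```

Even aggressive 4-bit quantization leaves the weights at roughly three H100s' worth of memory, which is why the card lists multi-GPU as a hard requirement.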
Claude Sonnet 4.6
Anthropic
200k ctx · $3.00 in / $15.00 out per 1M tokens
Best for: Production-grade default. Best price/quality on most workloads; native tool use is excellent.
Watch out: No video / audio in. Reach for Gemini for those.
GPT-5
OpenAI
400k ctx · $5.00 in / $15.00 out per 1M tokens
Best for: Strong overall. Best ecosystem (Assistants API, native voice, code interpreter, web search).
Watch out: Pricing-tier complexity. GPT-5-mini and -nano variants offer real cost wins on simpler work.
Claude Opus 4.7
Anthropic
1M ctx · $15.00 in / $75.00 out per 1M tokens
Best for: Hardest reasoning + agent tasks, code generation at scale, long-context analysis.
Watch out: Premium pricing; tier down to Sonnet for the easy 80% of traffic.
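"Tier down for the easy 80%" implies a router in front of the models. A minimal sketch of that pattern; the difficulty heuristic and the tier names are hypothetical placeholders, not any vendor's API:

```python
# Hypothetical tiered router: send the easy majority of traffic to a cheaper
# tier, reserve the premium model for hard requests. The heuristic below is
# a deliberately crude placeholder; real routers use a classifier or evals.

def pick_tier(prompt: str, needs_tools: bool) -> str:
    hard_markers = ("prove", "refactor", "multi-step", "plan")
    is_hard = (
        needs_tools
        or len(prompt) > 4_000
        or any(m in prompt.lower() for m in hard_markers)
    )
    return "premium-tier" if is_hard else "default-tier"

print(pick_tier("Summarize this email", needs_tools=False))
# → default-tier
print(pick_tier("Plan a multi-step refactor of the billing module", needs_tools=False))
# → premium-tier
```

At the Opus/Sonnet price gap above ($15/$75 vs. $3/$15 per 1M tokens), routing even the easy 80% down a tier cuts the blended bill several-fold.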
OpenAI o-series (reasoning)
OpenAI
400k ctx · $15.00 in / $60.00 out per 1M tokens
Best for: Math, hard reasoning, multi-step planning. Burns tokens on internal reasoning before answering.
Watch out: Latency ~10x non-reasoning models. Wrong tool for chat / autocomplete.
Numbers move quickly. Treat this as a snapshot. Verify against the vendor docs before committing to a tier in your stack.