Intelligence Per Dollar
Benchmark points per blended dollar across every credibly ranked model. Scores are min-max normalized across 18 models with 4+ independent benchmarks; cost assumes a typical 3:1 input:output token mix. Leader = 100.
This is not a quality ranking. #1 here means the most benchmark points per dollar, so a mid-scoring budget model will outrank a frontier model that costs 100x more. Check the Score and % of top score columns for raw capability, and use the task rankings when output quality is what compounds in your workflow.
Value leaderboard
| # | Model | Score | % of top score | In / Out per 1M | Blended $/1M | Value |
|---|---|---|---|---|---|---|
| 1 | DeepSeek: DeepSeek V4 Flash BEST VALUEdeepseek/deepseek-v4-flash | 75.1 | 79% | $0.09 / $0.18 | $0.11 | 100.0 |
| 2 | OpenAI: gpt-oss-120bopenai/gpt-oss-120b | 42.3 | 44% | $0.04 / $0.18 | $0.07 | 85.3 |
| 3 | OpenAI: gpt-oss-20bopenai/gpt-oss-20b | 23.8 | 25% | $0.03 / $0.14 | $0.06 | 62.8 |
| 4 | Google: Gemma 4 26B A4B google/gemma-4-26b-a4b-it | 51.0 | 53% | $0.06 / $0.33 | $0.13 | 59.9 |
| 5 | Google: Gemma 4 31Bgoogle/gemma-4-31b-it | 56.4 | 59% | $0.12 / $0.35 | $0.18 | 47.6 |
| 6 | DeepSeek: DeepSeek V3deepseek/deepseek-chat | 79.1 | 83% | $0.20 / $0.80 | $0.35 | 33.8 |
| 7 | DeepSeek: DeepSeek V4 Prodeepseek/deepseek-v4-pro | 82.5 | 86% | $0.43 / $0.87 | $0.54 | 22.7 |
| 8 | Qwen: Qwen3.5 397B A17Bqwen/qwen3.5-397b-a17b | 63.1 | 66% | $0.39 / $2.45 | $0.90 | 10.5 |
| 9 | MoonshotAI: Kimi K2.6moonshotai/kimi-k2.6 | 77.4 | 81% | $0.66 / $3.50 | $1.37 | 8.5 |
| 10 | Meta: Llama 4 Maverickmeta-llama/llama-4-maverick | 13.9 | 15% | $0.15 / $0.60 | $0.26 | 7.9 |
| 11 | Z.ai: GLM 5.2z-ai/glm-5.2 | 89.7 | 94% | $1.00 / $4.00 | $1.75 | 7.7 |
| 12 | Z.ai: GLM 5.1z-ai/glm-5.1 | 75.3 | 79% | $0.98 / $3.08 | $1.50 | 7.5 |
| 13 | Google: Gemini 3.1 Pro Previewgoogle/gemini-3.1-pro-preview | 76.5 | 80% | $2.00 / $12.00 | $4.50 | 2.5 |
| 14 | OpenAI: GPT-5.4openai/gpt-5.4 | 90.4 | 95% | $2.50 / $15.00 | $5.62 | 2.4 |
| 15 | Anthropic: Claude Sonnet 4.6anthropic/claude-sonnet-4.6 | 82.2 | 86% | $3.00 / $15.00 | $6.00 | 2.1 |
| 16 | Anthropic: Claude Opus 4.8anthropic/claude-opus-4.8 | 95.5 | 100% | $5.00 / $25.00 | $10.00 | 1.4 |
| 17 | Anthropic: Claude Opus 4.7anthropic/claude-opus-4.7 | 89.7 | 94% | $5.00 / $25.00 | $10.00 | 1.3 |
| 18 | OpenAI: GPT-5.5openai/gpt-5.5 | 93.7 | 98% | $5.00 / $30.00 | $11.25 | 1.2 |
Budget champions : 80+ score, cheapest first
| # | Model | Score | % of top score | In / Out per 1M | Blended $/1M | Value |
|---|---|---|---|---|---|---|
| 1 | DeepSeek: DeepSeek V4 Prodeepseek/deepseek-v4-pro | 82.5 | 86% | $0.43 / $0.87 | $0.54 | 22.7 |
| 2 | Z.ai: GLM 5.2z-ai/glm-5.2 | 89.7 | 94% | $1.00 / $4.00 | $1.75 | 7.7 |
| 3 | OpenAI: GPT-5.4openai/gpt-5.4 | 90.4 | 95% | $2.50 / $15.00 | $5.62 | 2.4 |
| 4 | Anthropic: Claude Sonnet 4.6anthropic/claude-sonnet-4.6 | 82.2 | 86% | $3.00 / $15.00 | $6.00 | 2.1 |
| 5 | Anthropic: Claude Opus 4.8anthropic/claude-opus-4.8 | 95.5 | 100% | $5.00 / $25.00 | $10.00 | 1.4 |
| 6 | Anthropic: Claude Opus 4.7anthropic/claude-opus-4.7 | 89.7 | 94% | $5.00 / $25.00 | $10.00 | 1.3 |
| 7 | OpenAI: GPT-5.5openai/gpt-5.5 | 93.7 | 98% | $5.00 / $30.00 | $11.25 | 1.2 |
Assumptions
Value = blended benchmark score divided by blended price per million tokens, indexed to the leader. A 3:1 input:output ratio fits most chat and RAG workloads; estimate your exact mix with the cost calculator. Scoring details in the methodology. Models with fewer than 4 independent benchmarks are excluded rather than guessed.