Best AI model for Scientific Coding (2026)
For writing research-grade NumPy, JAX, and PyTorch code. Models are ranked from the 346 live models in the OpenRouter catalog, weighted for reasoning quality, tool calling, and context window.
| # | Model | OpenRouter ID | Score | Input ($/1M tokens) | Output ($/1M tokens) | Context (tokens) |
|---|---|---|---|---|---|---|
| 1 | Qwen: Qwen3.6 Plus | `qwen/qwen3.6-plus` | 136 | $0.33 | $1.95 | 1,000,000 |
| 2 | xAI: Grok 4.20 | `x-ai/grok-4.20` | 136 | $2.00 | $6.00 | 2,000,000 |
| 3 | OpenAI: GPT-5.4 Nano | `openai/gpt-5.4-nano` | 136 | $0.20 | $1.25 | 400,000 |
| 4 | OpenAI: GPT-5.4 Mini | `openai/gpt-5.4-mini` | 136 | $0.75 | $4.50 | 400,000 |
| 5 | OpenAI: GPT-5.4 | `openai/gpt-5.4` | 136 | $2.50 | $15.00 | 1,050,000 |
| 6 | Google: Gemini 3.1 Flash Lite Preview | `google/gemini-3.1-flash-lite-preview` | 136 | $0.25 | $1.50 | 1,048,576 |
| 7 | Qwen: Qwen3.5-Flash | `qwen/qwen3.5-flash-02-23` | 136 | $0.07 | $0.26 | 1,000,000 |
| 8 | Google: Gemini 3.1 Pro Preview Custom Tools | `google/gemini-3.1-pro-preview-customtools` | 136 | $2.00 | $12.00 | 1,048,576 |
| 9 | OpenAI: GPT-5.3-Codex | `openai/gpt-5.3-codex` | 136 | $1.75 | $14.00 | 400,000 |
| 10 | Google: Gemini 3.1 Pro Preview | `google/gemini-3.1-pro-preview` | 136 | $2.00 | $12.00 | 1,048,576 |
| 11 | Qwen: Qwen3.5 Plus 2026-02-15 | `qwen/qwen3.5-plus-02-15` | 136 | $0.26 | $1.56 | 1,000,000 |
| 12 | Google: Gemini 3 Flash Preview | `google/gemini-3-flash-preview` | 136 | $0.50 | $3.00 | 1,048,576 |
| 13 | OpenAI: GPT-5.2 | `openai/gpt-5.2` | 136 | $1.75 | $14.00 | 400,000 |
| 14 | Amazon: Nova 2 Lite | `amazon/nova-2-lite-v1` | 136 | $0.30 | $2.50 | 1,000,000 |
| 15 | xAI: Grok 4.1 Fast | `x-ai/grok-4.1-fast` | 136 | $0.20 | $0.50 | 2,000,000 |
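Because prices are quoted per million tokens, the cost of a single request is input tokens times the input price plus output tokens times the output price, divided by 1,000,000. The sketch below applies that arithmetic to a few rows of the table. It assumes flat list pricing (no prompt-caching discounts or tiered rates), and the 20k-in / 2k-out workload is a hypothetical example, not a measured figure; reasoning models may also emit extra output tokens that raise the real bill.

```python
# Per-million-token list prices (USD), copied from a few rows of the table above.
# Format: model_id -> (input_price, output_price)
PRICES = {
    "qwen/qwen3.5-flash-02-23": (0.07, 0.26),
    "openai/gpt-5.4-nano": (0.20, 1.25),
    "x-ai/grok-4.20": (2.00, 6.00),
    "openai/gpt-5.4": (2.50, 15.00),
}


def request_cost(model_id: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request, assuming flat per-token list pricing."""
    in_price, out_price = PRICES[model_id]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: a 20k-token prompt (e.g. a NumPy module plus its tests)
    # that yields a 2k-token patch.
    for model in PRICES:
        print(f"{model:28s} ${request_cost(model, 20_000, 2_000):.5f}")
```

For this workload the arithmetic gives $0.08 per request for `openai/gpt-5.4` ((20,000 × 2.50 + 2,000 × 15.00) / 1,000,000), versus under a fifth of a cent for `qwen/qwen3.5-flash-02-23`.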
How we ranked these
For Scientific Coding, we weight models on reasoning quality, tool calling, and context window; a higher score is better. Scores combine OpenRouter's model metadata (context length, modality support, tool calling, structured output, and reasoning capability) with public pricing.
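The exact formula is not published here, so the following is only a hypothetical sketch of how metadata flags, context length, and price could be folded into one weighted score. The `ModelMeta` fields, the `WEIGHTS` values, the log-scaled context term, and the 3:1 input-to-output price blend are all illustrative assumptions; the output is on an arbitrary scale and will not reproduce the scores in the table.

```python
import math
from dataclasses import dataclass


@dataclass
class ModelMeta:
    model_id: str
    context_length: int       # tokens
    tool_calling: bool
    structured_output: bool
    reasoning: bool
    input_price: float        # USD per 1M input tokens
    output_price: float       # USD per 1M output tokens


# Hypothetical weights for a scientific-coding profile (NOT the site's real values).
WEIGHTS = {
    "reasoning": 40.0,
    "tool_calling": 30.0,
    "context": 20.0,
    "structured_output": 10.0,
    "price": 10.0,
}

# Hypothetical reference window: contexts at or above this earn the full context weight.
REFERENCE_CONTEXT = 2_000_000


def score(m: ModelMeta) -> float:
    """Weighted sum of metadata features; higher is better."""
    # Log scale so going from 8k to 128k tokens counts more than 1M to 2M.
    context = min(math.log2(m.context_length) / math.log2(REFERENCE_CONTEXT), 1.0)
    # Blended price assumes three input tokens per output token; cheaper -> closer to 1.
    blended = (3 * m.input_price + m.output_price) / 4
    price = 1.0 / (1.0 + blended)
    # Booleans count as 0 or 1 when multiplied by a float weight.
    return (
        WEIGHTS["reasoning"] * m.reasoning
        + WEIGHTS["tool_calling"] * m.tool_calling
        + WEIGHTS["structured_output"] * m.structured_output
        + WEIGHTS["context"] * context
        + WEIGHTS["price"] * price
    )


if __name__ == "__main__":
    # Two hypothetical models that differ only in reasoning support.
    a = ModelMeta("example/model-a", 1_000_000, True, True, True, 0.30, 2.00)
    b = ModelMeta("example/model-b", 1_000_000, True, True, False, 0.30, 2.00)
    for m in sorted([a, b], key=score, reverse=True):
        print(f"{m.model_id:18s} {score(m):.2f}")
```

With binary capability flags carrying most of the weight, models that support the same features land close together, and the continuous terms (context and price) only separate them at the margin.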
Related tasks
- Best for Math Proofs: formal proof construction and verification.
- Best for Literature Review: synthesizing across many academic papers.
- Best for Experiment Design: designing rigorous A/B and lab experiments.
- Best for Dataset Annotation: annotating training data at scale.