
Top picks for Function / Tool Calling (2026)

Reliable JSON tool-call generation. Ranked from 334 live models on the OpenRouter catalog, weighted for tool calling and structured output.

What this is: Ranked by capability match, real benchmark scores (Aider Polyglot, Artificial Analysis Intelligence Index), and live pricing. Models must first have the right specs for Function / Tool Calling; benchmark performance then refines the order.
| # | Model | Model ID | Score | Input / 1M tokens | Output / 1M tokens | Context (tokens) |
|---|-------|----------|-------|-------------------|--------------------|------------------|
| 1 | Anthropic: Claude Sonnet 4.6 | anthropic/claude-sonnet-4.6 | 159 | $3.00 | $15.00 | 1,000,000 |
| 2 | Anthropic: Claude Opus 4.7 | anthropic/claude-opus-4.7 | 157 | $5.00 | $25.00 | 1,000,000 |
| 3 | OpenAI: GPT-5.4 | openai/gpt-5.4 | 152 | $2.50 | $15.00 | 1,050,000 |
| 4 | Google: Gemini 3.5 Flash | google/gemini-3.5-flash | 151 | $1.50 | $9.00 | 1,048,576 |
| 5 | Z.ai: GLM 5.2 | z-ai/glm-5.2 | 150 | $1.00 | $4.00 | 1,048,576 |
| 6 | MiniMax: MiniMax M3 | minimax/minimax-m3 | 150 | $0.30 | $1.20 | 1,048,576 |
| 7 | DeepSeek: DeepSeek V4 Pro | deepseek/deepseek-v4-pro | 149 | $0.43 | $0.87 | 1,048,576 |
| 8 | Anthropic: Claude Opus 4.8 | anthropic/claude-opus-4.8 | 149 | $5.00 | $25.00 | 1,000,000 |
| 9 | MoonshotAI: Kimi K2.7 Code | moonshotai/kimi-k2.7-code | 148 | $0.61 | $3.07 | 262,144 |
| 10 | DeepSeek: DeepSeek V4 Flash | deepseek/deepseek-v4-flash | 147 | $0.09 | $0.18 | 1,048,576 |
| 11 | Xiaomi: MiMo-V2.5-Pro | xiaomi/mimo-v2.5-pro | 147 | $0.43 | $0.87 | 1,048,576 |
| 12 | MoonshotAI: Kimi K2.6 | moonshotai/kimi-k2.6 | 147 | $0.66 | $3.50 | 262,144 |
| 13 | OpenAI: GPT-5.4 Mini | openai/gpt-5.4-mini | 147 | $0.75 | $4.50 | 400,000 |
| 14 | OpenAI: GPT-5.4 Nano | openai/gpt-5.4-nano | 146 | $0.20 | $1.25 | 400,000 |
| 15 | Qwen: Qwen3.7 Max | qwen/qwen3.7-max | 146 | $1.25 | $3.75 | 1,000,000 |
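
To compare rows on cost, it can help to turn the per-million-token prices into a per-request figure. The sketch below copies two price pairs from the table; the function name `request_cost` and the token counts in the usage note are illustrative assumptions, not part of the ranking.

```python
# Prices in USD per 1M tokens (input, output), copied from the table above.
PRICES = {
    "anthropic/claude-sonnet-4.6": (3.00, 15.00),
    "deepseek/deepseek-v4-flash": (0.09, 0.18),
}


def request_cost(model_id, input_tokens, output_tokens):
    """Return the USD cost of one request at the listed prices."""
    in_price, out_price = PRICES[model_id]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000
```

For a hypothetical request with 2,000 input tokens and 300 output tokens, `request_cost("anthropic/claude-sonnet-4.6", 2000, 300)` works out to about $0.0105, against roughly $0.000234 for `deepseek/deepseek-v4-flash`.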

How we ranked these

For Function / Tool Calling, we weight models on tool calling and structured output. Scores combine each model's public specs with independent benchmark results (Aider Polyglot coding scores; Artificial Analysis intelligence, coding, and agentic indices) and live pricing.

About Function / Tool Calling

Function calling is the task of generating properly formatted JSON that maps user intent to specific tool or API invocations. You need it when building agents, chatbots, or automation systems that must reliably execute external functions rather than generate freeform text. A good model produces valid, schema-compliant JSON consistently, with correct parameter mapping and no hallucinated fields; a poor one generates malformed JSON, invents tool names, or assigns arguments to the wrong functions. The main cost consideration is that stricter models (like Claude 3.5 Sonnet with native tool_use) reduce parsing failures and retry loops, which can lower total token spend despite a higher per-call cost.
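
The failure modes described above (malformed JSON, invented tool names, hallucinated or missing fields) can each be checked before a call is executed. The following is a minimal sketch, assuming a call arrives as a JSON object with `name` and `arguments` keys; that shape, the `TOOLS` registry, and the `get_weather` tool are hypothetical, and real providers use their own formats.

```python
import json

# Hypothetical registry: tool name -> parameter name -> (type, required).
TOOLS = {
    "get_weather": {"city": (str, True), "unit": (str, False)},
}


def validate_tool_call(raw, tools=TOOLS):
    """Return a list of problems; an empty list means the call is usable."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"malformed JSON: {exc.msg}"]
    if not isinstance(call, dict):
        return ["top-level value is not an object"]

    name = call.get("name")
    if not isinstance(name, str) or name not in tools:
        return [f"unknown tool: {name!r}"]

    args = call.get("arguments", {})
    if not isinstance(args, dict):
        return ["arguments is not an object"]

    params = tools[name]
    errors = []
    # Reject fields the tool does not declare, and values of the wrong type.
    for key, value in args.items():
        if key not in params:
            errors.append(f"unexpected field: {key}")
        elif not isinstance(value, params[key][0]):
            errors.append(f"wrong type for {key}")
    # Report required parameters the model left out.
    for key, (_, required) in params.items():
        if required and key not in args:
            errors.append(f"missing field: {key}")
    return errors
```

A hand-rolled check keeps the sketch free of dependencies; a JSON Schema validator would cover nested structures more thoroughly.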

When to use: Use this when you need an AI to decide which real action to take (book a flight, query a database, send an email) rather than just talk about it. The AI should output a specific instruction the computer can immediately execute.

Common questions

Which models are best at function calling without generating invalid JSON?

Claude 3.5 Sonnet and GPT-4 Turbo both excel here, with Claude's native tool_use mode offering the lowest error rate for schema compliance. Open-source models like Llama 2 70B can work with strict prompt engineering, but require more retry overhead and validation logic.
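
The "retry overhead and validation logic" mentioned above often takes the form of a loop that re-prompts the model when its reply does not parse. The sketch below is one way to write it, assuming a hypothetical `generate(prompt)` callable that returns the model's raw text; no specific provider SDK is implied.

```python
import json


def call_with_retries(generate, prompt, max_attempts=3):
    """Call generate(prompt) until it returns a JSON object.

    Returns (parsed_call, attempts_used); raises ValueError when every
    attempt fails. `generate` is a hypothetical stand-in for a model call.
    """
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        raw = generate(prompt + feedback)
        try:
            call = json.loads(raw)
            if isinstance(call, dict):
                return call, attempt
            error = "top-level value must be a JSON object"
        except json.JSONDecodeError as exc:
            error = exc.msg
        # Tell the model what went wrong so the next attempt can correct it.
        feedback = (
            f"\nPrevious reply was rejected ({error}). "
            "Reply with one JSON object only."
        )
    raise ValueError(f"no valid tool call after {max_attempts} attempts")
```

Each extra attempt is billed as a full request, which is why a model with a lower failure rate can cost less overall even at a higher per-call price.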

How much slower is function calling compared to regular text generation?

Function calling typically adds 10-20% latency because models must reason about which tool to call before generating JSON. If you're calling multiple tools in sequence (a multi-step agent), latency compounds, but intelligent caching and parallel tool execution can offset this cost.
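
Parallel tool execution helps only when the calls do not depend on each other's results. As a rough illustration, the sketch below compares running three independent calls one after another with running them concurrently via `asyncio.gather`; `fake_tool` and its delays are stand-ins for real network-bound tools, not measurements.

```python
import asyncio
import time


async def fake_tool(name, delay):
    # Stand-in for a network-bound tool call; the delay is illustrative.
    await asyncio.sleep(delay)
    return f"{name} done"


async def run_sequential(calls):
    # Each call waits for the previous one, so the delays add up.
    return [await fake_tool(name, delay) for name, delay in calls]


async def run_parallel(calls):
    # Independent calls overlap, so total time is close to the slowest call.
    return await asyncio.gather(*(fake_tool(name, delay) for name, delay in calls))


def timed(runner, calls):
    """Run an async runner to completion and return (results, seconds)."""
    start = time.perf_counter()
    results = asyncio.run(runner(calls))
    return results, time.perf_counter() - start
```

Calling `timed(run_parallel, calls)` and `timed(run_sequential, calls)` on the same list returns the results in the same order, with the parallel version finishing sooner when the calls are independent.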
