Best AI model for Transcription (2026)

Speech-to-text accuracy and speed. Ranked from 343 live models in the OpenRouter catalog, weighted for audio input and low latency.

| # | Model | Model ID | Score | Input / 1M tokens | Output / 1M tokens | Context (tokens) |
|---|-------|----------|-------|-------------------|--------------------|------------------|
| 1 | Xiaomi: MiMo-V2-Omni | xiaomi/mimo-v2-omni | 123 | $0.40 | $2.00 | 262,144 |
| 2 | Google: Gemini 3.1 Flash Lite Preview | google/gemini-3.1-flash-lite-preview | 123 | $0.25 | $1.50 | 1,048,576 |
| 3 | Google: Gemini 3 Flash Preview | google/gemini-3-flash-preview | 123 | $0.50 | $3.00 | 1,048,576 |
| 4 | Google: Gemini 2.5 Flash Lite Preview 09-2025 | google/gemini-2.5-flash-lite-preview-09-2025 | 123 | $0.10 | $0.40 | 1,048,576 |
| 5 | Google: Gemini 2.5 Flash Lite | google/gemini-2.5-flash-lite | 123 | $0.10 | $0.40 | 1,048,576 |
| 6 | Google: Gemini 2.5 Flash | google/gemini-2.5-flash | 123 | $0.30 | $2.50 | 1,048,576 |
| 7 | Google: Gemini 2.0 Flash Lite | google/gemini-2.0-flash-lite-001 | 123 | $0.07 | $0.30 | 1,048,576 |
| 8 | Google: Gemini 2.0 Flash | google/gemini-2.0-flash-001 | 123 | $0.10 | $0.40 | 1,000,000 |
| 9 | Google: Gemini 3.1 Pro Preview Custom Tools | google/gemini-3.1-pro-preview-customtools | 115 | $2.00 | $12.00 | 1,048,576 |
| 10 | Google: Gemini 3.1 Pro Preview | google/gemini-3.1-pro-preview | 115 | $2.00 | $12.00 | 1,048,576 |
| 11 | Google: Gemini 2.5 Pro | google/gemini-2.5-pro | 115 | $1.25 | $10.00 | 1,048,576 |
| 12 | Google: Gemini 2.5 Pro Preview 06-05 | google/gemini-2.5-pro-preview | 115 | $1.25 | $10.00 | 1,048,576 |
| 13 | Google: Gemini 2.5 Pro Preview 05-06 | google/gemini-2.5-pro-preview-05-06 | 115 | $1.25 | $10.00 | 1,048,576 |
| 14 | OpenAI: GPT Audio Mini | openai/gpt-audio-mini | 112 | $0.60 | $2.40 | 128,000 |
| 15 | Mistral: Voxtral Small 24B 2507 | mistralai/voxtral-small-24b-2507 | 101 | $0.10 | $0.30 | 32,000 |
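
The per-million-token prices in the table above translate into a job cost with simple arithmetic. The sketch below is illustrative only: the `estimate_cost` helper and the token counts are hypothetical, and it assumes the listed input price applies to audio tokens, which may not hold for every provider.

```python
# USD per 1M tokens (input, output), copied from the ranking table.
PRICES = {
    "google/gemini-2.5-flash-lite": (0.10, 0.40),
    "openai/gpt-audio-mini": (0.60, 2.40),
    "mistralai/voxtral-small-24b-2507": (0.10, 0.30),
}


def estimate_cost(model_id: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request.

    Hypothetical helper: assumes audio input is billed at the listed
    input price, which is an assumption, not something the table states.
    """
    price_in, price_out = PRICES[model_id]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


if __name__ == "__main__":
    # Hypothetical job: 100,000 input tokens of audio, 20,000 output tokens of text.
    for model_id in PRICES:
        print(f"{model_id}: ${estimate_cost(model_id, 100_000, 20_000):.4f}")
```

Because output is priced several times higher than input for most of these models, long transcripts are dominated by the output side of the calculation.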

How we ranked these

For transcription, we weight models on audio input and low latency. A higher score is better. Scores combine OpenRouter's model metadata (context length, modality support, tool calling, structured output, reasoning capability) with public pricing.
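
To make the idea of a weighted score concrete, here is a minimal sketch of how metadata and pricing could be combined. It is not the site's actual formula: the `WEIGHTS` values, the `ModelInfo` fields, and the `score` function are all hypothetical, and latency is left out because no latency figure is given here.

```python
from dataclasses import dataclass


@dataclass
class ModelInfo:
    model_id: str
    audio_input: bool      # modality support
    context_length: int    # tokens
    price_in: float        # USD per 1M input tokens
    price_out: float       # USD per 1M output tokens


# Hypothetical weights, chosen only for illustration.
WEIGHTS = {"audio_input": 60.0, "context": 20.0, "price": 20.0}


def score(m: ModelInfo) -> float:
    """Combine metadata and pricing into one number (higher is better)."""
    total = 0.0
    if m.audio_input:
        total += WEIGHTS["audio_input"]
    # Context contributes linearly, capped at 1M tokens.
    total += WEIGHTS["context"] * min(m.context_length / 1_000_000, 1.0)
    # Cheaper models get more of the price weight.
    blended_price = (m.price_in + m.price_out) / 2
    total += WEIGHTS["price"] / (1.0 + blended_price)
    return total


if __name__ == "__main__":
    models = [
        ModelInfo("mistralai/voxtral-small-24b-2507", True, 32_000, 0.10, 0.30),
        ModelInfo("openai/gpt-audio-mini", True, 128_000, 0.60, 2.40),
        # Hypothetical text-only model for contrast.
        ModelInfo("example/text-only", False, 128_000, 0.60, 2.40),
    ]
    for m in sorted(models, key=score, reverse=True):
        print(m.model_id, round(score(m), 1))
```

With weights like these, audio support dominates the result, which matches the stated emphasis on audio input for this task.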

Related tasks