Voice · best for

Best AI model for Transcription (2026)

Speech-to-text accuracy and speed. Ranked from 343 live models on the OpenRouter catalog, weighted for audio input, low latency.

#	Model	Score	In / 1M	Out / 1M	Context
1	Xiaomi: MiMo-V2-Omnixiaomi/mimo-v2-omni	123	$0.40	$2.00	262,144	Try →
2	Google: Gemini 3.1 Flash Lite Previewgoogle/gemini-3.1-flash-lite-preview	123	$0.25	$1.50	1,048,576	Try →
3	Google: Gemini 3 Flash Previewgoogle/gemini-3-flash-preview	123	$0.50	$3.00	1,048,576	Try →
4	Google: Gemini 2.5 Flash Lite Preview 09-2025google/gemini-2.5-flash-lite-preview-09-2025	123	$0.10	$0.40	1,048,576	Try →
5	Google: Gemini 2.5 Flash Litegoogle/gemini-2.5-flash-lite	123	$0.10	$0.40	1,048,576	Try →
6	Google: Gemini 2.5 Flashgoogle/gemini-2.5-flash	123	$0.30	$2.50	1,048,576	Try →
7	Google: Gemini 2.0 Flash Litegoogle/gemini-2.0-flash-lite-001	123	$0.07	$0.30	1,048,576	Try →
8	Google: Gemini 2.0 Flashgoogle/gemini-2.0-flash-001	123	$0.10	$0.40	1,000,000	Try →
9	Google: Gemini 3.1 Pro Preview Custom Toolsgoogle/gemini-3.1-pro-preview-customtools	115	$2.00	$12.00	1,048,576	Try →
10	Google: Gemini 3.1 Pro Previewgoogle/gemini-3.1-pro-preview	115	$2.00	$12.00	1,048,576	Try →
11	Google: Gemini 2.5 Progoogle/gemini-2.5-pro	115	$1.25	$10.00	1,048,576	Try →
12	Google: Gemini 2.5 Pro Preview 06-05google/gemini-2.5-pro-preview	115	$1.25	$10.00	1,048,576	Try →
13	Google: Gemini 2.5 Pro Preview 05-06google/gemini-2.5-pro-preview-05-06	115	$1.25	$10.00	1,048,576	Try →
14	OpenAI: GPT Audio Miniopenai/gpt-audio-mini	112	$0.60	$2.40	128,000	Try →
15	Mistral: Voxtral Small 24B 2507mistralai/voxtral-small-24b-2507	101	$0.10	$0.30	32,000	Try →

How we ranked these

For Transcription, we weight models on audio input, low latency. Higher means better. Scores combine OpenRouter's model metadata (context length, modality support, tool calling, structured output, reasoning capability) with public pricing. See full methodology →

Related tasks

Voice

Best AI model for Transcription (2026)

How we ranked these

Related tasks

Best for Voice Assistant Backend

Best for Audio Summarization

Best for TTS Replacement