mistralai

Mistral: Voxtral Small 24B 2507

Voxtral Small 24B 2507 is Mistral's audio-capable model, accepting text, audio, and file inputs within a 32,000-token context window. It supports tool use, which makes it usable in agentic pipelines, but it does not support reasoning modes and structured output availability is unconfirmed. There is no published maximum completion length to plan against, so output volume is an unknown quantity for capacity planning. At $0.10 per million input tokens and $0.30 per million output tokens, pricing is low enough to warrant a look for teams building audio-processing or transcription workflows on a budget. The catch is that no independent benchmark coverage exists yet, so performance relative to competing models is unverified. Buyers who need proven quality scores before committing should wait for third-party evaluations; those comfortable running their own evals on a cost-efficient audio model have a reasonable candidate to test.

Quality Score
83/100
price + capability + benchmarks
Input Price
$0.10
per 1M tokens
Output Price
$0.30
per 1M tokens
Context Window
32,000
tokens
Model ID
mistralai/voxtral-small-24b-2507
Vendor
mistralai
Tokenizer
Mistral
Input Modalities
text, audio, file
Output Modalities
text
Max Output
default
Tool Calling
✓ supported
Structured Output
✓ supported
Reasoning Mode
not supported
Vision
text only
Audio
✓ accepts audio
Moderated
no

Category rankings

Where Mistral: Voxtral Small 24B 2507 places across the 3 categories it ranks in. How we rank →

#CategoryScore
#18 TranscriptionVoice · of 19 ranked 106
#19 Audio SummarizationVoice · of 19 ranked 100
#19 TTS ReplacementVoice · of 19 ranked 98

Similar models