Mistral: Voxtral Small 24B 2507
Voxtral Small 24B 2507 is Mistral's audio-capable model, accepting text, audio, and file inputs within a 32,000-token context window. It supports tool calling and structured output, which makes it usable in agentic pipelines, but it has no reasoning mode. No maximum completion length is published, so output volume per request is an unknown for capacity planning. At $0.10 per million input tokens and $0.30 per million output tokens, pricing is low enough to warrant a look for teams building audio-processing or transcription workflows on a budget. The catch is that no independent benchmark coverage exists yet, so performance relative to competing models is unverified. Buyers who need proven quality scores before committing should wait for third-party evaluations; those comfortable running their own evals on a cost-efficient audio model have a reasonable candidate to test.
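For budgeting, the listed prices and context window can be turned into a quick estimate. The sketch below is illustrative only: the function and constant names are hypothetical, and it assumes input and output tokens share the 32,000-token window, which this listing does not confirm.

```python
# Figures taken from this listing; names are hypothetical helpers, not a vendor API.
INPUT_USD_PER_MILLION = 0.10    # USD per 1M input tokens
OUTPUT_USD_PER_MILLION = 0.30   # USD per 1M output tokens
CONTEXT_WINDOW_TOKENS = 32_000  # total context window


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for the given token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


def fits_context(input_tokens: int, output_tokens: int) -> bool:
    """Check a request against the window.

    Assumption: prompt and completion tokens count against the same window.
    """
    return input_tokens + output_tokens <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    # 1,000 requests, each with 20,000 input tokens and 1,000 output tokens.
    requests = 1_000
    cost = estimate_cost_usd(20_000 * requests, 1_000 * requests)
    print(f"Estimated batch cost: ${cost:.2f}")
    print("Single request fits the window:", fits_context(20_000, 1_000))
```

By the listed rates, that batch works out to $2.00 for input plus $0.30 for output. Because no maximum completion length is published, the output token count here is a planning guess rather than a documented limit.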
- Model ID: mistralai/voxtral-small-24b-2507
- Vendor: mistralai
- Tokenizer: Mistral
- Input Modalities: text, audio, file
- Output Modalities: text
- Max Output: default
- Tool Calling: ✓ supported
- Structured Output: ✓ supported
- Reasoning Mode: not supported
- Vision: text only
- Audio: ✓ accepts audio
- Moderated: no
Category rankings
Where Mistral: Voxtral Small 24B 2507 places across the 3 categories it ranks in.
| Rank | Category | Score |
|---|---|---|
| #18 of 19 | Transcription (Voice) | 106 |
| #19 of 19 | Audio Summarization (Voice) | 100 |
| #19 of 19 | TTS Replacement (Voice) | 98 |