mistralai

Mistral: Voxtral Small 24B 2507

Voxtral Small 24B 2507 is Mistral's audio-capable model. It accepts text, audio, and file inputs within a 32,000-token context window and returns text. It supports tool calling, which makes it usable in agentic pipelines, and the listing below marks structured output as supported; it does not offer a reasoning mode. No maximum completion length is published, so output volume cannot be bounded for capacity planning. At $0.10 per million input tokens and $0.30 per million output tokens, it is priced low enough to warrant a look from teams building audio-processing or transcription workflows on a budget. The catch is that no independent benchmark coverage exists yet, so performance relative to competing models is unverified. Buyers who need proven quality scores before committing should wait for third-party evaluations; those comfortable running their own evals have a reasonable, cost-efficient candidate to test.
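For budgeting, the listed prices can be turned into a per-request estimate. The following is a minimal sketch, not an official calculator: the function names and the sample token counts are hypothetical, and the context check assumes that input and output tokens share the 32,000-token window, which the listing does not state.

```python
from decimal import Decimal

# Figures taken from the listing.
INPUT_PRICE_PER_M = Decimal("0.10")   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = Decimal("0.30")  # USD per 1M output tokens
CONTEXT_WINDOW = 32_000               # tokens

TOKENS_PER_MILLION = Decimal(1_000_000)


def fits_context(input_tokens: int, output_tokens: int) -> bool:
    """Assumption: input and output tokens both count against the window."""
    return input_tokens + output_tokens <= CONTEXT_WINDOW


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated cost in USD for a single request."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) * INPUT_PRICE_PER_M
    output_cost = Decimal(output_tokens) * OUTPUT_PRICE_PER_M
    return (input_cost + output_cost) / TOKENS_PER_MILLION


if __name__ == "__main__":
    # Hypothetical request: 20,000 input tokens, 2,000 output tokens.
    print(fits_context(20_000, 2_000))
    print(estimate_cost(20_000, 2_000))
```

Decimal arithmetic is used so that sub-cent amounts do not pick up binary floating-point rounding when many requests are summed.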

Quality Score: 83/100 (price + capability + benchmarks)
Input Price: $0.10 per 1M tokens
Output Price: $0.30 per 1M tokens
Context Window: 32,000 tokens

Model ID: mistralai/voxtral-small-24b-2507
Vendor: mistralai
Tokenizer: Mistral
Input Modalities: text, audio, file
Output Modalities: text
Max Output: default
Tool Calling: ✓ supported
Structured Output: ✓ supported
Reasoning Mode: not supported
Vision: text only
Audio: ✓ accepts audio
Moderated: no

Category rankings

Where Mistral: Voxtral Small 24B 2507 places across the 3 categories it ranks in.

Rank	Category	Score
#18	Transcription (Voice, of 19 ranked)	106
#19	Audio Summarization (Voice, of 19 ranked)	100
#19	TTS Replacement (Voice, of 19 ranked)	98

Similar models

Mistral: Mistral Nemo

Mistral Large

Mistral Large 2407

Mistral: Mixtral 8x22B Instruct

Mistral: Saba

Mistral: Mistral Small 3.2 24B