OpenAI: GPT Audio Mini
GPT Audio Mini is OpenAI's lower-tier audio-capable model, accepting both text and audio as inputs and supporting tool use. It works with a 128,000-token context window and produces up to 16,384 completion tokens per response. It does not support reasoning modes, and structured output support is not confirmed. Pricing sits at $0.60 per million input tokens and $2.40 per million output tokens. For teams building voice-driven or audio-processing workflows on a tighter budget, this model's native audio input and tool support make it worth considering alongside pricier alternatives. The tradeoff is transparency: there is no independent benchmark coverage to evaluate its quality against competitors, so any shortlisting should be treated as tentative until you run task-specific evaluations of your own. Budget-conscious buyers get a potentially useful feature set, but go in knowing the performance profile is currently unproven.
- Model ID
- openai/gpt-audio-mini
- Vendor
- openai
- Tokenizer
- GPT
- Input Modalities
- text, audio
- Output Modalities
- text, audio
- Max Output
- 16,384 tokens
- Tool Calling
- ✓ supported
- Structured Output
- ✓ supported
- Reasoning Mode
- not supported
- Vision
- text only
- Audio
- ✓ accepts audio
- Moderated
- yes
Category rankings
Where OpenAI: GPT Audio Mini places across the 3 categories it ranks in. How we rank →
| # | Category | Score |
|---|---|---|
| #17 | TranscriptionVoice · of 19 ranked | 112 |
| #17 | Audio SummarizationVoice · of 19 ranked | 109 |
| #17 | TTS ReplacementVoice · of 19 ranked | 104 |