Xiaomi: MiMo-V2-Omni
MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capability: visual grounding, multi-step...
Quality Score: 100/100 (composite of price, context, capability)
Input Price: $0.40 per 1M tokens
Output Price: $2.00 per 1M tokens
Context Window: 262,144 tokens
- Model ID: xiaomi/mimo-v2-omni
- Vendor: xiaomi
- Tokenizer: Other
- Input Modalities: text, audio, image, video
- Output Modalities: text
- Max Output: 65,536 tokens
- Tool Calling: ✓ supported
- Structured Output: ✓ supported
- Reasoning Mode: ✓ supported
- Vision: ✓ accepts images
- Audio: ✓ accepts audio
- Moderated: no
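
The listed prices and limits can be turned into a quick budget check. The following is a minimal sketch, not an official client: the function names `estimate_cost` and `fits_limits` are hypothetical, and it assumes that the context window is shared by input and output tokens and that all input modalities are billed at the listed per-token input rate, neither of which the listing confirms.

```python
# Figures copied from the listing above.
INPUT_PRICE_PER_M = 0.40    # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 2.00   # USD per 1M output tokens
CONTEXT_WINDOW = 262_144    # tokens
MAX_OUTPUT = 65_536         # tokens


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated request cost in USD at the listed rates."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


def fits_limits(input_tokens: int, output_tokens: int) -> bool:
    """Check a request against the listed limits.

    Assumption: input and output tokens together must fit in the
    context window; the listing does not state this explicitly.
    """
    if output_tokens > MAX_OUTPUT:
        return False
    return input_tokens + output_tokens <= CONTEXT_WINDOW


if __name__ == "__main__":
    # Example: a 100,000-token prompt with a 10,000-token reply.
    print(f"fits: {fits_limits(100_000, 10_000)}")
    print(f"cost: ${estimate_cost(100_000, 10_000):.4f}")
```

At these rates an output token costs five times as much as an input token, so long generations dominate the bill well before long prompts do.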