xiaomi
Xiaomi: MiMo-V2.5
MiMo-V2.5 is a native omnimodal model by Xiaomi. It delivers Pro-level agentic performance at roughly half the inference cost, while surpassing MiMo-V2-Omni in multimodal perception across image and video understanding...
Quality Score
100/100
price + capability + benchmarks
Input Price
$0.14
per 1M tokens
Output Price
$0.28
per 1M tokens
Context Window
1,048,576
tokens
- Model ID
- xiaomi/mimo-v2.5
- Vendor
- xiaomi
- Tokenizer
- Other
- Input Modalities
- text, audio, image, video
- Output Modalities
- text
- Max Output
- 131,072 tokens
- Tool Calling
- ✓ supported
- Structured Output
- ✓ supported
- Reasoning Mode
- ✓ supported
- Vision
- ✓ accepts images
- Audio
- ✓ accepts audio
- Moderated
- no
Strong choice for
Code
Code Completion
Inline IDE-style autocomplete that has to feel instant.
Writing
Social Media Posts
Tweets, LinkedIn posts, captions in the right voice.
Voice
Voice Assistant Backend
Real-time voice agent backbones.
Voice
Transcription
Speech-to-text accuracy and speed.
Cost
Cheap Bulk Inference
Lowest cost-per-million for high-volume jobs.
Cost
Self-Hosted / Local
Open-weights models you can run yourself.