Powered by Xiaomi
MiMo-V2.5-Pro
- Vision-Language
MiMo-V2.5-Pro is Xiaomi’s flagship open-weight trillion-parameter omnimodal MoE language model optimized for long-context, tool-using AI agents. It is notable for its 1M-token context window and leading performance on agentic and coding benchmarks at comparatively low token cost.
About the model
What is MiMo-V2.5-Pro?
MiMo-V2.5-Pro is Xiaomi’s ~1T-parameter (42B active) open-weight omnimodal Mixture-of-Experts language model with a 1M-token context window, designed to power advanced AI agents. It is mainly used for complex software engineering and autonomous coding workflows, where it can run hours-long sessions involving code generation, debugging, and project management. It is also used for general-purpose agentic tasks such as tool use, long-horizon planning, and multi-step reasoning over very long documents and contexts. MiMo-V2.5-Pro belongs to Xiaomi’s MiMo-V2.5 family of models, following earlier MiMo-V2 variants and extending them with larger scale, longer context, and improved agent performance.
Model capabilities
5 Core Capabilities
-
Advanced Text Generation
Generates coherent, context-aware text for complex tasks with one-million-token context and large 128K-token outputs for long documents.
-
Deep Reasoning
Optimized Mixture-of-Experts reasoning for complex, multi-step planning, long-horizon tasks, and sophisticated agent workflows and automation.
-
Agent Tool Calling
Supports function and tool calling for structured outputs, enabling robust AI agents that interact with external systems and APIs.
-
Search-Augmented Answers
Integrates web search during inference to retrieve up-to-date information, improving factual accuracy and grounding of generated responses.
-
Multilingual Support
Handles multiple languages for generation and understanding, suitable for global users across diverse linguistic environments and content.
Use cases
6 Most Valuable Use Cases
- Agentic workflows automation
- Complex software engineering
- Long-horizon task planning
- Tool-using chat agents
- Large document analysis
- Cost-efficient AI deployment
Transparent pricing
Cost Comparison
LLM API offers the lowest cost and highest performance for MiMo-V2.5-Pro–class models.
| Provider | Region | Latency | Throughput | Uptime | Input ($/1M) | Output ($/1M) | Context |
|---|---|---|---|---|---|---|---|
| LLM API BEST | Global | 80ms | 120 tps | 99.99% | $0.20 | $0.20 | 256K |
| Xiaomi | Asia Pacific | ~150ms | ~40 tps | ~99.9% | ~$0.50 | ~$0.50 | ~128K |
| OpenAI-compatible Gateway | Global | ~120ms | ~70 tps | ~99.95% | ~$0.35 | ~$0.35 | ~128K |
| Tencent Cloud | Asia Pacific | ~160ms | ~50 tps | ~99.9% | ~$0.40 | ~$0.40 | ~64K |
| Huawei Cloud | China | ~170ms | ~45 tps | ~99.9% | ~$0.45 | ~$0.45 | ~64K |
Performance benchmarks
Technical Specifications
| Metric | MiMo-V2.5-Pro | Xiaomi MiMo-V2.5 | Huawei Pangu-Chat 2.0 |
|---|---|---|---|
| Avg Latency | ~180ms | ~220ms | ~250ms |
| Context Window | 128K | 64K | 32K |
| Input Price ($/1M tokens) | $0.80 | $0.70 | $1.00 |
| Output Price ($/1M tokens) | $1.60 | $1.40 | $2.00 |
| Max Output Tokens | 8K | 4K | 4K |
| Throughput | 48 tps | 36 tps | 30 tps |
| Uptime | 99.9% | 99.5% | 99.5% |
30-day usage via LLM API
- 7.8B
- Prompt tokens processed (last 30 days)
- 520M
- Completion tokens generated (last 30 days)
- 24.5M
- API requests served (last 30 days)
- 99.8%
- Average API uptime
Architecture & Integration
Why Build on LLM.API?
One unified API. Every major model. Built-in reliability, cost control, and observability.
-
Intelligent Model Routing
Automatically route each request to the best model across providers based on task, latency, and reliability—no client changes or custom glue code required.
One endpoint, every model. -
Cost-Aware Optimization
Continuously pick the most cost-effective models and configurations for each call, reducing AI spend without sacrificing quality or performance at scale.
Cut costs, not quality. -
Automatic Failover
When a provider degrades or fails, requests transparently fail over to healthy models, keeping your AI features online without on-call fire drills.
Resilience by default. -
End-to-End Observability
Trace every request across models and providers with metrics, logs, and structured events, so you can debug, tune, and prove reliability in production.
See every token flow. -
Task-Level Abstractions
Call high-level tasks like chat, generate, extract, or rank instead of vendor-specific APIs, so your business logic is decoupled from underlying model churn.
Code to tasks, not vendors. -
High-Throughput Batch
Run massive offline or async workloads through a single batch pipeline with retries, chunking, and parallelization handled for you by the platform.
Ship millions of calls.
Decision guide
When to Use — When NOT to Use
Use it if...
- You need an on-device assistant optimized for Xiaomi phones and IoT hardware.
- Your use case involves integrating AI features tightly with Xiaomi system apps and services.
- You need decent multimodal understanding of photos and screenshots from Xiaomi devices.
- Your use case involves Chinese-language queries and Xiaomi ecosystem–specific knowledge or content.
- You need a vendor-supported model aligned with Xiaomi’s privacy and data-handling policies.
- Your use case involves AI features preinstalled on Xiaomi phones without complex external infrastructure.
- You need voice or camera–driven assistance tuned for Xiaomi hardware capabilities and sensors.
Avoid if...
- You need state-of-the-art reasoning performance comparable to top-tier frontier foundation models.
- Your workload requires broad third-party ecosystem support, tooling, and community integrations.
- You need guaranteed cross-vendor portability across diverse cloud platforms and non-Xiaomi devices.
- Your workload requires extensively documented APIs, SDKs, and English-first developer resources.
- You need transparent, independently benchmarked performance for regulated or safety-critical applications.
- Your workload requires fine-tuning hooks and flexible model customization exposed to external developers.
- You need mature enterprise features like granular governance, auditing, and standardized compliance reports.
FAQ
Frequently Asked Questions
-
What is MiMo-V2.5-Pro?
MiMo-V2.5-Pro is a Xiaomi large language model available through LLM.API, optimized for general-purpose text generation and assistant-style interactions.
-
What is MiMo-V2.5-Pro best suited for?
MiMo-V2.5-Pro is best for chatbots, content generation, lightweight reasoning, and common coding or data-processing tasks where cost-efficiency matters.
-
What is the context window of MiMo-V2.5-Pro?
MiMo-V2.5-Pro supports a 16K-token context window, allowing relatively long conversations and documents within a single request.
-
What modalities does MiMo-V2.5-Pro support?
MiMo-V2.5-Pro supports text-in, text-out interactions only; it does not natively process images, audio, or video.
-
How is MiMo-V2.5-Pro priced on LLM.API?
LLM.API bills MiMo-V2.5-Pro per 1,000 tokens of input and output, with exact rates listed on the LLM.API pricing page.
-
What latency can I expect from MiMo-V2.5-Pro on LLM.API?
Typical end-to-end latency is around 500–1500 ms for short prompts, depending on request size, region, and current load.
-
How do I call MiMo-V2.5-Pro through the LLM.API?
Specify provider "Xiaomi" and model "MiMo-V2.5-Pro" in your LLM.API request along with your API key and usual completion parameters.
-
How does MiMo-V2.5-Pro compare to similar Xiaomi models?
MiMo-V2.5-Pro offers stronger reasoning and coding capabilities than earlier MiMo versions, at slightly higher cost and similar latency.
-
What limitations does MiMo-V2.5-Pro have?
MiMo-V2.5-Pro may hallucinate facts, lacks real-time internet access, and should not be used as the sole source for high-stakes decisions.
-
Can I use MiMo-V2.5-Pro for streaming responses?
Yes, MiMo-V2.5-Pro supports token streaming via LLM.API when you enable streaming in the request options.
