What is MiMo-V2.5-Pro best suited for?

MiMo-V2.5-Pro is best for multi-turn assistants, code generation, and structured data extraction where reliable text understanding and generation are required.

What is the context window of MiMo-V2.5-Pro?

MiMo-V2.5-Pro supports a 16K-token context window, allowing relatively long conversations and documents within a single request.

What modalities does MiMo-V2.5-Pro support?

MiMo-V2.5-Pro supports text-in, text-out interactions only; it does not natively process images, audio, or video.

How is MiMo-V2.5-Pro priced on LLM.API?

LLM.API bills MiMo-V2.5-Pro per 1,000 tokens of input and output, with exact rates listed on the LLM.API pricing page.

What latency can I expect from MiMo-V2.5-Pro on LLM.API?

Typical end-to-end latency is around 500–1500 ms for short prompts, depending on request size, region, and current load.

How do I call MiMo-V2.5-Pro through the LLM.API?

Specify provider "Xiaomi" and model "MiMo-V2.5-Pro" in your LLM.API request along with your API key and usual completion parameters.

How does MiMo-V2.5-Pro compare to similar Xiaomi models?

MiMo-V2.5-Pro offers stronger reasoning and coding capabilities than earlier MiMo versions, at slightly higher cost and similar latency.

What limitations does MiMo-V2.5-Pro have?

MiMo-V2.5-Pro may hallucinate facts, lacks real-time internet access, and should not be used as the sole source for high-stakes decisions.

Can I use MiMo-V2.5-Pro for streaming responses?

Yes, MiMo-V2.5-Pro supports token streaming via LLM.API when you enable streaming in the request options.

MiMo-V2.5-Pro

Vision-Language

MiMo-V2.5-Pro is Xiaomi’s flagship open-weight trillion-parameter omnimodal MoE language model optimized for long-context, tool-using AI agents. It is notable for its 1M-token context window and leading performance on agentic and coding benchmarks at comparatively low token cost.

Start Using API

API Performance

Latency: ~0.7s time to first token
Context: 1M token context
Input: ~$0.52 per 1M tokens
Output: ~$1.04 per 1M tokens
Uptime: 99% 99%

About the model

What is MiMo-V2.5-Pro?

MiMo-V2.5-Pro is Xiaomi’s ~1T-parameter (42B active) open-weight omnimodal Mixture-of-Experts language model with a 1M-token context window, designed to power advanced AI agents. It is mainly used for complex software engineering and autonomous coding workflows, where it can run hours-long sessions involving code generation, debugging, and project management. It is also used for general-purpose agentic tasks such as tool use, long-horizon planning, and multi-step reasoning over very long documents and contexts. MiMo-V2.5-Pro belongs to Xiaomi’s MiMo-V2.5 family of models, following earlier MiMo-V2 variants and extending them with larger scale, longer context, and improved agent performance.

Input / Output

Input

Text prompts (natural language, code, or structured text)
Images for vision understanding
Files and documents as input (for long-context processing)

Output

Conversational and free-form text responses
Source code and technical outputs

Model capabilities

5 Core Capabilities

Advanced Text Generation

Generates coherent, context-aware text for complex tasks with one-million-token context and large 128K-token outputs for long documents.
Deep Reasoning

Optimized Mixture-of-Experts reasoning for complex, multi-step planning, long-horizon tasks, and sophisticated agent workflows and automation.
Agent Tool Calling

Supports function and tool calling for structured outputs, enabling robust AI agents that interact with external systems and APIs.
Search-Augmented Answers

Integrates web search during inference to retrieve up-to-date information, improving factual accuracy and grounding of generated responses.
Multilingual Support

Handles multiple languages for generation and understanding, suitable for global users across diverse linguistic environments and content.

Use cases

6 Most Valuable Use Cases

Agentic workflows automation
Complex software engineering
Long-horizon task planning
Tool-using chat agents
Large document analysis
Cost-efficient AI deployment

Transparent pricing

Cost Comparison

LLM API offers the lowest cost and highest performance for MiMo-V2.5-Pro–class models.

Provider	Region	Latency	Throughput	Uptime	Input ($/1M)	Output ($/1M)	Context
LLM API BEST	Global	80ms	120 tps	99.99%	$0.20	$0.20	256K
Xiaomi	Asia Pacific	~150ms	~40 tps	~99.9%	~$0.50	~$0.50	~128K
OpenAI-compatible Gateway	Global	~120ms	~70 tps	~99.95%	~$0.35	~$0.35	~128K
Tencent Cloud	Asia Pacific	~160ms	~50 tps	~99.9%	~$0.40	~$0.40	~64K
Huawei Cloud	China	~170ms	~45 tps	~99.9%	~$0.45	~$0.45	~64K

Performance benchmarks

Technical Specifications

Metric	MiMo-V2.5-Pro	Xiaomi MiMo-V2.5	Huawei Pangu-Chat 2.0
Avg Latency	~180ms	~220ms	~250ms
Context Window	128K	64K	32K
Input Price ($/1M tokens)	$0.80	$0.70	$1.00
Output Price ($/1M tokens)	$1.60	$1.40	$2.00
Max Output Tokens	8K	4K	4K
Throughput	48 tps	36 tps	30 tps
Uptime	99.9%	99.5%	99.5%

30-day usage via LLM API

7.8B: Prompt tokens processed (last 30 days)
520M: Completion tokens generated (last 30 days)
24.5M: API requests served (last 30 days)
99.8%: Average API uptime

Start Using API

Architecture & Integration

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

Intelligent Model Routing

Automatically route each request to the best model across providers based on task, latency, and reliability—no client changes or custom glue code required.
One endpoint, every model.
Cost-Aware Optimization

Continuously pick the most cost-effective models and configurations for each call, reducing AI spend without sacrificing quality or performance at scale.
Cut costs, not quality.
Automatic Failover

When a provider degrades or fails, requests transparently fail over to healthy models, keeping your AI features online without on-call fire drills.
Resilience by default.
End-to-End Observability

Trace every request across models and providers with metrics, logs, and structured events, so you can debug, tune, and prove reliability in production.
See every token flow.
Task-Level Abstractions

Call high-level tasks like chat, generate, extract, or rank instead of vendor-specific APIs, so your business logic is decoupled from underlying model churn.
Code to tasks, not vendors.
High-Throughput Batch

Run massive offline or async workloads through a single batch pipeline with retries, chunking, and parallelization handled for you by the platform.
Ship millions of calls.

Decision guide

When to Use — When NOT to Use

Use it if...

You need an on-device assistant optimized for Xiaomi phones and IoT hardware.
Your use case involves integrating AI features tightly with Xiaomi system apps and services.
You need decent multimodal understanding of photos and screenshots from Xiaomi devices.
Your use case involves Chinese-language queries and Xiaomi ecosystem–specific knowledge or content.
You need a vendor-supported model aligned with Xiaomi’s privacy and data-handling policies.
Your use case involves AI features preinstalled on Xiaomi phones without complex external infrastructure.
You need voice or camera–driven assistance tuned for Xiaomi hardware capabilities and sensors.

Avoid if...

You need state-of-the-art reasoning performance comparable to top-tier frontier foundation models.
Your workload requires broad third-party ecosystem support, tooling, and community integrations.
You need guaranteed cross-vendor portability across diverse cloud platforms and non-Xiaomi devices.
Your workload requires extensively documented APIs, SDKs, and English-first developer resources.
You need transparent, independently benchmarked performance for regulated or safety-critical applications.
Your workload requires fine-tuning hooks and flexible model customization exposed to external developers.
You need mature enterprise features like granular governance, auditing, and standardized compliance reports.

FAQ

Frequently Asked Questions

What is MiMo-V2.5-Pro?

MiMo-V2.5-Pro is a Xiaomi large language model available through LLM.API, optimized for general-purpose text generation and assistant-style interactions.
What is MiMo-V2.5-Pro best suited for?

MiMo-V2.5-Pro is best for multi-turn assistants, code generation, and structured data extraction where reliable text understanding and generation are required.
What is the context window of MiMo-V2.5-Pro?

MiMo-V2.5-Pro supports a 16K-token context window, allowing relatively long conversations and documents within a single request.
What modalities does MiMo-V2.5-Pro support?

MiMo-V2.5-Pro supports text-in, text-out interactions only; it does not natively process images, audio, or video.
How is MiMo-V2.5-Pro priced on LLM.API?

LLM.API bills MiMo-V2.5-Pro per 1,000 tokens of input and output, with exact rates listed on the LLM.API pricing page.
What latency can I expect from MiMo-V2.5-Pro on LLM.API?

Typical end-to-end latency is around 500–1500 ms for short prompts, depending on request size, region, and current load.
How do I call MiMo-V2.5-Pro through the LLM.API?

Specify provider "Xiaomi" and model "MiMo-V2.5-Pro" in your LLM.API request along with your API key and usual completion parameters.
How does MiMo-V2.5-Pro compare to similar Xiaomi models?

MiMo-V2.5-Pro offers stronger reasoning and coding capabilities than earlier MiMo versions, at slightly higher cost and similar latency.
What limitations does MiMo-V2.5-Pro have?

MiMo-V2.5-Pro may hallucinate facts, lacks real-time internet access, and should not be used as the sole source for high-stakes decisions.
Can I use MiMo-V2.5-Pro for streaming responses?

Yes, MiMo-V2.5-Pro supports token streaming via LLM.API when you enable streaming in the request options.

Start in 2 lines of code

Get My API Key

MiMo-V2.5-Pro

What is MiMo-V2.5-Pro?

5 Core Capabilities

Advanced Text Generation

Deep Reasoning

Agent Tool Calling

Search-Augmented Answers

Multilingual Support

6 Most Valuable Use Cases

Cost Comparison

Technical Specifications

Why Build on LLM.API?

Intelligent Model Routing

Cost-Aware Optimization

Automatic Failover

End-to-End Observability

Task-Level Abstractions

High-Throughput Batch

When to Use — When NOT to Use

Use it if...

Avoid if...

Start in 2 lines of code