MiMo-V2-Pro

Vision-Language

MiMo-V2-Pro is Xiaomi’s flagship trillion-parameter foundation model optimized for long-context, agentic workloads with a 1M-token context window. It is positioned as a competitive frontier-scale system for complex planning, coding, and workflow automation.

Start Using API

API Performance

Latency: ~0.8s time to first token
Context: 1M token context
Input: $1.00 / $2.00 per 1M tokens (≤256K / 256K–1M)
Output: $3.00 / $6.00 per 1M tokens (≤256K / 256K–1M)
Uptime: 99% 99%

About the model

What is MiMo-V2-Pro?

MiMo-V2-Pro is a proprietary large language model from Xiaomi with over 1T total parameters and a 1M-token context length, designed as the company’s flagship foundation model for real-world agentic scenarios. It is mainly used for complex multi-step agent tasks such as workflow orchestration, long-horizon planning, and tool use across large contexts. It is also applied to advanced software engineering assistance and coding, where internal evaluations report performance approaching top frontier models on code intelligence benchmarks. MiMo-V2-Pro belongs to Xiaomi’s MiMo-V2 family of models and succeeds earlier systems like MiMo-V2-Flash within the broader MiMo AI platform.

Input / Output

Input

Text prompts (natural language or code)

Output

Structured or free-form text responses
Generated or transformed source code

Model capabilities

5 Core Capabilities

Conversational Assistant

Generates high-quality natural language responses, explanations, and content across domains using a trillion-parameter, long-context language backbone.
Agentic Reasoning

Performs multi-step planning, tool use, and complex task execution in agent workflows, optimized for real-world autonomous operations.
Long-Context Handling

Processes and reasons over up to one million tokens of input, enabling understanding of large documents, codebases, and extended histories.
Coding and Tooling

Supports advanced programming assistance, including system design, code generation, debugging, and integration with external tools and APIs.
Multilingual Understanding

Understands and generates multiple languages, enabling cross-lingual assistance and workflow integration in global, multilingual environments.

Use cases

6 Most Valuable Use Cases

Complex Workflow Agents
Software Coding Assistant
Long-Context Research
Business Process Automation
Tool-Use Orchestration
Smart Device Integration

Transparent pricing

Cost Comparison

LLM API offers the lowest cost and latency for MiMo-V2-Pro–class models.

Provider	Region	Latency	Throughput	Uptime	Input ($/1M)	Output ($/1M)	Context
LLM API BEST	Global	~110ms	~750 img/min	~99.99%	~$0.40/1K images	~$0.00	~25MB image payload
Xiaomi	~Asia Pacific	~180ms	~450 img/min	~99.9%	~$0.60/1K images	~$0.00	~20MB image payload
Alibaba Cloud	~Asia Pacific	~190ms	~400 img/min	~99.9%	~$0.70/1K images	~$0.00	~16MB image payload
AWS Marketplace	~US East	~210ms	~350 img/min	~99.9%	~$0.80/1K images	~$0.00	~16MB image payload
Azure AI Studio	~EU West	~220ms	~320 img/min	~99.9%	~$0.85/1K images	~$0.00	~20MB image payload

Performance benchmarks

Technical Specifications

Metric	MiMo-V2-Pro (Xiaomi)	Xiaomi MiMo-V1	Huawei PanGu-Σ
Avg Latency	~180ms	~220ms	~240ms
Context Window	64K	32K	32K
Input Price ($/1M tokens)	$0.60	$0.70	$0.80
Output Price ($/1M tokens)	$0.90	$1.00	$1.20
Max Output Tokens	4K	4K	4K
Throughput	120 tps	100 tps	90 tps
Uptime	99.9%	99.9%	99.8%

30-day usage via LLM API

3.8B: Prompt tokens processed (last 30 days)
14.5M: API requests served (last 30 days)
4.6B: Completion tokens generated (last 30 days)
99.8%: Average API uptime

Start Using API

Architecture & Integration

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

Unified AI Routing

Dynamically route requests across providers and models based on latency, cost, or accuracy. One API, pluggable strategies, no client rewrites.
One endpoint, every model
Cost-Aware Control

Mix premium and budget models with hard caps and per-project policies. Optimize spend automatically without sacrificing SLAs or developer velocity.
Ship fast, spend less
Resilient Fallbacks

Define automatic failover chains when a provider degrades or times out. Keep production traffic flowing without manual incident playbooks or redeploys.
Zero-downtime AI traffic
Deep Observability

Trace every call across providers with metrics, logs, and structured payloads. Debug prompts, compare models, and tune routing from a single pane.
See every token
Task-Level Orchestration

Describe tasks, not endpoints. LLM.API selects and chains the right tools and models so you can focus on business logic instead of glue code.
APIs that think in tasks
High-Throughput Batch

Submit massive offline or backfill jobs through a single batch API with automatic chunking, retries, and aggregation tuned for each provider’s limits.
Millions of calls, one job

Decision guide

When to Use — When NOT to Use

Use it if...

You need a cost-effective general-purpose vision-language model for mobile-centric applications.
You need tight integration with Xiaomi devices, sensors, or on-device AI capabilities.
Your use case involves multimodal queries combining photos, screenshots, and short text prompts.
Your use case involves consumer-facing features like smart albums, AR guidance, or camera assistance.
You need an on-device or edge-deployable model to reduce cloud inference costs.
Your use case involves recognizing everyday objects, scenes, and UI elements in smartphone images.
You need a model tuned for Asian consumer scenarios, interfaces, and localized content.

Avoid if...

You need state-of-the-art long-context reasoning across large codebases or lengthy technical documents.
Your workload requires highly specialized medical, legal, or financial domain expertise and certifications.
You need guaranteed multi-cloud neutrality without dependence on a specific hardware ecosystem.
Your workload requires ultra-high accuracy for safety-critical decisions in autonomous or industrial systems.
You need extensive ecosystem tooling, plugins, and mature third-party integrations available today.
Your workload requires training or fine-tuning on massive proprietary datasets entirely in-house.
You need fully documented, widely benchmarked performance comparable to leading frontier foundation models.

FAQ

Frequently Asked Questions

What is MiMo-V2-Pro?

MiMo-V2-Pro is a Xiaomi large language model available via LLM.API, designed for general-purpose text generation and assistant-style interactions.
What modalities does MiMo-V2-Pro support?

MiMo-V2-Pro supports text-in, text-out interactions only when accessed through LLM.API.
How is MiMo-V2-Pro priced on LLM.API?

MiMo-V2-Pro usage on LLM.API is priced per input and output token, with exact rates defined in your LLM.API pricing plan.
What is the context window of MiMo-V2-Pro on LLM.API?

MiMo-V2-Pro supports up to a 16K token context window per request on LLM.API.
How fast is MiMo-V2-Pro in terms of latency?

MiMo-V2-Pro typically returns first tokens within a few hundred milliseconds, with total latency depending on prompt size and output length.
How do I call MiMo-V2-Pro through the LLM.API?

Select the Xiaomi provider and specify the MiMo-V2-Pro model name in your LLM.API request parameters, then send standard chat completion requests.
What is MiMo-V2-Pro best suited for?

MiMo-V2-Pro is best for chatbots, content generation, and general reasoning tasks where cost-effectiveness and stable performance are important.
How does MiMo-V2-Pro compare to similar models on LLM.API?

MiMo-V2-Pro generally trades slightly lower peak capability for more predictable costs and latency compared with top-tier frontier models.
What limitations should I be aware of when using MiMo-V2-Pro?

MiMo-V2-Pro can hallucinate facts, lacks real-time knowledge, and should not be used without verification for safety-critical or highly specialized domains.
Can MiMo-V2-Pro handle structured tool calls or function calling?

MiMo-V2-Pro supports tool-style outputs when you design appropriate JSON schemas and prompting, but it has no built-in tool execution.

Start in 2 lines of code

Get My API Key

MiMo-V2-Pro

What is MiMo-V2-Pro?

5 Core Capabilities

Conversational Assistant

Agentic Reasoning

Long-Context Handling

Coding and Tooling

Multilingual Understanding

6 Most Valuable Use Cases

Cost Comparison

Technical Specifications

Why Build on LLM.API?

Unified AI Routing

Cost-Aware Control

Resilient Fallbacks

Deep Observability

Task-Level Orchestration

High-Throughput Batch

When to Use — When NOT to Use

Use it if...

Avoid if...

Start in 2 lines of code