Powered by Xiaomi

MiMo-V2.5-Pro

  • Vision-Language

MiMo-V2.5-Pro is Xiaomi’s flagship open-weight trillion-parameter omnimodal MoE language model optimized for long-context, tool-using AI agents. It is notable for its 1M-token context window and leading performance on agentic and coding benchmarks at comparatively low token cost.

Start Using API

What is MiMo-V2.5-Pro?

MiMo-V2.5-Pro is Xiaomi’s ~1T-parameter (42B active) open-weight omnimodal Mixture-of-Experts language model with a 1M-token context window, designed to power advanced AI agents. It is mainly used for complex software engineering and autonomous coding workflows, where it can run hours-long sessions involving code generation, debugging, and project management. It is also used for general-purpose agentic tasks such as tool use, long-horizon planning, and multi-step reasoning over very long documents and contexts. MiMo-V2.5-Pro belongs to Xiaomi’s MiMo-V2.5 family of models, following earlier MiMo-V2 variants and extending them with larger scale, longer context, and improved agent performance.

5 Core Capabilities

  • Advanced Text Generation

    Generates coherent, context-aware text for complex tasks with one-million-token context and large 128K-token outputs for long documents.

  • Deep Reasoning

    Optimized Mixture-of-Experts reasoning for complex, multi-step planning, long-horizon tasks, and sophisticated agent workflows and automation.

  • Agent Tool Calling

    Supports function and tool calling for structured outputs, enabling robust AI agents that interact with external systems and APIs.

  • Search-Augmented Answers

    Integrates web search during inference to retrieve up-to-date information, improving factual accuracy and grounding of generated responses.

  • Multilingual Support

    Handles multiple languages for generation and understanding, suitable for global users across diverse linguistic environments and content.

6 Most Valuable Use Cases

  • Agentic workflows automation
  • Complex software engineering
  • Long-horizon task planning
  • Tool-using chat agents
  • Large document analysis
  • Cost-efficient AI deployment

Cost Comparison

LLM API offers the lowest cost and highest performance for MiMo-V2.5-Pro–class models.

Provider Region Latency Throughput Uptime Input ($/1M) Output ($/1M) Context
LLM API BEST Global 80ms 120 tps 99.99% $0.20 $0.20 256K
Xiaomi Asia Pacific ~150ms ~40 tps ~99.9% ~$0.50 ~$0.50 ~128K
OpenAI-compatible Gateway Global ~120ms ~70 tps ~99.95% ~$0.35 ~$0.35 ~128K
Tencent Cloud Asia Pacific ~160ms ~50 tps ~99.9% ~$0.40 ~$0.40 ~64K
Huawei Cloud China ~170ms ~45 tps ~99.9% ~$0.45 ~$0.45 ~64K

Technical Specifications

Metric MiMo-V2.5-Pro Xiaomi MiMo-V2.5 Huawei Pangu-Chat 2.0
Avg Latency ~180ms ~220ms ~250ms
Context Window 128K 64K 32K
Input Price ($/1M tokens) $0.80 $0.70 $1.00
Output Price ($/1M tokens) $1.60 $1.40 $2.00
Max Output Tokens 8K 4K 4K
Throughput 48 tps 36 tps 30 tps
Uptime 99.9% 99.5% 99.5%

30-day usage via LLM API

7.8B
Prompt tokens processed (last 30 days)
520M
Completion tokens generated (last 30 days)
24.5M
API requests served (last 30 days)
99.8%
Average API uptime
Start Using API

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

  • Intelligent Model Routing

    Automatically route each request to the best model across providers based on task, latency, and reliability—no client changes or custom glue code required.

    One endpoint, every model.
  • Cost-Aware Optimization

    Continuously pick the most cost-effective models and configurations for each call, reducing AI spend without sacrificing quality or performance at scale.

    Cut costs, not quality.
  • Automatic Failover

    When a provider degrades or fails, requests transparently fail over to healthy models, keeping your AI features online without on-call fire drills.

    Resilience by default.
  • End-to-End Observability

    Trace every request across models and providers with metrics, logs, and structured events, so you can debug, tune, and prove reliability in production.

    See every token flow.
  • Task-Level Abstractions

    Call high-level tasks like chat, generate, extract, or rank instead of vendor-specific APIs, so your business logic is decoupled from underlying model churn.

    Code to tasks, not vendors.
  • High-Throughput Batch

    Run massive offline or async workloads through a single batch pipeline with retries, chunking, and parallelization handled for you by the platform.

    Ship millions of calls.

When to Use — When NOT to Use

Use it if...

  • You need an on-device assistant optimized for Xiaomi phones and IoT hardware.
  • Your use case involves integrating AI features tightly with Xiaomi system apps and services.
  • You need decent multimodal understanding of photos and screenshots from Xiaomi devices.
  • Your use case involves Chinese-language queries and Xiaomi ecosystem–specific knowledge or content.
  • You need a vendor-supported model aligned with Xiaomi’s privacy and data-handling policies.
  • Your use case involves AI features preinstalled on Xiaomi phones without complex external infrastructure.
  • You need voice or camera–driven assistance tuned for Xiaomi hardware capabilities and sensors.

Avoid if...

  • You need state-of-the-art reasoning performance comparable to top-tier frontier foundation models.
  • Your workload requires broad third-party ecosystem support, tooling, and community integrations.
  • You need guaranteed cross-vendor portability across diverse cloud platforms and non-Xiaomi devices.
  • Your workload requires extensively documented APIs, SDKs, and English-first developer resources.
  • You need transparent, independently benchmarked performance for regulated or safety-critical applications.
  • Your workload requires fine-tuning hooks and flexible model customization exposed to external developers.
  • You need mature enterprise features like granular governance, auditing, and standardized compliance reports.

Frequently Asked Questions

  • What is MiMo-V2.5-Pro?

    MiMo-V2.5-Pro is a Xiaomi large language model available through LLM.API, optimized for general-purpose text generation and assistant-style interactions.

  • What is MiMo-V2.5-Pro best suited for?

    MiMo-V2.5-Pro is best for chatbots, content generation, lightweight reasoning, and common coding or data-processing tasks where cost-efficiency matters.

  • What is the context window of MiMo-V2.5-Pro?

    MiMo-V2.5-Pro supports a 16K-token context window, allowing relatively long conversations and documents within a single request.

  • What modalities does MiMo-V2.5-Pro support?

    MiMo-V2.5-Pro supports text-in, text-out interactions only; it does not natively process images, audio, or video.

  • How is MiMo-V2.5-Pro priced on LLM.API?

    LLM.API bills MiMo-V2.5-Pro per 1,000 tokens of input and output, with exact rates listed on the LLM.API pricing page.

  • What latency can I expect from MiMo-V2.5-Pro on LLM.API?

    Typical end-to-end latency is around 500–1500 ms for short prompts, depending on request size, region, and current load.

  • How do I call MiMo-V2.5-Pro through the LLM.API?

    Specify provider "Xiaomi" and model "MiMo-V2.5-Pro" in your LLM.API request along with your API key and usual completion parameters.

  • How does MiMo-V2.5-Pro compare to similar Xiaomi models?

    MiMo-V2.5-Pro offers stronger reasoning and coding capabilities than earlier MiMo versions, at slightly higher cost and similar latency.

  • What limitations does MiMo-V2.5-Pro have?

    MiMo-V2.5-Pro may hallucinate facts, lacks real-time internet access, and should not be used as the sole source for high-stakes decisions.

  • Can I use MiMo-V2.5-Pro for streaming responses?

    Yes, MiMo-V2.5-Pro supports token streaming via LLM.API when you enable streaming in the request options.

Start in 2 lines of code

Get My API Key