MiniMax M2 is a large language model by MiniMax focused on efficient, general-purpose text generation and understanding for applications like chatbots and content tools.

What is the context window of MiniMax M2?

MiniMax M2 supports a context window of up to 32K tokens via LLM.API, suitable for longer conversations and multi-document prompts.

Which modalities does MiniMax M2 support through LLM.API?

MiniMax M2 currently supports text input and text output only when accessed via LLM.API.

How fast is MiniMax M2 when called through LLM.API?

MiniMax M2 typically returns first tokens within a few hundred milliseconds to a couple of seconds, depending on prompt length and load.

How is MiniMax M2 priced on LLM.API?

MiniMax M2 usage on LLM.API is billed per 1,000 input and output tokens, with exact rates shown in your LLM.API pricing dashboard.

How do I call MiniMax M2 via the LLM.API?

You select the MiniMax M2 model ID in your LLM.API request and send standard Chat or Completion-style JSON with messages and parameters.

What is MiniMax M2 best suited for?

MiniMax M2 is best for cost-efficient conversational agents, drafting and editing text, and general reasoning where ultra-high-end reasoning is not mandatory.

How does MiniMax M2 compare to similar models on LLM.API?

MiniMax M2 generally offers a good balance of quality and cost, competing with mid-tier models while being cheaper than many frontier models.

What are the main limitations of MiniMax M2?

MiniMax M2 can hallucinate facts, lacks real-time knowledge, and may underperform top-tier frontier models on complex reasoning or highly specialized domains.

Can I fine-tune MiniMax M2 through LLM.API?

MiniMax M2 is currently available only as a hosted base model on LLM.API, without user-managed fine-tuning.

MiniMax M2

Text Generation

MiniMax M2 is an open‑weight Mixture‑of‑Experts large language model from MiniMax, designed to deliver high coding and agentic workflow performance with low latency and cost. It uses 230B total parameters with only about 10B active per token to balance strong reasoning with efficient deployment.

Start Using API

API Performance

Latency: 1.39s time to first token
Context: 205K token context
Input: ~$0.20 per 1M tokens
Output: ~$1.00 per 1M tokens
Uptime: 99% 99%

About the model

What is MiniMax M2?

MiniMax M2 is an open‑weight, MoE-based large language model by MiniMax optimized for coding and autonomous agent workflows. It is mainly used for software development tasks such as code generation, refactoring, and debugging, as well as orchestrating multi-step agentic workflows that call tools and APIs efficiently. It also serves as a general-purpose LLM for chat, reasoning, and integration into developer tools and AI platforms. MiniMax M2 belongs to the MiniMax-M2 family of Mixture-of-Experts models and follows earlier MiniMax research lines such as the MiniMax-M1 reasoning models.

Input / Output

Input

Text prompts

Output

Text completions and chat responses
Source code generation

Model capabilities

5 Core Capabilities

Advanced Coding

Optimized for code generation, debugging, multi-file editing, and compile-run-fix loops in modern software engineering workflows.
Agentic Workflows

Designed for tool use and agentic reasoning, enabling plan-act-verify loops and complex multi-step task automation.
Long-Context Reasoning

Handles very long inputs with strong reasoning performance across benchmarks, suitable for large documents and complex problems.
Multilingual Support

Provides strong multilingual language understanding and generation, covering multiple major languages with high-quality outputs.
Handwriting OCR

Exhibits outstanding optical character recognition on handwritten text, outperforming many contemporary AI models in accuracy tests.

Use cases

6 Most Valuable Use Cases

Agentic Workflows Automation
Advanced Code Generation
Multi-step Task Planning
Developer IDE Assistant
Enterprise Productivity Bots
Tool-using AI Agents

Transparent pricing

Cost Comparison

LLM API offers the lowest MiniMax‑class pricing and fastest response times versus other providers.

Provider	Region	Latency	Throughput	Uptime	Input ($/1M)	Output ($/1M)	Context
LLM API BEST	Global	120ms	80 tps	99.99%	$0.30	$0.60	128K
MiniMax	Asia Pacific	~220ms	~40 tps	99.9%	~$0.40	~$0.80	~64K
OpenAI (closest: GPT‑4o‑mini)	Global	~180ms	~50 tps	99.9%	~$0.50	~$1.00	128K
Anthropic (closest: Claude 3 Haiku)	US East	~190ms	~45 tps	99.9%	~$0.55	~$1.10	200K
Google (closest: Gemini 1.5 Flash)	Global	~200ms	~45 tps	99.9%	~$0.45	~$0.90	1M

Performance benchmarks

Technical Specifications

Metric	MiniMax M2	OpenAI GPT-4.1 Mini	Anthropic Claude 3 Haiku
Avg Latency	~220ms	~180ms	~200ms
Context Window	~128K	128K	200K
Input Price ($/1M)	~$0.15	$0.15	$0.25
Output Price ($/1M)	~$0.60	$0.60	$1.25
Max Output Tokens	~4K	4K	4K
Throughput	~45 tps	~50 tps	~40 tps
Uptime	~99.9%	~99.9%	~99.9%

30-day usage via LLM API

7.8B: Prompt tokens processed (30 days)
640M: Completion tokens generated (30 days)
12.5M: API requests served (30 days)
99.8%: Avg uptime over last 30 days

Start Using API

Architecture & Integration

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

Unified AI Routing

Dynamically route each request to the optimal model across providers based on latency, cost, and quality—no client changes or custom glue code required.
One endpoint, every model
Cost-Aware Optimization

Control spend with price-aware routing, model selection, and usage policies so you can ship AI features fast without surprise bills or manual tuning.
Lower cost, same quality
Resilient Fallbacks

Define automatic fallbacks to alternative models and providers when requests fail or degrade, keeping your AI features reliable even during provider outages.
Stay online, automatically
End-to-End Observability

Get unified logs, traces, and metrics across every provider, model, and endpoint so you can debug, optimize prompts, and monitor performance from a single place.
See every token
Task-Level Abstractions

Call high-level tasks like chat, generate, extract, or embed instead of vendor-specific APIs, giving you portable, maintainable code that outlives any single model.
Code to tasks, not vendors
High-Throughput Batching

Process thousands of calls efficiently with smart batching and concurrency controls, maximizing throughput while staying within provider limits and budget.
Scale to millions of calls

Decision guide

When to Use — When NOT to Use

Use it if...

You need a cost-efficient, high-performance coding model optimized for agentic workflows.
You need long-context processing for large codebases or multi-file software projects.
Your use case involves building tool-using AI agents with frequent plan–act–verify loops.
Your use case involves deploying an open-weight model locally under a permissive license.
You need strong reasoning and coding benchmarks without paying frontier closed-model prices.
Your use case involves integrating with cloud providers like Amazon Bedrock for managed hosting.

Avoid if...

You need state-of-the-art general chat quality over pure coding and agentic performance.
You need mature, deeply integrated ecosystem support comparable to OpenAI or Anthropic models.
Your workload requires highly specialized vision, audio, or video generation beyond text-centric use.
You need battle-tested enterprise compliance certifications and governance from long-established vendors.
Your workload requires ultra-low latency at extreme global scale with many regional datacenters.
You need maximum benchmark performance and features from the latest MiniMax M-series successors.

FAQ

Frequently Asked Questions

What is MiniMax M2?

MiniMax M2 is a large language model by MiniMax focused on efficient, general-purpose text generation and understanding for applications like chatbots and content tools.
What is the context window of MiniMax M2?

MiniMax M2 supports a context window of up to 32K tokens via LLM.API, suitable for longer conversations and multi-document prompts.
Which modalities does MiniMax M2 support through LLM.API?

MiniMax M2 currently supports text input and text output only when accessed via LLM.API.
How fast is MiniMax M2 when called through LLM.API?

MiniMax M2 typically returns first tokens within a few hundred milliseconds to a couple of seconds, depending on prompt length and load.
How is MiniMax M2 priced on LLM.API?

MiniMax M2 usage on LLM.API is billed per 1,000 input and output tokens, with exact rates shown in your LLM.API pricing dashboard.
How do I call MiniMax M2 via the LLM.API?

You select the MiniMax M2 model ID in your LLM.API request and send standard Chat or Completion-style JSON with messages and parameters.
What is MiniMax M2 best suited for?

MiniMax M2 is best for cost-efficient conversational agents, drafting and editing text, and general reasoning where ultra-high-end reasoning is not mandatory.
How does MiniMax M2 compare to similar models on LLM.API?

MiniMax M2 generally offers a good balance of quality and cost, competing with mid-tier models while being cheaper than many frontier models.
What are the main limitations of MiniMax M2?

MiniMax M2 can hallucinate facts, lacks real-time knowledge, and may underperform top-tier frontier models on complex reasoning or highly specialized domains.
Can I fine-tune MiniMax M2 through LLM.API?

MiniMax M2 is currently available only as a hosted base model on LLM.API, without user-managed fine-tuning.

Start in 2 lines of code

Get My API Key

MiniMax M2

What is MiniMax M2?

5 Core Capabilities

Advanced Coding

Agentic Workflows

Long-Context Reasoning

Multilingual Support

Handwriting OCR

6 Most Valuable Use Cases

Cost Comparison

Technical Specifications

Why Build on LLM.API?

Unified AI Routing

Cost-Aware Optimization

Resilient Fallbacks

End-to-End Observability

Task-Level Abstractions

High-Throughput Batching

When to Use — When NOT to Use

Use it if...

Avoid if...

Start in 2 lines of code