MiniMax M2-her

Instruction Following

MiniMax M2-her is a dialogue-first large language model from MiniMax, optimized for immersive roleplay, character-driven chat, and expressive multi-turn conversations.

Start Using API

API Performance

Latency: ~1.4s time to first token
Context: ~200K token context
Input: ~$0.30 per 1M tokens
Output: ~$1.20 per 1M tokens
Uptime: 99% 99%

About the model

What is MiniMax M2-her?

MiniMax M2-her is a specialized variant of the MiniMax M2 large language model designed primarily for dialogue and character-focused interaction. It is mainly used for immersive roleplay, storytelling, and companion-style chat experiences that require strong character consistency and emotional nuance. It also serves creative writing, interactive fiction, and other conversational applications where maintaining long, coherent, multi-turn dialogues is important. M2-her belongs to the MiniMax M2 model family, which includes other general-purpose and high-speed variants such as M2, M2.1, and M2.5.

Input / Output

Input

Text prompts (natural language, dialogue-style input)

Output

Text responses (conversational and roleplay-style output)

Model capabilities

5 Core Capabilities

Conversational Chat

Supports multi-turn natural language dialogues, following instructions and maintaining context for general-purpose assistant and chatbot applications.
Text Monitoring

Enables content analysis and safety monitoring, helping classify or filter user-generated text based on policies or business rules.
Image Capabilities

Processes input images to assist with multimodal tasks like visual context understanding, when integrated into supported MiniMax products.
Text Translation

Translates between multiple languages for everyday communication and application localization within MiniMax’s supported language pairs.
Optical Character Recognition

Extracts machine-readable text from images or screenshots when paired with MiniMax tooling that exposes OCR functionality.

Use cases

6 Most Valuable Use Cases

Immersive Roleplay Chat
Story Co-writing
AI Companion Chatbots
Interactive Fiction Games
Language Practice Partner
Emotional Dialogue Testing

Transparent pricing

Cost Comparison

LLM API offers the lowest cost and latency for MiniMax M2-her–class models.

Provider	Region	Latency	Throughput	Uptime	Input ($/1M)	Output ($/1M)	Context
LLM API BEST	Global	~120ms	~120 tps	~99.99%	~$0.20	~$0.20	~128K tokens
MiniMax	Global	~220ms	~60 tps	~99.9%	~$0.70	~$0.70	~64K tokens
OpenAI (GPT-4.1 Mini-equivalent)	Global	~250ms	~80 tps	~99.9%	~$0.50	~$1.50	~128K tokens
Anthropic (Claude 3 Haiku-equivalent)	US/EU	~260ms	~70 tps	~99.9%	~$0.40	~$1.20	~200K tokens
Azure AI (MiniMax-compatible deployment)	US East	~240ms	~65 tps	~99.9%	~$0.60	~$0.60	~64K tokens

Performance benchmarks

Technical Specifications

Metric	MiniMax M2-her	OpenAI GPT-4o Mini	Anthropic Claude 3 Haiku
Avg Latency	~250ms	~230ms	~280ms
Context Window	128K	128K	200K
Input Price ($/1M)	~$0.40	$0.15	$0.25
Output Price ($/1M)	~$0.60	$0.60	$1.25
Max Output Tokens	4K	4K	4K
Throughput	~120 tps	~150 tps	~110 tps
Uptime	99.9%	99.9%	99.9%

30-day usage via LLM API

6.8B: Prompt tokens processed (last 30 days)
2.4B: Completion tokens generated (last 30 days)
11.5M: API requests served (last 30 days)
99.8%: Average API uptime (last 30 days)

Start Using API

Architecture & Integration

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

Unified AI Routing

Dynamically route each request to the best model across providers based on latency, cost, and quality—without changing your integration or redeploying code.
Smart routing, one API
Cost-Aware Orchestration

Automatically balance premium and budget models using configurable policies, so you control spend while keeping performance and user experience predictable at scale.
Optimize spend, not code
Resilient Fallback Flows

Define provider- and model-level fallback chains that trigger on errors, timeouts, or bad responses—keeping your production workloads online when vendors fail.
Never ship single-vendor
Deep Observability

Get centralized traces, metrics, and logs across all LLM calls—latency, errors, cost, and model behavior—so you can debug, tune, and prove reliability to stakeholders.
See every token, everywhere
Task-Level Abstractions

Describe tasks like chat, generation, tools, or structured output once and let LLM.API map them to the right provider-specific APIs under the hood.
Think tasks, not vendors
High-Throughput Batching

Submit large batches of prompts in a single request with concurrency control and vendor-aware limits, cutting latency and API costs for bulk workloads.
Scale up, pay less

Decision guide

When to Use — When NOT to Use

Use it if...

You need a general-purpose Chinese-centric model from a major Chinese AI provider.
You need cost-effective text generation for chatbots serving primarily Chinese-speaking users.
Your use case involves everyday assistant tasks like Q&A, drafting, and rewriting text.
Your use case involves integrating a mainstream Chinese LLM into an existing local tech stack.
You need moderate reasoning and conversation quality without requiring state-of-the-art performance.
You need to experiment with a variety of MiniMax-hosted models within one ecosystem.

Avoid if...

You need cutting-edge reasoning performance comparable to the very best frontier models available.
Your workload requires strong support, optimization, and documentation primarily in English environments.
You need guaranteed world-class performance on advanced coding, math, or complex scientific tasks.
Your workload requires strict US or EU compliance guarantees and detailed public security attestations.
You need highly specialized domain models, such as finance-grade or medical-grade tuned systems.
Your workload requires seamless interoperability with leading Western foundation-model platforms and tooling.

FAQ

Frequently Asked Questions

What is MiniMax M2-her?

MiniMax M2-her is a large language model from MiniMax designed for fast, cost-efficient text generation and reasoning via the LLM.API gateway.
What is the context window of MiniMax M2-her?

MiniMax M2-her supports multi-thousand token prompts, suitable for moderately long conversations, documents, and tool-augmented workflows through LLM.API.
What modalities does MiniMax M2-her support via LLM.API?

Via LLM.API, MiniMax M2-her currently operates as a text-in, text-out model for chat, completion, and tool-calling style interactions.
How does MiniMax M2-her pricing work on LLM.API?

MiniMax M2-her is billed per 1,000 input and output tokens according to LLM.API’s MiniMax-specific pricing, visible in your LLM.API dashboard.
What latency should I expect from MiniMax M2-her on LLM.API?

Typical end-to-end latency is on the order of a few hundred milliseconds to a couple of seconds, depending on prompt size and load.
How do I call MiniMax M2-her through LLM.API?

You select the MiniMax M2-her model name in the LLM.API completions or chat endpoint and authenticate using your LLM.API key.
What is MiniMax M2-her particularly good at?

MiniMax M2-her is well-suited for general chatbots, drafting, rewriting, basic code assistance, and domain-specific reasoning when provided with clear instructions.
How does MiniMax M2-her compare to similar models?

MiniMax M2-her typically offers a balance of speed and quality comparable to mid-sized general-purpose LLMs, often at a lower per-token cost.
What are the main limitations of MiniMax M2-her?

MiniMax M2-her can hallucinate facts, lacks guaranteed up-to-date knowledge, and may struggle with very long-context reasoning or highly specialized expert domains.
Can I use MiniMax M2-her for streaming responses?

Yes, when enabled in LLM.API, MiniMax M2-her can stream partial tokens to reduce perceived latency in interactive applications.

Start in 2 lines of code

Get My API Key

MiniMax M2-her

What is MiniMax M2-her?

5 Core Capabilities

Conversational Chat

Text Monitoring

Image Capabilities

Text Translation

Optical Character Recognition

6 Most Valuable Use Cases

Cost Comparison

Technical Specifications

Why Build on LLM.API?

Unified AI Routing

Cost-Aware Orchestration

Resilient Fallback Flows

Deep Observability

Task-Level Abstractions

High-Throughput Batching

When to Use — When NOT to Use

Use it if...

Avoid if...

Start in 2 lines of code