Powered by MiniMax
MiniMax M2-her
- Instruction Following
MiniMax M2-her is a dialogue-first large language model from MiniMax, optimized for immersive roleplay, character-driven chat, and expressive multi-turn conversations.
About the model
What is MiniMax M2-her?
MiniMax M2-her is a specialized variant of the MiniMax M2 large language model designed primarily for dialogue and character-focused interaction. It is mainly used for immersive roleplay, storytelling, and companion-style chat experiences that require strong character consistency and emotional nuance. It also serves creative writing, interactive fiction, and other conversational applications where maintaining long, coherent, multi-turn dialogues is important. M2-her belongs to the MiniMax M2 model family, which includes other general-purpose and high-speed variants such as M2, M2.1, and M2.5.
Model capabilities
5 Core Capabilities
-
Conversational Chat
Supports multi-turn natural language dialogues, following instructions and maintaining context for general-purpose assistant and chatbot applications.
-
Text Monitoring
Enables content analysis and safety monitoring, helping classify or filter user-generated text based on policies or business rules.
-
Image Capabilities
Processes input images to assist with multimodal tasks like visual context understanding, when integrated into supported MiniMax products.
-
Text Translation
Translates between multiple languages for everyday communication and application localization within MiniMax’s supported language pairs.
-
Optical Character Recognition
Extracts machine-readable text from images or screenshots when paired with MiniMax tooling that exposes OCR functionality.
Use cases
6 Most Valuable Use Cases
- Immersive Roleplay Chat
- Story Co-writing
- AI Companion Chatbots
- Interactive Fiction Games
- Language Practice Partner
- Emotional Dialogue Testing
Transparent pricing
Cost Comparison
LLM API offers the lowest cost and latency for MiniMax M2-her–class models.
| Provider | Region | Latency | Throughput | Uptime | Input ($/1M) | Output ($/1M) | Context |
|---|---|---|---|---|---|---|---|
| LLM API BEST | Global | ~120ms | ~120 tps | ~99.99% | ~$0.20 | ~$0.20 | ~128K tokens |
| MiniMax | Global | ~220ms | ~60 tps | ~99.9% | ~$0.70 | ~$0.70 | ~64K tokens |
| OpenAI (GPT-4.1 Mini-equivalent) | Global | ~250ms | ~80 tps | ~99.9% | ~$0.50 | ~$1.50 | ~128K tokens |
| Anthropic (Claude 3 Haiku-equivalent) | US/EU | ~260ms | ~70 tps | ~99.9% | ~$0.40 | ~$1.20 | ~200K tokens |
| Azure AI (MiniMax-compatible deployment) | US East | ~240ms | ~65 tps | ~99.9% | ~$0.60 | ~$0.60 | ~64K tokens |
Performance benchmarks
Technical Specifications
| Metric | MiniMax M2-her | OpenAI GPT-4o Mini | Anthropic Claude 3 Haiku |
|---|---|---|---|
| Avg Latency | ~250ms | ~230ms | ~280ms |
| Context Window | 128K | 128K | 200K |
| Input Price ($/1M) | ~$0.40 | $0.15 | $0.25 |
| Output Price ($/1M) | ~$0.60 | $0.60 | $1.25 |
| Max Output Tokens | 4K | 4K | 4K |
| Throughput | ~120 tps | ~150 tps | ~110 tps |
| Uptime | 99.9% | 99.9% | 99.9% |
30-day usage via LLM API
- 6.8B
- Prompt tokens processed (last 30 days)
- 2.4B
- Completion tokens generated (last 30 days)
- 11.5M
- API requests served (last 30 days)
- 99.8%
- Average API uptime (last 30 days)
Architecture & Integration
Why Build on LLM.API?
One unified API. Every major model. Built-in reliability, cost control, and observability.
-
Unified AI Routing
Dynamically route each request to the best model across providers based on latency, cost, and quality—without changing your integration or redeploying code.
Smart routing, one API -
Cost-Aware Orchestration
Automatically balance premium and budget models using configurable policies, so you control spend while keeping performance and user experience predictable at scale.
Optimize spend, not code -
Resilient Fallback Flows
Define provider- and model-level fallback chains that trigger on errors, timeouts, or bad responses—keeping your production workloads online when vendors fail.
Never ship single-vendor -
Deep Observability
Get centralized traces, metrics, and logs across all LLM calls—latency, errors, cost, and model behavior—so you can debug, tune, and prove reliability to stakeholders.
See every token, everywhere -
Task-Level Abstractions
Describe tasks like chat, generation, tools, or structured output once and let LLM.API map them to the right provider-specific APIs under the hood.
Think tasks, not vendors -
High-Throughput Batching
Submit large batches of prompts in a single request with concurrency control and vendor-aware limits, cutting latency and API costs for bulk workloads.
Scale up, pay less
Decision guide
When to Use — When NOT to Use
Use it if...
- You need a general-purpose Chinese-centric model from a major Chinese AI provider.
- You need cost-effective text generation for chatbots serving primarily Chinese-speaking users.
- Your use case involves everyday assistant tasks like Q&A, drafting, and rewriting text.
- Your use case involves integrating a mainstream Chinese LLM into an existing local tech stack.
- You need moderate reasoning and conversation quality without requiring state-of-the-art performance.
- You need to experiment with a variety of MiniMax-hosted models within one ecosystem.
Avoid if...
- You need cutting-edge reasoning performance comparable to the very best frontier models available.
- Your workload requires strong support, optimization, and documentation primarily in English environments.
- You need guaranteed world-class performance on advanced coding, math, or complex scientific tasks.
- Your workload requires strict US or EU compliance guarantees and detailed public security attestations.
- You need highly specialized domain models, such as finance-grade or medical-grade tuned systems.
- Your workload requires seamless interoperability with leading Western foundation-model platforms and tooling.
FAQ
Frequently Asked Questions
-
What is MiniMax M2-her?
MiniMax M2-her is a large language model from MiniMax designed for fast, cost-efficient text generation and reasoning via the LLM.API gateway.
-
What is the context window of MiniMax M2-her?
MiniMax M2-her supports multi-thousand token prompts, suitable for moderately long conversations, documents, and tool-augmented workflows through LLM.API.
-
What modalities does MiniMax M2-her support via LLM.API?
Via LLM.API, MiniMax M2-her currently operates as a text-in, text-out model for chat, completion, and tool-calling style interactions.
-
How does MiniMax M2-her pricing work on LLM.API?
MiniMax M2-her is billed per 1,000 input and output tokens according to LLM.API’s MiniMax-specific pricing, visible in your LLM.API dashboard.
-
What latency should I expect from MiniMax M2-her on LLM.API?
Typical end-to-end latency is on the order of a few hundred milliseconds to a couple of seconds, depending on prompt size and load.
-
How do I call MiniMax M2-her through LLM.API?
You select the MiniMax M2-her model name in the LLM.API completions or chat endpoint and authenticate using your LLM.API key.
-
What is MiniMax M2-her particularly good at?
MiniMax M2-her is well-suited for general chatbots, drafting, rewriting, basic code assistance, and domain-specific reasoning when provided with clear instructions.
-
How does MiniMax M2-her compare to similar models?
MiniMax M2-her typically offers a balance of speed and quality comparable to mid-sized general-purpose LLMs, often at a lower per-token cost.
-
What are the main limitations of MiniMax M2-her?
MiniMax M2-her can hallucinate facts, lacks guaranteed up-to-date knowledge, and may struggle with very long-context reasoning or highly specialized expert domains.
-
Can I use MiniMax M2-her for streaming responses?
Yes, when enabled in LLM.API, MiniMax M2-her can stream partial tokens to reduce perceived latency in interactive applications.
