Powered by MiniMax

MiniMax M2-her

  • Instruction Following

MiniMax M2-her is a dialogue-first large language model from MiniMax, optimized for immersive roleplay, character-driven chat, and expressive multi-turn conversations.

Start Using API

What is MiniMax M2-her?

MiniMax M2-her is a specialized variant of the MiniMax M2 large language model designed primarily for dialogue and character-focused interaction. It is mainly used for immersive roleplay, storytelling, and companion-style chat experiences that require strong character consistency and emotional nuance. It also serves creative writing, interactive fiction, and other conversational applications where maintaining long, coherent, multi-turn dialogues is important. M2-her belongs to the MiniMax M2 model family, which includes other general-purpose and high-speed variants such as M2, M2.1, and M2.5.

5 Core Capabilities

  • Conversational Chat

    Supports multi-turn natural language dialogues, following instructions and maintaining context for general-purpose assistant and chatbot applications.

  • Text Monitoring

    Enables content analysis and safety monitoring, helping classify or filter user-generated text based on policies or business rules.

  • Image Capabilities

    Processes input images to assist with multimodal tasks like visual context understanding, when integrated into supported MiniMax products.

  • Text Translation

    Translates between multiple languages for everyday communication and application localization within MiniMax’s supported language pairs.

  • Optical Character Recognition

    Extracts machine-readable text from images or screenshots when paired with MiniMax tooling that exposes OCR functionality.

6 Most Valuable Use Cases

  • Immersive Roleplay Chat
  • Story Co-writing
  • AI Companion Chatbots
  • Interactive Fiction Games
  • Language Practice Partner
  • Emotional Dialogue Testing

Cost Comparison

LLM API offers the lowest cost and latency for MiniMax M2-her–class models.

Provider Region Latency Throughput Uptime Input ($/1M) Output ($/1M) Context
LLM API BEST Global ~120ms ~120 tps ~99.99% ~$0.20 ~$0.20 ~128K tokens
MiniMax Global ~220ms ~60 tps ~99.9% ~$0.70 ~$0.70 ~64K tokens
OpenAI (GPT-4.1 Mini-equivalent) Global ~250ms ~80 tps ~99.9% ~$0.50 ~$1.50 ~128K tokens
Anthropic (Claude 3 Haiku-equivalent) US/EU ~260ms ~70 tps ~99.9% ~$0.40 ~$1.20 ~200K tokens
Azure AI (MiniMax-compatible deployment) US East ~240ms ~65 tps ~99.9% ~$0.60 ~$0.60 ~64K tokens

Technical Specifications

Metric MiniMax M2-her OpenAI GPT-4o Mini Anthropic Claude 3 Haiku
Avg Latency ~250ms ~230ms ~280ms
Context Window 128K 128K 200K
Input Price ($/1M) ~$0.40 $0.15 $0.25
Output Price ($/1M) ~$0.60 $0.60 $1.25
Max Output Tokens 4K 4K 4K
Throughput ~120 tps ~150 tps ~110 tps
Uptime 99.9% 99.9% 99.9%

30-day usage via LLM API

6.8B
Prompt tokens processed (last 30 days)
2.4B
Completion tokens generated (last 30 days)
11.5M
API requests served (last 30 days)
99.8%
Average API uptime (last 30 days)
Start Using API

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

  • Unified AI Routing

    Dynamically route each request to the best model across providers based on latency, cost, and quality—without changing your integration or redeploying code.

    Smart routing, one API
  • Cost-Aware Orchestration

    Automatically balance premium and budget models using configurable policies, so you control spend while keeping performance and user experience predictable at scale.

    Optimize spend, not code
  • Resilient Fallback Flows

    Define provider- and model-level fallback chains that trigger on errors, timeouts, or bad responses—keeping your production workloads online when vendors fail.

    Never ship single-vendor
  • Deep Observability

    Get centralized traces, metrics, and logs across all LLM calls—latency, errors, cost, and model behavior—so you can debug, tune, and prove reliability to stakeholders.

    See every token, everywhere
  • Task-Level Abstractions

    Describe tasks like chat, generation, tools, or structured output once and let LLM.API map them to the right provider-specific APIs under the hood.

    Think tasks, not vendors
  • High-Throughput Batching

    Submit large batches of prompts in a single request with concurrency control and vendor-aware limits, cutting latency and API costs for bulk workloads.

    Scale up, pay less

When to Use — When NOT to Use

Use it if...

  • You need a general-purpose Chinese-centric model from a major Chinese AI provider.
  • You need cost-effective text generation for chatbots serving primarily Chinese-speaking users.
  • Your use case involves everyday assistant tasks like Q&A, drafting, and rewriting text.
  • Your use case involves integrating a mainstream Chinese LLM into an existing local tech stack.
  • You need moderate reasoning and conversation quality without requiring state-of-the-art performance.
  • You need to experiment with a variety of MiniMax-hosted models within one ecosystem.

Avoid if...

  • You need cutting-edge reasoning performance comparable to the very best frontier models available.
  • Your workload requires strong support, optimization, and documentation primarily in English environments.
  • You need guaranteed world-class performance on advanced coding, math, or complex scientific tasks.
  • Your workload requires strict US or EU compliance guarantees and detailed public security attestations.
  • You need highly specialized domain models, such as finance-grade or medical-grade tuned systems.
  • Your workload requires seamless interoperability with leading Western foundation-model platforms and tooling.

Frequently Asked Questions

  • What is MiniMax M2-her?

    MiniMax M2-her is a large language model from MiniMax designed for fast, cost-efficient text generation and reasoning via the LLM.API gateway.

  • What is the context window of MiniMax M2-her?

    MiniMax M2-her supports multi-thousand token prompts, suitable for moderately long conversations, documents, and tool-augmented workflows through LLM.API.

  • What modalities does MiniMax M2-her support via LLM.API?

    Via LLM.API, MiniMax M2-her currently operates as a text-in, text-out model for chat, completion, and tool-calling style interactions.

  • How does MiniMax M2-her pricing work on LLM.API?

    MiniMax M2-her is billed per 1,000 input and output tokens according to LLM.API’s MiniMax-specific pricing, visible in your LLM.API dashboard.

  • What latency should I expect from MiniMax M2-her on LLM.API?

    Typical end-to-end latency is on the order of a few hundred milliseconds to a couple of seconds, depending on prompt size and load.

  • How do I call MiniMax M2-her through LLM.API?

    You select the MiniMax M2-her model name in the LLM.API completions or chat endpoint and authenticate using your LLM.API key.

  • What is MiniMax M2-her particularly good at?

    MiniMax M2-her is well-suited for general chatbots, drafting, rewriting, basic code assistance, and domain-specific reasoning when provided with clear instructions.

  • How does MiniMax M2-her compare to similar models?

    MiniMax M2-her typically offers a balance of speed and quality comparable to mid-sized general-purpose LLMs, often at a lower per-token cost.

  • What are the main limitations of MiniMax M2-her?

    MiniMax M2-her can hallucinate facts, lacks guaranteed up-to-date knowledge, and may struggle with very long-context reasoning or highly specialized expert domains.

  • Can I use MiniMax M2-her for streaming responses?

    Yes, when enabled in LLM.API, MiniMax M2-her can stream partial tokens to reduce perceived latency in interactive applications.

Start in 2 lines of code

Get My API Key