Powered by Inception

Mercury 2

  • Text Generation

Mercury 2 is a proprietary, diffusion-based large language model (dLLM) from Inception designed for extremely fast reasoning and text generation with a long 128K-token context window.

Start Using API

What is Mercury 2?

Mercury 2 is a commercial-scale diffusion-based language model by Inception optimized for high-speed reasoning and generation. It is primarily used for code generation, analytical reasoning, and complex automation workflows where low latency is critical. It is also applied in AI agents, search, and business applications that benefit from rapid, large-context processing. Mercury 2 belongs to Inception’s Mercury family of diffusion-based LLMs, succeeding earlier Mercury models and specialized variants such as Mercury Coder.

5 Core Capabilities

  • Conversational AI

    Engages in multi-turn dialogue, answering questions and following instructions while maintaining context across user interactions.

  • Visual Analysis

    Processes images to identify objects and scenes, enabling descriptions and basic reasoning about visual content.

  • Text Translation

    Translates written content between multiple languages while attempting to preserve meaning and tone.

  • Document OCR

    Extracts machine-readable text from images or scanned documents, supporting downstream search or analysis.

  • Content Monitoring

    Assists in monitoring streams of textual data for specific topics or issues using pattern matching and basic analysis.

6 Most Valuable Use Cases

  • Contract Clause Extraction
  • Regulatory Change Monitoring
  • Financial Invoice Processing
  • Customer Support Tagging
  • IT Operations Automation
  • Procurement Risk Analysis

Cost Comparison

LLM API offers the lowest cost and highest performance for Mercury 2–class models.

Provider Region Latency Throughput Uptime Input ($/1M) Output ($/1M) Context
LLM API BEST Global 80ms 120 tps 99.99% $0.40 $0.80 128K tokens
Inception Global ~150ms ~60 tps ~99.9% ~$0.80 ~$1.60 ~64K tokens
OpenAI Global ~160ms ~70 tps ~99.9% ~$1.00 ~$2.00 ~128K tokens
Anthropic US East ~170ms ~50 tps ~99.9% ~$1.20 ~$2.40 ~200K tokens

Technical Specifications

Metric Mercury 2 (Inception) GPT-4.1 (OpenAI) Claude 3.5 Sonnet (Anthropic)
Avg Latency ~180ms ~220ms ~250ms
Context Window 128K 128K 200K
Input Price ($/1M) $0.80 $5.00 $3.00
Output Price ($/1M) $2.40 $15.00 $15.00
Max Output Tokens 8K 4K 8K
Throughput 80 tps 40 tps 50 tps
Uptime 99.9% 99.9% 99.9%

30-day usage via LLM API

7.5B
Prompt tokens processed (last 30 days)
5.1B
Completion tokens generated (last 30 days)
22.4M
API requests served (last 30 days)
98.9%
Avg uptime (last 30 days)
Start Using API

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

  • Intelligent Model Routing

    Automatically route each request to the best model across providers based on latency, cost, and quality—without changing your integration or redeploying code.

    One endpoint, every model
  • Cost-Aware Orchestration

    Control and predict spend with per-route pricing policies, budget guards, and automatic downshifts to cheaper models when quality thresholds are still met.

    Optimize every token
  • Resilient Fallback Logic

    Survive provider outages and rate limits with built-in multi-region, multi-model failover so your app keeps responding even when an upstream service doesn’t.

    Always-on reliability
  • Full-Stack Observability

    Trace every request across models and providers with logs, latency breakdowns, and error analytics to debug faster and continuously tune your routing rules.

    See every token hop
  • Task-Level Abstractions

    Use high-level task APIs for chat, RAG, tools, and more so you can swap underlying models without rewriting prompts or business logic.

    Tasks, not raw calls
  • High-Throughput Batch

    Process massive workloads efficiently with parallelized, rate-limit-aware batch execution, automatic retries, and deduplicated inputs for lower cost and higher throughput.

    Ship at batch scale

When to Use — When NOT to Use

Use it if...

  • You need a general-purpose model from Inception already integrated into your infrastructure.
  • You need consistent behavior across many small automation tasks with moderate reasoning complexity.
  • Your use case involves standard customer-support chatbots that follow clear, pre-defined workflows.
  • Your use case involves drafting routine business content like emails, summaries, and reports.
  • Your use case involves running batch inference jobs where predictable costs matter more than peak capability.
  • You need a model suited for prototyping generic AI features before optimizing with specialized systems.

Avoid if...

  • You need state-of-the-art reasoning on complex scientific, mathematical, or legal problems.
  • You need guaranteed compliance with strict, audited industry regulations such as HIPAA or PCI-DSS.
  • Your workload requires ultra-low-latency real-time interactions for high-frequency trading or control systems.
  • Your workload requires on-device or fully offline inference without any external API dependency.
  • You need a highly specialized vision, speech, or code model rather than a generalist.
  • Your workload requires verifiable tool-calling support aligned exactly with another provider’s proprietary schema.

Frequently Asked Questions

  • What is Mercury 2?

    Mercury 2 is an Inception large language model accessible via LLM.API, designed for fast, cost-efficient general-purpose text generation and reasoning.

  • What types of tasks is Mercury 2 best suited for?

    Mercury 2 is best for code generation, step-by-step reasoning, chatbot-style conversations, and structured text transformations like summarization or extraction.

  • What is the context window of Mercury 2?

    Mercury 2 supports a 32K token context window, allowing it to handle long documents, multi-step tools, and extended conversations reliably.

  • How fast is Mercury 2 in terms of latency and throughput?

    Mercury 2 is optimized for low p95 latency and high token throughput, making it suitable for interactive applications and high-traffic backends.

  • Which input and output modalities does Mercury 2 support?

    Mercury 2 currently supports text input and text output only, with no native image, audio, or video processing.

  • How is Mercury 2 priced when accessed through LLM.API?

    Mercury 2 uses LLM.API’s unified token-based pricing, with separate rates for input and output tokens configurable per project in your LLM.API dashboard.

  • How do I call Mercury 2 through the LLM.API?

    Use the chat or completions endpoint with `model` set to `inception/mercury-2`, passing your prompt, optional system instructions, and any tool definitions.

  • How does Mercury 2 compare to similar mid-sized general-purpose models?

    Mercury 2 targets a balance of quality and speed, typically trading slightly lower peak capability for materially lower cost and latency.

  • Does Mercury 2 support tools, function calling, or structured outputs?

    Mercury 2 supports JSON-structured outputs and standard tool or function-calling semantics via LLM.API’s unified tool-calling interface.

  • What are the main limitations of Mercury 2?

    Mercury 2 can hallucinate facts, lacks real-time knowledge or browsing, and is not suitable for safety-critical or compliance-required decision-making without human review.

Start in 2 lines of code

Get My API Key