Mistral Medium 3.5

Text Generation

Mistral Medium 3.5 is a 128B-parameter dense large language model from Mistral, designed as a flagship "merged" model for strong general-purpose reasoning, coding, and long-context tasks. It targets a balance of capability, latency, and cost for production AI applications.

Start Using API

API Performance

Latency: ~0.7s time to first token
Context: 128K tokens
Input: $2.00 per 1M tokens
Output: $6.00 per 1M tokens
Uptime: 99% 99%

About the model

What is Mistral Medium 3.5?

Mistral Medium 3.5 is a dense 128B-parameter large language model by Mistral optimized for general-purpose text understanding and generation with a 256K-token context window. It is used for software development assistance, long-running autonomous or remote coding agents, and other knowledge work requiring reliable reasoning over large contexts. It also serves as a default or backbone model in several Mistral products and third-party platforms for assistants, agents, and enterprise applications. It follows earlier Mistral Medium 3-series models and complements other Mistral families such as Mistral Large and the smaller Ministral models.

Model capabilities

5 Core Capabilities

Multimodal Reasoning

Processes both text and images, performing instruction-following, logical reasoning, and complex problem solving within a unified 128B dense model.
Advanced Chat

Provides strong instruction-following, conversational responses, and system-prompt control suitable for assistants, support bots, and long-context interactions.
Code Generation

Generates, debugs, and refactors code, enabling sophisticated coding agents and long-running software engineering workflows with high benchmark performance.
Multilingual Support

Understands and generates text in dozens of languages, including major European and Asian languages, for global applications and content.
OCR and Vision

Performs OCR and document understanding with a custom vision encoder handling variable image sizes, layouts, and structured visual annotations.

Use cases

6 Most Valuable Use Cases

General AI Assistant
Software Code Generation
Document Question Answering
Legal and Policy Drafting
Business Process Automation
Customer Support Monitoring

Transparent pricing

Cost Comparison

LLM API offers the lowest cost and highest performance for Mistral Medium 3.5–class models.

Provider	Region	Latency	Throughput	Uptime	Input ($/1M)	Output ($/1M)	Context
LLM API BEST	Global	90ms	120 tps	99.99%	$0.20	$0.60	128K
Mistral (direct)	EU West	~180ms	~60 tps	99.9%	~$0.25	~$0.75	128K
Azure (Mistral-compatible)	US East	~220ms	~50 tps	99.9%	~$0.35	~$1.00	128K
AWS Bedrock (Mistral-like)	US West	~210ms	~55 tps	99.9%	~$0.30	~$0.90	128K
Replicate (Mistral-compatible)	Global	~260ms	~30 tps	99.5%	~$0.40	~$1.20	~64K

Performance benchmarks

Technical Specifications

Metric	Mistral Medium 3.5	GPT-4.1 Mini	Claude 3.5 Haiku
Avg Latency	~180ms	~220ms	~250ms
Context Window	128K	128K	200K
Input Price ($/1M)	$0.60	$0.15	$0.25
Output Price ($/1M)	$1.80	$0.60	$1.25
Max Output Tokens	4K	4K	4K
Throughput	~70 tps	~80 tps	~65 tps
Uptime	99.9%	99.9%	99.9%

30-day usage via LLM API

62B: Prompt tokens processed (last 30 days)
21M: Completion tokens generated (last 30 days)
3.4M: API requests served (last 30 days)
99.8%: Avg API uptime (last 30 days)

Start Using API

Architecture & Integration

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

Unified AI Routing

Dynamically route requests across providers and model families via one endpoint, using rules or performance data to balance quality, latency, and reliability automatically.
One endpoint, every model
Cost-Aware Orchestration

Enforce per-project and per-route budgets, downshift to cheaper models automatically, and compare providers so you never overspend for the same output quality.
Control spend by design
Resilient Fallbacks

Define provider- and model-level fallbacks so requests transparently fail over on timeouts, rate limits, or outages—without changing your application code.
No single point of failure
Deep Observability

Get unified logs, traces, and metrics for every request across providers—latency, errors, tokens, and cost—so you can debug and optimize production workloads quickly.
See every token spent
Task-Level Abstractions

Call high-level tasks—chat, tools, RAG, image, embeddings—through a stable API while LLM.API handles prompt shaping, model quirks, and provider differences underneath.
Code to tasks, not models
High-Throughput Batch

Submit large batches of prompts to any provider with automatic chunking, retries, and aggregation, maximizing throughput while staying within rate and budget limits.
Ship at batch scale

Decision guide

When to Use — When NOT to Use

Use it if...

You need a capable general-purpose LLM with solid reasoning at moderate cost.
You need strong code generation, refactoring, and debugging across common programming languages.
Your use case involves building chatbots or agents that require consistent, fluent English.
You need good performance on typical enterprise tasks like summaries, extraction, and classification.
Your use case involves moderate context lengths rather than extremely long multi-document workflows.
You need an open-weights-friendly ecosystem and Mistral-compatible tooling or deployment stacks.
Your use case involves augmenting applications with reliable function calling and tool-use behavior.

Avoid if...

You need frontier-level reasoning comparable to the very latest top-tier proprietary flagship models.
Your workload requires extremely long context handling, like hundreds of pages per request.
You need the absolute best performance on complex mathematics or formal theorem proving.
Your workload requires specialized multimodal capabilities such as advanced vision or audio understanding.
You need a model with the broadest possible ecosystem support and vendor-native integrations.
Your workload requires strict enterprise certifications or compliance only offered by major hyperscalers.
You need ultra-low-latency responses for real-time interactive systems with tight SLA guarantees.

FAQ

Frequently Asked Questions

What is Mistral Medium 3.5?

Mistral Medium 3.5 is a proprietary large language model by Mistral aimed at general-purpose coding, reasoning, and chat workloads with balanced cost and quality.
What is the context window of Mistral Medium 3.5?

Mistral Medium 3.5 supports up to a 32K token context window for combined input and output via LLM.API.
How is Mistral Medium 3.5 priced on LLM.API?

Mistral Medium 3.5 usage on LLM.API is billed per input and output token; check your LLM.API pricing page for current rates.
How fast is Mistral Medium 3.5 on LLM.API?

Mistral Medium 3.5 is optimized for low-latency streaming responses, with actual speed depending on prompt size and your network conditions.
What modalities does Mistral Medium 3.5 support via LLM.API?

Through LLM.API, Mistral Medium 3.5 currently supports text input and text output only.
How do I call Mistral Medium 3.5 through LLM.API?

Select the Mistral provider and the Mistral Medium 3.5 model ID in your LLM.API client or HTTP requests to route calls to this model.
What is Mistral Medium 3.5 best suited for?

Mistral Medium 3.5 is best for production chatbots, code generation, data transformation, and general reasoning tasks needing a balance of capability and price.
How does Mistral Medium 3.5 compare to smaller Mistral models?

Compared with lighter Mistral models, Mistral Medium 3.5 generally offers stronger reasoning, coding, and instruction-following at higher cost and latency.
Does Mistral Medium 3.5 have any notable limitations?

Mistral Medium 3.5 can hallucinate incorrect facts, lacks real-time internet access, and should not be used for unsupervised high-stakes decisions.
Can I fine-tune Mistral Medium 3.5 through LLM.API?

Direct fine-tuning of Mistral Medium 3.5 is not available via LLM.API; use prompting or retrieval-augmented techniques instead.

Start in 2 lines of code

Get My API Key

Mistral Medium 3.5

What is Mistral Medium 3.5?

5 Core Capabilities

Multimodal Reasoning

Advanced Chat

Code Generation

Multilingual Support

OCR and Vision

6 Most Valuable Use Cases

Cost Comparison

Technical Specifications

Why Build on LLM.API?

Unified AI Routing

Cost-Aware Orchestration

Resilient Fallbacks

Deep Observability

Task-Level Abstractions

High-Throughput Batch

When to Use — When NOT to Use

Use it if...

Avoid if...

Start in 2 lines of code