Mistral Small 4

Instruction Following

Mistral Small 4 is an open-source multimodal Mixture-of-Experts model from Mistral that unifies text, image, reasoning, and coding capabilities in a single efficient system. It targets high throughput and low cost while retaining strong performance across general chat, analysis, and developer workflows.

Start Using API

API Performance

Latency: ~0.8s time to first token
Context: 32K token context
Input: $2.00 per 1M tokens
Output: $6.00 per 1M tokens
Uptime: 99% 99%

About the model

What is Mistral Small 4?

Mistral Small 4 is a unified large language model from Mistral that handles text and images with configurable reasoning in an efficient Mixture-of-Experts architecture. It is mainly used for fast conversational agents and general-purpose assistants that can switch between lightweight chat and deeper analytical reasoning as needed. It is also optimized for software development workflows, multimodal understanding (such as document and image analysis), and agentic tools that combine coding, planning, and perception in one model. It belongs to the Mistral Small family as a successor that consolidates earlier specialized models like Mistral Small, Magistral (reasoning), Pixtral (vision), and Devstral (coding) into a single open model.

Input / Output

Input

Text prompts
Images for vision and document understanding

Output

Structured or free-form text responses
Source code generation and editing

Model capabilities

5 Core Capabilities

Conversational Chat

Handles multi-turn conversations, answers questions, and follows instructions while maintaining context and coherent responses across dialogue turns.
Text Translation

Translates text between multiple languages, preserving meaning and tone for general-purpose, everyday translation tasks.
Code Understanding

Understands and reasons about source code, enabling tasks like explanation, refactoring suggestions, and simple code generation.
Image Interpretation

Accepts image inputs to identify objects and describe visual content, supporting multimodal question answering and explanation.
Text Extraction

Extracts textual information from images or documents, enabling reading of printed content and structured capture of key fields.

Use cases

6 Most Valuable Use Cases

Customer Support Chatbots
Summarizing Long Documents
Legal Text Drafting
Compliance Monitoring Assistance
Product Description Generation
Code Generation Assistance

Transparent pricing

Cost Comparison

LLM API offers the lowest prices and highest performance for Mistral Small–class models.

Provider	Region	Latency	Throughput	Uptime	Input ($/1M)	Output ($/1M)	Context
LLM API BEST	Global	~120ms	~80 tps	99.99%	$0.05	$0.15	128K
Mistral	EU West	~220ms	~40 tps	99.9%	~$0.20	~$0.60	~32K
Azure	US East	~260ms	~35 tps	99.9%	~$0.25	~$0.75	~32K
AWS Bedrock	US West	~280ms	~30 tps	99.9%	~$0.28	~$0.80	~32K
Replicate	Global	~320ms	~20 tps	99.5%	~$0.35	~$1.00	~16K

Performance benchmarks

Technical Specifications

Metric	Mistral Small 4	gpt-4.1-mini (OpenAI)	Claude 3.5 Haiku (Anthropic)
Avg Latency	~200ms	~180ms	~220ms
Context Window	32K	128K	200K
Input Price ($/1M)	$0.20	$0.15	$0.25
Output Price ($/1M)	$0.60	$0.60	$0.80
Max Output Tokens	4K	4K	4K
Throughput	~60 tps	~80 tps	~50 tps
Uptime	99.9%	99.9%	99.9%

30-day usage via LLM API

24.5B: Prompt tokens processed (last 30 days)
17.8B: Completion tokens generated (last 30 days)
9.3M: API requests served (last 30 days)
99.7%: Avg uptime (last 30 days)

Start Using API

Architecture & Integration

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

Unified AI Routing

Dynamically route each request to the best model across providers using latency, cost, and quality signals—without changing your code or integrations.
One endpoint, every model
Cost-Aware Orchestration

Optimize spend by mixing premium and budget models per call, with centralized limits, per-tenant controls, and real-time cost visibility baked into the gateway.
Max quality, lower cost
Automatic Provider Fallbacks

Stay resilient when providers rate-limit or go down—LLM.API transparently retries and fails over to alternate models so your app keeps responding.
No more hard outages
Deep LLM Observability

Trace every request across models with structured logs, metrics, and latency breakdowns to debug prompts, tune routing, and prove reliability to stakeholders.
See every token hop
Task-Level Abstractions

Call higher-level tasks like chat, tools, RAG, and agents instead of raw models, so you can swap providers without rewriting application logic.
Code to tasks, not models
High-Throughput Batch APIs

Ship bulk inference jobs through a single endpoint with concurrency control, deduping, and retries to reduce unit cost and saturate provider capacity safely.
Batch at full throttle

Decision guide

When to Use — When NOT to Use

Use it if...

You need a small, cost-efficient model for everyday chat, Q&A, and utilities.
You need competent code generation and editing without paying for a flagship model.
Your use case involves lightweight agents, tools, or backends needing reasonable reasoning at scale.
Your use case involves batch-processing many short requests where throughput and price dominate.
You need a general-purpose model from Mistral that integrates cleanly with their ecosystem.
Your use case involves multilingual understanding and generation without requiring top-tier translation quality.

Avoid if...

You need state-of-the-art reasoning performance comparable to the very best frontier models.
Your workload requires highly reliable, domain-expert answers in medical, legal, or safety-critical contexts.
You need very long-context understanding, such as entire books or massive codebases.
Your workload requires the strongest available code generation and complex multi-file refactoring support.
You need cutting-edge performance on math, logic puzzles, or multi-step planning tasks.
Your workload requires highly specialized fine-tuning or custom safety guarantees beyond standard offerings.

FAQ

Frequently Asked Questions

What is Mistral Small 4?

Mistral Small 4 is a compact instruction-tuned language model by Mistral, optimized for low-latency, low-cost text generation and reasoning tasks.
What is Mistral Small 4 best suited for?

Mistral Small 4 is best for chatbots, lightweight agents, tools integration, and high-volume applications where cost and latency are critical.
What is the context window of Mistral Small 4?

Mistral Small 4 supports context windows up to 32K tokens via LLM.API.
Does Mistral Small 4 support images or other modalities?

No, Mistral Small 4 is a text-only model and does not natively support images, audio, or video inputs.
How is Mistral Small 4 priced on LLM.API?

Mistral Small 4 is billed on a per-token basis for input and output; check your LLM.API pricing page for the latest specific rates.
How fast is Mistral Small 4 through LLM.API?

Mistral Small 4 is optimized for low latency and high throughput, making it suitable for real-time user-facing applications.
How do I call Mistral Small 4 using LLM.API?

Specify the provider as "Mistral" and the model name as "mistral-small-4" in your LLM.API completion or chat invocation request.
How does Mistral Small 4 compare to larger Mistral or frontier models?

Mistral Small 4 is cheaper and faster but generally less capable on complex reasoning, long-context analysis, and highly specialized domains.
What are the main limitations of Mistral Small 4?

Mistral Small 4 can hallucinate, lacks up-to-the-minute real-world knowledge, and may underperform on very long, multi-step reasoning or niche expert tasks.
Can I use tools or function calling with Mistral Small 4 on LLM.API?

Yes, you can use LLM.API’s standard tool or function-calling interface, with Mistral Small 4 generating structured arguments for your tools.

Start in 2 lines of code

Get My API Key

Mistral Small 4

What is Mistral Small 4?

5 Core Capabilities

Conversational Chat

Text Translation

Code Understanding

Image Interpretation

Text Extraction

6 Most Valuable Use Cases

Cost Comparison

Technical Specifications

Why Build on LLM.API?

Unified AI Routing

Cost-Aware Orchestration

Automatic Provider Fallbacks

Deep LLM Observability

Task-Level Abstractions

High-Throughput Batch APIs

When to Use — When NOT to Use

Use it if...

Avoid if...

Start in 2 lines of code