Powered by Mistral
Mistral Small 4
- Instruction Following
Mistral Small 4 is an open-source multimodal Mixture-of-Experts model from Mistral that unifies text, image, reasoning, and coding capabilities in a single efficient system. It targets high throughput and low cost while retaining strong performance across general chat, analysis, and developer workflows.
About the model
What is Mistral Small 4?
Mistral Small 4 is a unified large language model from Mistral that handles text and images with configurable reasoning in an efficient Mixture-of-Experts architecture. It is mainly used for fast conversational agents and general-purpose assistants that can switch between lightweight chat and deeper analytical reasoning as needed. It is also optimized for software development workflows, multimodal understanding (such as document and image analysis), and agentic tools that combine coding, planning, and perception in one model. It belongs to the Mistral Small family as a successor that consolidates earlier specialized models like Mistral Small, Magistral (reasoning), Pixtral (vision), and Devstral (coding) into a single open model.
Model capabilities
5 Core Capabilities
-
Conversational Chat
Handles multi-turn conversations, answers questions, and follows instructions while maintaining context and coherent responses across dialogue turns.
-
Text Translation
Translates text between multiple languages, preserving meaning and tone for general-purpose, everyday translation tasks.
-
Code Understanding
Understands and reasons about source code, enabling tasks like explanation, refactoring suggestions, and simple code generation.
-
Image Interpretation
Accepts image inputs to identify objects and describe visual content, supporting multimodal question answering and explanation.
-
Text Extraction
Extracts textual information from images or documents, enabling reading of printed content and structured capture of key fields.
Use cases
6 Most Valuable Use Cases
- Customer Support Chatbots
- Summarizing Long Documents
- Legal Text Drafting
- Compliance Monitoring Assistance
- Product Description Generation
- Code Generation Assistance
Transparent pricing
Cost Comparison
LLM API offers the lowest prices and highest performance for Mistral Small–class models.
| Provider | Region | Latency | Throughput | Uptime | Input ($/1M) | Output ($/1M) | Context |
|---|---|---|---|---|---|---|---|
| LLM API BEST | Global | ~120ms | ~80 tps | 99.99% | $0.05 | $0.15 | 128K |
| Mistral | EU West | ~220ms | ~40 tps | 99.9% | ~$0.20 | ~$0.60 | ~32K |
| Azure | US East | ~260ms | ~35 tps | 99.9% | ~$0.25 | ~$0.75 | ~32K |
| AWS Bedrock | US West | ~280ms | ~30 tps | 99.9% | ~$0.28 | ~$0.80 | ~32K |
| Replicate | Global | ~320ms | ~20 tps | 99.5% | ~$0.35 | ~$1.00 | ~16K |
Performance benchmarks
Technical Specifications
| Metric | Mistral Small 4 | gpt-4.1-mini (OpenAI) | Claude 3.5 Haiku (Anthropic) |
|---|---|---|---|
| Avg Latency | ~200ms | ~180ms | ~220ms |
| Context Window | 32K | 128K | 200K |
| Input Price ($/1M) | $0.20 | $0.15 | $0.25 |
| Output Price ($/1M) | $0.60 | $0.60 | $0.80 |
| Max Output Tokens | 4K | 4K | 4K |
| Throughput | ~60 tps | ~80 tps | ~50 tps |
| Uptime | 99.9% | 99.9% | 99.9% |
30-day usage via LLM API
- 24.5B
- Prompt tokens processed (last 30 days)
- 17.8B
- Completion tokens generated (last 30 days)
- 9.3M
- API requests served (last 30 days)
- 99.7%
- Avg uptime (last 30 days)
Architecture & Integration
Why Build on LLM.API?
One unified API. Every major model. Built-in reliability, cost control, and observability.
-
Unified AI Routing
Dynamically route each request to the best model across providers using latency, cost, and quality signals—without changing your code or integrations.
One endpoint, every model -
Cost-Aware Orchestration
Optimize spend by mixing premium and budget models per call, with centralized limits, per-tenant controls, and real-time cost visibility baked into the gateway.
Max quality, lower cost -
Automatic Provider Fallbacks
Stay resilient when providers rate-limit or go down—LLM.API transparently retries and fails over to alternate models so your app keeps responding.
No more hard outages -
Deep LLM Observability
Trace every request across models with structured logs, metrics, and latency breakdowns to debug prompts, tune routing, and prove reliability to stakeholders.
See every token hop -
Task-Level Abstractions
Call higher-level tasks like chat, tools, RAG, and agents instead of raw models, so you can swap providers without rewriting application logic.
Code to tasks, not models -
High-Throughput Batch APIs
Ship bulk inference jobs through a single endpoint with concurrency control, deduping, and retries to reduce unit cost and saturate provider capacity safely.
Batch at full throttle
Decision guide
When to Use — When NOT to Use
Use it if...
- You need a small, cost-efficient model for everyday chat, Q&A, and utilities.
- You need competent code generation and editing without paying for a flagship model.
- Your use case involves lightweight agents, tools, or backends needing reasonable reasoning at scale.
- Your use case involves batch-processing many short requests where throughput and price dominate.
- You need a general-purpose model from Mistral that integrates cleanly with their ecosystem.
- Your use case involves multilingual understanding and generation without requiring top-tier translation quality.
Avoid if...
- You need state-of-the-art reasoning performance comparable to the very best frontier models.
- Your workload requires highly reliable, domain-expert answers in medical, legal, or safety-critical contexts.
- You need very long-context understanding, such as entire books or massive codebases.
- Your workload requires the strongest available code generation and complex multi-file refactoring support.
- You need cutting-edge performance on math, logic puzzles, or multi-step planning tasks.
- Your workload requires highly specialized fine-tuning or custom safety guarantees beyond standard offerings.
FAQ
Frequently Asked Questions
-
What is Mistral Small 4?
Mistral Small 4 is a compact instruction-tuned language model by Mistral, optimized for low-latency, low-cost text generation and reasoning tasks.
-
What is Mistral Small 4 best suited for?
Mistral Small 4 is best for chatbots, lightweight agents, tools integration, and high-volume applications where cost and latency are critical.
-
What is the context window of Mistral Small 4?
Mistral Small 4 supports context windows up to 32K tokens via LLM.API.
-
Does Mistral Small 4 support images or other modalities?
No, Mistral Small 4 is a text-only model and does not natively support images, audio, or video inputs.
-
How is Mistral Small 4 priced on LLM.API?
Mistral Small 4 is billed on a per-token basis for input and output; check your LLM.API pricing page for the latest specific rates.
-
How fast is Mistral Small 4 through LLM.API?
Mistral Small 4 is optimized for low latency and high throughput, making it suitable for real-time user-facing applications.
-
How do I call Mistral Small 4 using LLM.API?
Specify the provider as "Mistral" and the model name as "mistral-small-4" in your LLM.API completion or chat invocation request.
-
How does Mistral Small 4 compare to larger Mistral or frontier models?
Mistral Small 4 is cheaper and faster but generally less capable on complex reasoning, long-context analysis, and highly specialized domains.
-
What are the main limitations of Mistral Small 4?
Mistral Small 4 can hallucinate, lacks up-to-the-minute real-world knowledge, and may underperform on very long, multi-step reasoning or niche expert tasks.
-
Can I use tools or function calling with Mistral Small 4 on LLM.API?
Yes, you can use LLM.API’s standard tool or function-calling interface, with Mistral Small 4 generating structured arguments for your tools.
