Powered by Mistral

Mistral Small 4

  • Instruction Following

Mistral Small 4 is an open-source multimodal Mixture-of-Experts model from Mistral that unifies text, image, reasoning, and coding capabilities in a single efficient system. It targets high throughput and low cost while retaining strong performance across general chat, analysis, and developer workflows.

Start Using API

What is Mistral Small 4?

Mistral Small 4 is a unified large language model from Mistral that handles text and images with configurable reasoning in an efficient Mixture-of-Experts architecture. It is mainly used for fast conversational agents and general-purpose assistants that can switch between lightweight chat and deeper analytical reasoning as needed. It is also optimized for software development workflows, multimodal understanding (such as document and image analysis), and agentic tools that combine coding, planning, and perception in one model. It belongs to the Mistral Small family as a successor that consolidates earlier specialized models like Mistral Small, Magistral (reasoning), Pixtral (vision), and Devstral (coding) into a single open model.

5 Core Capabilities

  • Conversational Chat

    Handles multi-turn conversations, answers questions, and follows instructions while maintaining context and coherent responses across dialogue turns.

  • Text Translation

    Translates text between multiple languages, preserving meaning and tone for general-purpose, everyday translation tasks.

  • Code Understanding

    Understands and reasons about source code, enabling tasks like explanation, refactoring suggestions, and simple code generation.

  • Image Interpretation

    Accepts image inputs to identify objects and describe visual content, supporting multimodal question answering and explanation.

  • Text Extraction

    Extracts textual information from images or documents, enabling reading of printed content and structured capture of key fields.

6 Most Valuable Use Cases

  • Customer Support Chatbots
  • Summarizing Long Documents
  • Legal Text Drafting
  • Compliance Monitoring Assistance
  • Product Description Generation
  • Code Generation Assistance

Cost Comparison

LLM API offers the lowest prices and highest performance for Mistral Small–class models.

Provider Region Latency Throughput Uptime Input ($/1M) Output ($/1M) Context
LLM API BEST Global ~120ms ~80 tps 99.99% $0.05 $0.15 128K
Mistral EU West ~220ms ~40 tps 99.9% ~$0.20 ~$0.60 ~32K
Azure US East ~260ms ~35 tps 99.9% ~$0.25 ~$0.75 ~32K
AWS Bedrock US West ~280ms ~30 tps 99.9% ~$0.28 ~$0.80 ~32K
Replicate Global ~320ms ~20 tps 99.5% ~$0.35 ~$1.00 ~16K

Technical Specifications

Metric Mistral Small 4 gpt-4.1-mini (OpenAI) Claude 3.5 Haiku (Anthropic)
Avg Latency ~200ms ~180ms ~220ms
Context Window 32K 128K 200K
Input Price ($/1M) $0.20 $0.15 $0.25
Output Price ($/1M) $0.60 $0.60 $0.80
Max Output Tokens 4K 4K 4K
Throughput ~60 tps ~80 tps ~50 tps
Uptime 99.9% 99.9% 99.9%

30-day usage via LLM API

24.5B
Prompt tokens processed (last 30 days)
17.8B
Completion tokens generated (last 30 days)
9.3M
API requests served (last 30 days)
99.7%
Avg uptime (last 30 days)
Start Using API

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

  • Unified AI Routing

    Dynamically route each request to the best model across providers using latency, cost, and quality signals—without changing your code or integrations.

    One endpoint, every model
  • Cost-Aware Orchestration

    Optimize spend by mixing premium and budget models per call, with centralized limits, per-tenant controls, and real-time cost visibility baked into the gateway.

    Max quality, lower cost
  • Automatic Provider Fallbacks

    Stay resilient when providers rate-limit or go down—LLM.API transparently retries and fails over to alternate models so your app keeps responding.

    No more hard outages
  • Deep LLM Observability

    Trace every request across models with structured logs, metrics, and latency breakdowns to debug prompts, tune routing, and prove reliability to stakeholders.

    See every token hop
  • Task-Level Abstractions

    Call higher-level tasks like chat, tools, RAG, and agents instead of raw models, so you can swap providers without rewriting application logic.

    Code to tasks, not models
  • High-Throughput Batch APIs

    Ship bulk inference jobs through a single endpoint with concurrency control, deduping, and retries to reduce unit cost and saturate provider capacity safely.

    Batch at full throttle

When to Use — When NOT to Use

Use it if...

  • You need a small, cost-efficient model for everyday chat, Q&A, and utilities.
  • You need competent code generation and editing without paying for a flagship model.
  • Your use case involves lightweight agents, tools, or backends needing reasonable reasoning at scale.
  • Your use case involves batch-processing many short requests where throughput and price dominate.
  • You need a general-purpose model from Mistral that integrates cleanly with their ecosystem.
  • Your use case involves multilingual understanding and generation without requiring top-tier translation quality.

Avoid if...

  • You need state-of-the-art reasoning performance comparable to the very best frontier models.
  • Your workload requires highly reliable, domain-expert answers in medical, legal, or safety-critical contexts.
  • You need very long-context understanding, such as entire books or massive codebases.
  • Your workload requires the strongest available code generation and complex multi-file refactoring support.
  • You need cutting-edge performance on math, logic puzzles, or multi-step planning tasks.
  • Your workload requires highly specialized fine-tuning or custom safety guarantees beyond standard offerings.

Frequently Asked Questions

  • What is Mistral Small 4?

    Mistral Small 4 is a compact instruction-tuned language model by Mistral, optimized for low-latency, low-cost text generation and reasoning tasks.

  • What is Mistral Small 4 best suited for?

    Mistral Small 4 is best for chatbots, lightweight agents, tools integration, and high-volume applications where cost and latency are critical.

  • What is the context window of Mistral Small 4?

    Mistral Small 4 supports context windows up to 32K tokens via LLM.API.

  • Does Mistral Small 4 support images or other modalities?

    No, Mistral Small 4 is a text-only model and does not natively support images, audio, or video inputs.

  • How is Mistral Small 4 priced on LLM.API?

    Mistral Small 4 is billed on a per-token basis for input and output; check your LLM.API pricing page for the latest specific rates.

  • How fast is Mistral Small 4 through LLM.API?

    Mistral Small 4 is optimized for low latency and high throughput, making it suitable for real-time user-facing applications.

  • How do I call Mistral Small 4 using LLM.API?

    Specify the provider as "Mistral" and the model name as "mistral-small-4" in your LLM.API completion or chat invocation request.

  • How does Mistral Small 4 compare to larger Mistral or frontier models?

    Mistral Small 4 is cheaper and faster but generally less capable on complex reasoning, long-context analysis, and highly specialized domains.

  • What are the main limitations of Mistral Small 4?

    Mistral Small 4 can hallucinate, lacks up-to-the-minute real-world knowledge, and may underperform on very long, multi-step reasoning or niche expert tasks.

  • Can I use tools or function calling with Mistral Small 4 on LLM.API?

    Yes, you can use LLM.API’s standard tool or function-calling interface, with Mistral Small 4 generating structured arguments for your tools.

Start in 2 lines of code

Get My API Key