Voxtral Small 24B 2507

Instruction Following

Voxtral Small 24B 2507 is a 24-billion-parameter audio-language model from Mistral that extends Mistral Small 3 with advanced speech understanding. It is notable for strong, cost-efficient performance on transcription, translation, and audio-informed text tasks across multiple languages.

Start Using API

API Performance

Latency: ~0.7s time to first token
Context: ~32K token context
Input: Free per 1M tokens
Output: Free per 1M tokens
Uptime: 99% 99%

About the model

What is Voxtral Small 24B 2507?

Voxtral Small 24B 2507 is an open-source speech understanding and language model from Mistral that combines text generation with state-of-the-art audio input capabilities. It is mainly used for high-quality speech transcription and translation directly from audio in many languages. It is also applied to tasks like Q&A, summarization, and general chat where audio context must be understood alongside text. It belongs to the Voxtral family and is built as an enhancement of the Mistral Small 3 series.

Input / Output

Input

Text prompts (natural language, code, instructions)
Audio inputs for transcription, translation, and understanding

Output

Structured or free-form text responses (chat, explanations, summaries)

Model capabilities

5 Core Capabilities

Conversational Chat

Handles multi-turn text conversations with strong general reasoning, instruction following, and tool-use support in many domains and scenarios.
Audio Transcription

Transcribes spoken audio into accurate text using a dedicated speech transcription mode optimized for high-quality automatic speech recognition.
Speech Translation

Performs speech-to-text translation across multiple languages, enabling multilingual audio translation and cross-lingual understanding within one model.
Audio Understanding

Analyzes audio beyond transcription, supporting audio-based question answering, summarization, and semantic comprehension of spoken content.
Monitoring Integration

Integrates with various inference and observability platforms, supporting structured outputs, tools, and deployment in managed environments.

Use cases

6 Most Valuable Use Cases

Meeting Transcription
Multilingual Speech Translation
Audio-Based Q&A
Voice-Driven Function Calling
Call Center Analytics
Long Audio Summarization

Transparent pricing

Cost Comparison

LLM API offers the lowest cost and best performance for Voxtral Small–class 24B models.

Provider	Region	Latency	Throughput	Uptime	Input ($/1M)	Output ($/1M)	Context
LLM API BEST	Global	90ms	80 tps	99.99%	$0.40	$0.40	256K
Mistral	EU West	~140ms	~45 tps	99.9%	~$0.60	~$0.60	~128K
Together AI	US East	~160ms	~40 tps	99.9%	~$0.55	~$0.55	~128K
Fireworks AI	US West	~150ms	~42 tps	99.9%	~$0.58	~$0.58	~200K

Performance benchmarks

Technical Specifications

Metric	Voxtral Small 24B 2507 (Mistral)	Mistral Large 2 123B	GPT-4.1 Mini
Avg Latency	~220ms	~280ms	~180ms
Context Window	128K	128K	128K
Input Price ($/1M)	$0.40	$2.00	$0.15
Output Price ($/1M)	$1.20	$6.00	$0.60
Max Output Tokens	4K	4K	4K
Throughput	~60 tps	~40 tps	~80 tps
Uptime	99.9%	99.9%	99.9%

30-day usage via LLM API

2.8B: Prompt tokens processed (last 30 days)
2.1B: Completion tokens generated (last 30 days)
3.6M: API requests served (last 30 days)
99.8%: Average uptime (last 30 days)

Start Using API

Architecture & Integration

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

Intelligent Model Routing

Dynamically route each request to the best model based on latency, cost, and capabilities—using one stable API contract instead of per-provider plumbing.
One endpoint, every model.
Cost-Aware Orchestration

Automatically balance price and quality with configurable cost ceilings, tiered model selection, and real-time usage controls so you never get surprised by your AI bill.
Control spend by design.
Resilient Fallbacks

Define automatic failover chains across providers and models. Survive outages and rate limits without rewriting application logic or degrading user experience.
Never fail on 500s
End-to-End Observability

Trace every request with structured logs, metrics, and latency breakdowns by provider and model. Debug production issues and tune routes using real data.
See every token
Task-Level Abstractions

Describe high-level tasks like chat, extraction, or generation once. LLM.API maps them to best-fit models, so you decouple product logic from vendors.
Think tasks, not models
High-Throughput Batch APIs

Process thousands of prompts in parallel with backpressure, retries, and cost controls built in. Ideal for reindexing, evaluations, and large content migrations.
Scale jobs, not code

Decision guide

When to Use — When NOT to Use

Use it if...

You need a capable general-purpose LLM from Mistral without paying flagship-model prices.
You need solid coding assistance, code explanation, and moderate debugging across common languages.
Your use case involves chatbots, agents, or copilots that handle everyday business questions.
You need good English writing help for emails, documentation, specs, or internal reports.
Your use case involves moderate-length context tasks, like analyzing single documents or conversations.
You need an open-weight or API-accessible model compatible with common Mistral tooling.

Avoid if...

You need top-tier reasoning quality approaching frontier models for complex, high-stakes decisions.
Your workload requires extremely long-context processing, like hundreds of pages per request.
You need heavily optimized multimodal capabilities, such as advanced vision or audio understanding.
Your workload requires state-of-the-art code generation for very large or safety-critical codebases.
You need the absolute best Mistral model performance regardless of higher cost or resource use.
Your workload requires ultra-low latency inference on very small edge devices with minimal memory.

FAQ

Frequently Asked Questions

What is Voxtral Small 24B 2507?

Voxtral Small 24B 2507 is a 24B-parameter Mistral model exposed via LLM.API, targeting high-quality, general-purpose text generation for developers.
What modalities does Voxtral Small 24B 2507 support?

Voxtral Small 24B 2507 is a text-only language model, supporting text input and text output through the LLM.API endpoints.
How is Voxtral Small 24B 2507 priced on LLM.API?

Voxtral Small 24B 2507 uses LLM.API’s unified per-token pricing; check your LLM.API dashboard or pricing docs for current input and output rates.
What is the context window of Voxtral Small 24B 2507?

Voxtral Small 24B 2507 supports a multi‑kilotoken context window suitable for long prompts and conversations; see the LLM.API model card for exact limits.
How fast is Voxtral Small 24B 2507 on LLM.API?

Voxtral Small 24B 2507 is optimized for low-latency inference with streaming responses, but actual speed depends on prompt length and concurrency.
What is Voxtral Small 24B 2507 best suited for?

Voxtral Small 24B 2507 is best for general chat, code assistance, reasoning over medium-length documents, and building production assistants with predictable cost.
How do I call Voxtral Small 24B 2507 via LLM.API?

Use the standard LLM.API chat or completions endpoint and set the model field to "Voxtral Small 24B 2507" in your request payload.
How does Voxtral Small 24B 2507 compare to similar-sized models?

Voxtral Small 24B 2507 targets a balance of quality and throughput comparable to other ~20–30B open models, exposed under a unified LLM.API interface.
What are the main limitations of Voxtral Small 24B 2507?

Voxtral Small 24B 2507 can hallucinate, lacks real-time knowledge or browsing, and should not be used as a sole source for critical decisions.
Can Voxtral Small 24B 2507 handle tools or function calling via LLM.API?

If enabled in LLM.API, Voxtral Small 24B 2507 can follow tool or function-calling schemas, but behavior depends on your request format and routing configuration.

Start in 2 lines of code

Get My API Key

Voxtral Small 24B 2507

What is Voxtral Small 24B 2507?

5 Core Capabilities

Conversational Chat

Audio Transcription

Speech Translation

Audio Understanding

Monitoring Integration

6 Most Valuable Use Cases

Cost Comparison

Technical Specifications

Why Build on LLM.API?

Intelligent Model Routing

Cost-Aware Orchestration

Resilient Fallbacks

End-to-End Observability

Task-Level Abstractions

High-Throughput Batch APIs

When to Use — When NOT to Use

Use it if...

Avoid if...

Start in 2 lines of code