Voxtral Mini TTS

Text-to-Speech

Voxtral Mini TTS is Mistral’s 4B-parameter text-to-speech model that generates expressive, low-latency speech and supports multilingual, zero-shot voice cloning. It is available via the Mistral API and as open weights for self-hosting.

API Performance

Latency: ~1.0 s average generation time
Context: ~10 min maximum audio duration
Input: Free (per minute of input text)
Output: Free (per minute of generated audio)
Uptime: 99%

About the model

What is Voxtral Mini TTS?

Voxtral Mini TTS is a 4B-parameter text-to-speech model from Mistral that converts text into natural, expressive speech with multilingual support and voice cloning from very short audio samples. It is mainly used to build voice agents and assistants that respond in real time with low-latency audio, and to generate high-quality synthetic voices for applications like content narration, product voices, and accessibility tools. It also serves use cases that require cloning or reusing consistent speaker identities across many utterances, such as branded voice experiences and character dialogue. The model is part of Mistral’s Voxtral audio family, alongside Voxtral Mini and Voxtral Small transcription and audio-understanding models.

Input / Output

Input

Text prompts (characters to be synthesized as speech)

Output

Audio speech output (MP3, WAV, PCM, FLAC, Opus via TTS API)
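
Raw PCM output has no container header, so most players cannot open it directly. The sketch below wraps PCM bytes in a WAV file using Python's standard wave module. It assumes 16-bit mono samples at 24 kHz (the sample rate listed in the specification table); the exact PCM layout returned by the API is not documented on this page, so confirm it before relying on these defaults.

```python
import io
import wave

# Assumed PCM layout (not documented on this page): 16-bit signed
# mono samples at the 24 kHz rate listed in the specification table.
SAMPLE_RATE_HZ = 24_000
SAMPLE_WIDTH_BYTES = 2
CHANNELS = 1


def pcm_to_wav(pcm: bytes,
               sample_rate: int = SAMPLE_RATE_HZ,
               channels: int = CHANNELS,
               sample_width: int = SAMPLE_WIDTH_BYTES) -> bytes:
    """Wrap raw PCM frames in a WAV (RIFF) container so standard players can open them."""
    frame_size = channels * sample_width
    if len(pcm) % frame_size:
        raise ValueError("PCM data is not a whole number of frames")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```

For example, pcm_to_wav(b"\x00\x00" * 12_000) produces half a second of silence (12,000 frames at 24 kHz) as a playable WAV file.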

Model capabilities

5 Core Capabilities

Text-to-Speech

Generates natural-sounding speech from written text, suitable for dialogue, narration, and interface responses.

Conversational Output

Produces speech suited to interactive assistants, voicing the responses of conversational AI systems as clear, responsive spoken dialogue.

Multilingual Speech

Supports speech generation in multiple languages, allowing applications to vocalize content for diverse linguistic audiences and use cases.

Screen Reader Compatibility

Can power screen readers or accessibility tools by converting on-screen text into intelligible, continuous spoken audio.

Media Content Voice

Provides synthesized voices for videos, podcasts, or interactive media, enabling scalable voiceover creation without human recording sessions.

Use cases

6 Most Valuable Use Cases

Voice App Prototyping
Customer Support Prompts
Accessibility Voice Output
Interactive Voice Demos
Spoken Content Previews
Educational Voice Feedback

Transparent pricing

Cost Comparison

Up to ~70% cheaper and lower-latency than comparable TTS APIs

Provider	Region	Latency	Throughput	Uptime	Input ($/min)	Output ($/min)	Context
LLM API	Global	80ms	120 req/s	99.99%	$0.004/min	$0.004/min	~15 min audio
Mistral	EU West	~140ms	~45 req/s	~99.9%	~$0.010/min	~$0.010/min	~10 min audio
OpenAI	Global	~150ms	~60 req/s	99.9%	~$0.015/min	~$0.015/min	~15 min audio
Azure AI Speech	Global	~180ms	~80 req/s	99.9%	~$0.016/min	~$0.016/min	~10 min audio
Google Cloud Text-to-Speech	Global	~170ms	~70 req/s	99.9%	~$0.014/min	~$0.014/min	~10 min audio
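
The savings headline can be checked against the per-minute list rates in the table. This sketch copies those approximate rates and computes the fraction saved by paying the LLM API rate instead of each provider's rate.

```python
# Approximate per-minute list rates (USD) copied from the cost comparison table.
RATES_PER_MIN = {
    "LLM API": 0.004,
    "Mistral": 0.010,
    "OpenAI": 0.015,
    "Azure AI Speech": 0.016,
    "Google Cloud Text-to-Speech": 0.014,
}


def savings_vs(baseline: str, candidate: str = "LLM API",
               rates: dict = RATES_PER_MIN) -> float:
    """Fraction saved by paying `candidate`'s rate instead of `baseline`'s."""
    return (rates[baseline] - rates[candidate]) / rates[baseline]


if __name__ == "__main__":
    for provider in RATES_PER_MIN:
        if provider != "LLM API":
            print(f"{provider}: {savings_vs(provider):.0%} cheaper via LLM API")
```

On these list rates the saving ranges from about 60% (against Mistral) to 75% (against Azure AI Speech). Because the table values are approximate, treat the result as indicative only.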

Performance benchmarks

Technical Specifications

Metric	Voxtral Mini TTS	OpenAI gpt-4o-mini TTS	Google Chirp TTS (small)
Avg Latency	~180ms	~200ms	~220ms
Languages Supported	~25	~30	~20
Price per 1M chars	~$0.70	~$1.00	~$0.80
Max Input Length	~4K chars	~8K chars	~5K chars
Sample Rate	24 kHz	24 kHz	22.05 kHz
Voices / Styles	~20	~30	~15
Uptime	99.9%	99.9%	99.5%
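
Because the maximum input length is only around 4K characters, longer documents have to be split across several requests. A minimal sketch, assuming a hard limit of exactly 4,000 characters (the table gives only an approximate figure): split on sentence boundaries, fall back to hard cuts for any oversized sentence, and estimate spend from the approximate $0.70 per 1M characters rate.

```python
import re

# Assumed limits, taken from the approximate figures in the specification table.
MAX_CHARS = 4_000
PRICE_PER_MILLION_CHARS = 0.70  # USD

# Split after ., ! or ? followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def chunk_text(text: str, max_chars: int = MAX_CHARS) -> list:
    """Split text into pieces of at most max_chars, preferring sentence boundaries."""
    chunks = []
    current = ""
    for sentence in _SENTENCE_END.split(text.strip()):
        # Hard-split any single sentence that is longer than the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def estimate_cost(text: str, price_per_million: float = PRICE_PER_MILLION_CHARS) -> float:
    """Approximate synthesis cost in USD at a flat per-character rate."""
    return len(text) / 1_000_000 * price_per_million
```

Each chunk can then be sent as its own request and the resulting audio segments played or concatenated in order.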

30-day usage via LLM API

620M: Characters synthesized in the last 30 days
3.4M: TTS API requests served
210K: Unique developer projects using Voxtral Mini TTS
99.96%: Average API uptime

Architecture & Integration

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

Unified AI Routing

Automatically route each request to the best-fit model across providers based on cost, latency, or quality—without changing your code or client integration.
One endpoint, every model.
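
To make policy-based routing concrete, the sketch below scores providers under a named policy and picks the lowest score. It uses the approximate figures for the four upstream providers in the cost comparison table; the Provider type and the policy names are hypothetical and do not describe LLM.API's actual routing logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str
    latency_ms: float       # approximate, from the cost comparison table
    throughput_rps: float   # requests per second
    price_per_min: float    # USD per minute of audio


PROVIDERS = [
    Provider("Mistral", 140, 45, 0.010),
    Provider("OpenAI", 150, 60, 0.015),
    Provider("Azure AI Speech", 180, 80, 0.016),
    Provider("Google Cloud Text-to-Speech", 170, 70, 0.014),
]

# Hypothetical policies: each maps a provider to a score where lower is better.
POLICIES = {
    "cost": lambda p: p.price_per_min,
    "latency": lambda p: p.latency_ms,
    "throughput": lambda p: -p.throughput_rps,
}


def route(policy: str, providers: list = PROVIDERS) -> Provider:
    """Return the provider with the best (lowest) score under the named policy."""
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy}")
    return min(providers, key=POLICIES[policy])
```

Because the caller names only a policy, switching from "cost" to "throughput" changes the selected provider (Mistral versus Azure AI Speech on these figures) without changing the calling code.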
Cost-Aware Orchestration

Define cost ceilings and model preferences, then let LLM.API optimize per-call spend so you can scale usage without surprise bills or manual tuning.
More usage, less spend.
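
A per-call cost ceiling can be enforced by estimating spend before dispatch. In this hypothetical sketch, the caller supplies the expected audio length and a ceiling in USD, and the first provider in a preference list whose estimate fits is chosen. The rates are the approximate per-minute figures from the cost comparison table; the preference order is arbitrary.

```python
# Approximate USD per minute of audio, from the cost comparison table.
RATES_PER_MIN = {
    "Mistral": 0.010,
    "OpenAI": 0.015,
    "Azure AI Speech": 0.016,
    "Google Cloud Text-to-Speech": 0.014,
}

# Hypothetical model preference, most preferred first.
PREFERENCE = ["OpenAI", "Mistral", "Google Cloud Text-to-Speech", "Azure AI Speech"]


def estimate_spend(provider: str, audio_minutes: float) -> float:
    """Estimated cost of one call at the provider's per-minute rate."""
    return RATES_PER_MIN[provider] * audio_minutes


def choose_within_budget(audio_minutes: float, ceiling_usd: float,
                         preference: list = PREFERENCE) -> str:
    """Return the first preferred provider whose estimated cost fits the ceiling."""
    for provider in preference:
        if estimate_spend(provider, audio_minutes) <= ceiling_usd:
            return provider
    raise RuntimeError("no provider fits the cost ceiling")
```

For two minutes of audio, a $0.05 ceiling keeps the preferred provider, while a $0.025 ceiling falls through to the cheaper Mistral rate.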
Automatic Fallbacks

When a provider times out, errors, or rate-limits, LLM.API seamlessly retries on backup models so your production flows stay reliable and resilient.
No single point of failure.
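
The fallback idea can be sketched client-side as an ordered list of providers, each retried with exponential backoff before moving to the next. ProviderError and the provider callables are hypothetical stand-ins; this is not LLM.API's implementation.

```python
import time
from typing import Callable, Sequence, Tuple


class ProviderError(Exception):
    """Hypothetical error raised on a timeout, server error, or rate limit."""


def synthesize_with_fallback(
    text: str,
    providers: Sequence[Tuple[str, Callable[[str], bytes]]],
    retries_per_provider: int = 1,
    backoff_s: float = 0.5,
) -> Tuple[str, bytes]:
    """Try providers in order; return (provider_name, audio) from the first success."""
    errors = []
    for name, call in providers:
        for attempt in range(retries_per_provider + 1):
            try:
                return name, call(text)
            except ProviderError as exc:
                errors.append(f"{name} attempt {attempt + 1}: {exc}")
                if attempt < retries_per_provider:
                    time.sleep(backoff_s * 2 ** attempt)  # exponential backoff
    raise ProviderError("all providers failed: " + "; ".join(errors))
```

Collecting every error message means that, when all providers fail, the final exception still shows what went wrong on each attempt.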
Deep Observability

Get unified logs, metrics, traces, and payload samples across all models and providers, making debugging, performance tuning, and governance radically simpler.
See every token, everywhere.
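
As a rough illustration of per-provider observability, the decorator below times each call, records whether it succeeded, and emits a structured log line. The observed decorator and the in-memory METRICS store are hypothetical; a production setup would export these measurements to a metrics or tracing backend.

```python
import logging
import time
from collections import defaultdict
from functools import wraps

logger = logging.getLogger("tts.observability")

# Hypothetical in-memory store: provider -> list of (latency_seconds, succeeded).
METRICS = defaultdict(list)


def observed(provider: str):
    """Decorator that records latency and outcome of each call for `provider`."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            ok = False
            try:
                result = fn(*args, **kwargs)
                ok = True
                return result
            finally:
                # Runs on success and on exceptions, so failures are recorded too.
                elapsed = time.perf_counter() - start
                METRICS[provider].append((elapsed, ok))
                logger.info("provider=%s ok=%s latency_ms=%.1f", provider, ok, elapsed * 1000)
        return wrapper
    return decorator


def error_rate(provider: str) -> float:
    """Fraction of recorded calls for `provider` that raised an exception."""
    calls = METRICS[provider]
    return sum(1 for _, ok in calls if not ok) / len(calls) if calls else 0.0
```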
Task-Level Abstractions

Describe tasks like chat, generation, tools, or RAG once and let LLM.API translate them into provider-specific calls, so you avoid brittle model-specific code.
Code to tasks, not models.
High-Throughput Batch

Send massive batches of prompts through a single API call, with automatic chunking, retries, and concurrency controls to maximize throughput across providers.
Process thousands in one go.
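
A client-side analogue of batch processing is to fan requests out over a thread pool with per-item retries. The sketch assumes a hypothetical synthesize callable that turns one text into audio bytes. It does not use LLM.API's batch endpoint, whose interface is not described on this page.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional


def synthesize_batch(
    texts: List[str],
    synthesize: Callable[[str], bytes],
    max_workers: int = 8,
    max_attempts: int = 3,
) -> List[bytes]:
    """Synthesize many texts concurrently with per-item retries, preserving input order."""

    def with_retries(text: str) -> bytes:
        last_exc: Optional[Exception] = None
        for _ in range(max_attempts):
            try:
                return synthesize(text)
            except Exception as exc:  # narrow this to retryable errors in real code
                last_exc = exc
        raise RuntimeError(f"gave up after {max_attempts} attempts") from last_exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of `texts`, not completion order.
        return list(pool.map(with_retries, texts))
```

Threads suit this workload because each request spends most of its time waiting on the network; max_workers acts as the concurrency control.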

Decision guide

When to Use — When NOT to Use

Use it if...

You need lightweight text-to-speech for applications where a compact model is sufficient.
You need TTS integrated into an existing Mistral-based stack for simpler deployment.
Your use case involves prototyping speech features without requiring enterprise-grade voice quality.
Your use case involves cost-sensitive scenarios where smaller speech models are advantageous.
You need basic voice output for chatbots, assistants, or simple narration tasks.

Avoid if...

You need state-of-the-art naturalness and expressiveness on par with premium commercial TTS.
Your workload requires highly controllable prosody, emotions, and detailed voice style parameters.
You need robust multilingual coverage and accents beyond what Mistral explicitly supports.
Your workload requires ultra-high-fidelity audio for production media, film, or advertising.
You need mature, battle-tested TTS with extensive tooling, ecosystem, and vendor guarantees.

FAQ

Frequently Asked Questions

What is Voxtral Mini TTS?

Voxtral Mini TTS is a Mistral text-to-speech model focused on fast, lightweight voice synthesis for applications that need low-latency audio generation.
What is Voxtral Mini TTS best suited for?

It is best for real-time or near real-time speech generation in interactive apps, voice assistants, and low-resource environments.
How is Voxtral Mini TTS priced when used through LLM.API?

Pricing is usage-based per generated character or token, with exact rates defined in the LLM.API model pricing table.
What context window or input length limits does Voxtral Mini TTS have?

The model accepts short to moderate text prompts suitable for speech synthesis, with exact character limits determined by LLM.API configuration.
How fast is Voxtral Mini TTS in terms of latency?

Voxtral Mini TTS is optimized for low latency, typically returning audio quickly enough for responsive user experiences in interactive applications.
What modalities does Voxtral Mini TTS support?

It supports text-to-speech only, taking text input and returning synthesized audio output.
How do I access Voxtral Mini TTS through LLM.API?

Call the LLM.API generation endpoint with the Voxtral Mini TTS model identifier, passing text input and any audio configuration parameters supported by the API.
How does Voxtral Mini TTS compare to larger TTS models?

Compared to larger TTS models, it trades some maximum quality and configurability for lower cost, faster inference, and smaller resource requirements.
What limitations should I be aware of when using Voxtral Mini TTS?

Limitations can include less natural prosody on complex texts, language coverage constraints, and quality degradation on very long inputs.
Does Voxtral Mini TTS support streaming audio output via LLM.API?

Streaming availability depends on LLM.API’s implementation; check the streaming or response_mode options for this specific model.
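
If streaming is available for this model (not confirmed here), audio can be saved or played as it arrives instead of after the full response. This sketch reads fixed-size chunks from any binary file-like response, such as the object returned by urllib.request.urlopen, and forwards each chunk to an optional callback; the 4096-byte chunk size is arbitrary.

```python
from typing import BinaryIO, Callable, Iterator, Optional


def iter_audio_chunks(response: BinaryIO, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield audio bytes from a response body as they become available."""
    while True:
        chunk = response.read(chunk_size)
        if not chunk:
            break
        yield chunk


def save_stream(response: BinaryIO, out: BinaryIO,
                on_chunk: Optional[Callable[[bytes], None]] = None) -> int:
    """Write chunks to `out` as they arrive; optionally pass each one to a player."""
    total = 0
    for chunk in iter_audio_chunks(response):
        out.write(chunk)
        if on_chunk is not None:
            on_chunk(chunk)  # e.g. push into an audio playback buffer
        total += len(chunk)
    return total
```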

Start in 2 lines of code
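
A minimal sketch of a synthesis call using only Python's standard library. The endpoint URL, model identifier, payload field names, and LLM_API_KEY environment variable are all hypothetical placeholders; take the real values from the LLM.API reference. Only the list of output formats comes from this page.

```python
import json
import os
import urllib.request
from typing import Optional

# Hypothetical placeholders: not documented on this page.
API_URL = "https://api.example.com/v1/audio/speech"
MODEL_ID = "voxtral-mini-tts"

# Output formats listed on this page.
SUPPORTED_FORMATS = {"mp3", "wav", "pcm", "flac", "opus"}


def build_tts_request(text: str, response_format: str = "mp3",
                      api_key: Optional[str] = None) -> urllib.request.Request:
    """Build a POST request with a JSON body (field names are assumptions)."""
    if response_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {response_format}")
    key = api_key or os.environ.get("LLM_API_KEY", "")  # hypothetical env var
    payload = {"model": MODEL_ID, "input": text, "response_format": response_format}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {key}", "Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text: str, path: str, response_format: str = "mp3") -> None:
    """Send the request and write the returned audio bytes to `path`."""
    request = build_tts_request(text, response_format)
    with urllib.request.urlopen(request, timeout=30) as resp, open(path, "wb") as f:
        f.write(resp.read())
```

Keeping request construction separate from sending makes it possible to inspect or log the payload before any network call is made.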
