Powered by Mistral

Voxtral Mini Transcribe

  • Speech-to-Text

Voxtral Mini Transcribe is a speech-to-text model from Mistral focused on lightweight, efficient audio transcription. It is designed to provide accurate transcriptions while being small and fast enough for resource-constrained environments.


What is Voxtral Mini Transcribe?

Voxtral Mini Transcribe is a compact automatic speech recognition (ASR) model from Mistral that converts spoken audio into text. It is mainly used for real-time or near-real-time transcription of voice recordings, calls, and meetings, and for adding speech input to applications where low latency and low computational overhead matter. It belongs to Mistral’s Voxtral family of speech models, which are optimized for practical, efficient deployment.

5 Core Capabilities

  • Speech Transcription

    Converts spoken audio into accurate text, supporting various speakers and recording conditions for transcription and note-taking use cases.

  • Real-Time Transcription

    Processes streaming audio input to produce near real-time text transcripts suitable for live captions and interactive applications.

  • Multilingual Transcription

    Transcribes speech from multiple supported languages, enabling cross-lingual audio processing and global applications requiring language-aware transcription.

  • Dialogue-Oriented Output

    Produces structured, readable transcripts suitable for conversational contexts, meetings, and interviews, preserving speaker turns when available.

  • Audio-to-Text Alignment

    Generates text closely aligned with input audio segments, facilitating downstream search, navigation, and timestamp-based audio indexing.
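Timestamp-aligned output is what makes a transcript searchable. The sketch below assumes a hypothetical response shape (a list of segments with `start` and `end` in seconds plus `text`); the fields actually returned through LLM API may differ. It builds a word-to-timestamp index so a search hit can jump straight to the matching point in the audio.

```python
import re
from collections import defaultdict


# Assumed (hypothetical) segment shape: {"start": float, "end": float, "text": str}.
def build_index(segments: list[dict]) -> dict[str, list[float]]:
    """Map each lowercase word to the start times of the segments containing it."""
    index: dict[str, list[float]] = defaultdict(list)
    for seg in segments:
        # set() so a word repeated within one segment records that start time once.
        for word in set(re.findall(r"[a-z0-9']+", seg["text"].lower())):
            index[word].append(seg["start"])
    return dict(index)


def find(index: dict[str, list[float]], word: str) -> list[float]:
    """Return the sorted segment start times where `word` was spoken."""
    return sorted(index.get(word.lower(), []))


if __name__ == "__main__":
    segments = [
        {"start": 0.0, "end": 4.2, "text": "Welcome to the quarterly review."},
        {"start": 4.2, "end": 9.8, "text": "Revenue grew in every region."},
        {"start": 9.8, "end": 15.1, "text": "The review ends with Q&A."},
    ]
    idx = build_index(segments)
    print(find(idx, "review"))  # [0.0, 9.8]
```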

6 Most Valuable Use Cases

  • Meeting transcription
  • Customer call analysis
  • Legal deposition transcripts
  • Live webinar captioning
  • Voice note processing
  • Podcast batch transcription

Cost Comparison

Among the providers compared below, LLM API lists the lowest per-minute price and the highest uptime for Voxtral-class transcription.

Provider | Region | Latency | Throughput | Uptime | Input ($/audio min) | Output ($/audio min) | Max audio per request
--- | --- | --- | --- | --- | --- | --- | ---
LLM API (best) | Global | ~180ms | ~120 audio min/min | 99.99% | $0.004 | $0.000 | ~120 min
Mistral | EU West | ~220ms | ~80 audio min/min | ~99.9% | ~$0.006 | $0.000 | ~60 min
OpenAI | Global | ~250ms | ~90 audio min/min | ~99.9% | ~$0.006 | $0.000 | ~60 min
Azure AI | Global | ~260ms | ~70 audio min/min | ~99.9% | ~$0.007 | $0.000 | ~60 min
Google Cloud | Global | ~240ms | ~75 audio min/min | ~99.9% | ~$0.007 | $0.000 | ~60 min
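To turn these per-minute rates into a budget, multiply them by your expected monthly audio volume. The sketch below copies the approximate input rates from the table and uses an illustrative workload of 50,000 audio minutes per month (an assumption, not a measured figure); output is listed at $0.000/min, so only input minutes are counted.

```python
# Approximate input prices in $ per audio minute, copied from the table above.
RATES_PER_MIN = {
    "LLM API": 0.004,
    "Mistral": 0.006,
    "OpenAI": 0.006,
    "Azure AI": 0.007,
    "Google Cloud": 0.007,
}


def monthly_cost(audio_minutes: float, rate_per_min: float) -> float:
    """Estimated monthly spend in dollars; output is billed at $0, so only input counts."""
    return round(audio_minutes * rate_per_min, 2)


if __name__ == "__main__":
    minutes = 50_000  # illustrative workload, not a measured figure
    for provider, rate in RATES_PER_MIN.items():
        print(f"{provider:<13} ${monthly_cost(minutes, rate):,.2f}")
```

At that volume the listed rates work out to about $200 per month on LLM API versus $300 to $350 with the other providers.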

Technical Specifications

Metric | Voxtral Mini Transcribe | OpenAI Whisper v3 Small | Google Speech-to-Text v2
--- | --- | --- | ---
Avg latency | ~350ms | ~400ms | ~450ms
Languages supported | ~100 | ~100 | ~70
Price per minute | $0.006 | $0.006 | $0.009
Max audio duration | 2h | 12h | 3h
Word error rate (WER, lower is better) | ~7% | ~6% | ~8%
Uptime | 99.9% | 99.9% | 99.9%
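Word error rate counts the word substitutions (S), deletions (D), and insertions (I) needed to turn the model's transcript into a reference transcript, divided by the number of reference words (N): WER = (S + D + I) / N, so lower is better. The sketch below computes it with a standard word-level edit distance. It only lowercases and splits on whitespace; published benchmarks usually apply fuller text normalization (punctuation, numbers), so its output is not directly comparable to the figures in the table.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # dist[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match when cost is 0)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution ("starts" -> "start") out of five reference words.
    print(word_error_rate("the meeting starts at noon", "the meeting start at noon"))  # 0.2
```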

30-day usage via LLM API

  • 620M audio seconds transcribed
  • 9.4M transcription API requests
  • 4.7M unique speakers detected
  • 99.8% average API uptime
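Dividing the 30-day totals gives a rough sense of typical request size. This is plain arithmetic on the reported figures, not a separately published metric:

```python
# 30-day totals reported above.
audio_seconds = 620e6
requests = 9.4e6

avg_seconds_per_request = audio_seconds / requests  # about 66 s
total_audio_minutes = audio_seconds / 60            # about 10.3M minutes

print(f"{avg_seconds_per_request:.0f} s of audio per request on average")
print(f"{total_audio_minutes / 1e6:.1f}M audio minutes in 30 days")
```

That is roughly one minute of audio per request, consistent with the short-clip workloads this model targets.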

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

  • Intelligent Model Routing

    Dynamically route each request to the best model across providers based on latency, cost, and capability—no client changes, just smarter traffic.

    One endpoint, many models
  • Cost-Aware Orchestration

    Automatically balance quality and spend with fine-grained controls, price-aware routing, and per-project limits so teams ship fast without surprise cloud bills.

    Control spend, not speed
  • Resilient Fallback Flows

    Define provider and model fallback chains that trigger on errors, timeouts, or degraded quality so your AI features stay online even when vendors don’t.

    Fail soft, stay online
  • Full-Stack Observability

    Trace every call across providers with unified logs, metrics, and structured traces so you can debug latency spikes and failures in minutes, not days.

    See every token hop
  • Task-Level Abstractions

    Describe work as tasks—chat, tools, RAG, agents—and let LLM.API pick the right models, prompts, and configs so you focus on product, not plumbing.

    Program tasks, not models
  • High-Throughput Batch Jobs

    Run large-scale generations, evaluations, and data labeling as batched jobs with built-in retries, concurrency controls, and cost tracking from a single API.

    Scale to millions of calls
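The fallback chains described above are configured in the gateway; the sketch below only illustrates the underlying idea client-side and is not the LLM.API implementation or API. Each provider is a hypothetical Python callable, tried in order until one succeeds. Timeouts and quality checks, which a gateway would also apply, are left out.

```python
from typing import Callable, Sequence

# A provider is any callable that takes audio bytes and returns a transcript.
# Names and signatures here are hypothetical stand-ins, not LLM.API functions.
Provider = Callable[[bytes], str]


class AllProvidersFailed(Exception):
    """Raised when every provider in the chain has failed."""


def transcribe_with_fallback(
    audio: bytes, chain: Sequence[tuple[str, Provider]]
) -> tuple[str, str]:
    """Try each (name, provider) in order; return (name, transcript) from the first success."""
    errors: list[str] = []
    for name, provider in chain:
        try:
            return name, provider(audio)
        except Exception as exc:  # in practice, catch only retryable errors
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


if __name__ == "__main__":
    def flaky(audio: bytes) -> str:
        raise TimeoutError("upstream timed out")

    def healthy(audio: bytes) -> str:
        return f"transcribed {len(audio)} bytes"

    print(transcribe_with_fallback(b"\x00" * 16, [("primary", flaky), ("backup", healthy)]))
    # ('backup', 'transcribed 16 bytes')
```

Collecting every provider's error before raising keeps the failure explainable when the whole chain is down.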

When to Use — When NOT to Use

Use it if...

  • You need fast, lightweight speech-to-text transcription for short audio clips or calls.
  • Your use case involves batch transcribing many short recordings with tight cost constraints.
  • You need a compact transcription model suitable for on-device or edge deployments.
  • Your use case involves generating transcripts primarily for search, indexing, or logging.
  • You need a simple ASR component to feed downstream NLP or analytics pipelines.
  • Your use case involves prototyping speech features without requiring a large general LLM.

Avoid if...

  • You need complex language understanding, summarization, or reasoning beyond basic transcription output.
  • Your workload requires state-of-the-art accuracy on noisy, multilingual, or domain-specific audio.
  • You need robust diarization, speaker attribution, or advanced audio segmentation features.
  • Your workload requires rich text generation, chat, or code understanding capabilities.
  • You need guaranteed, enterprise-grade SLAs and compliance for sensitive regulated speech data.
  • Your workload requires real-time, ultra-low-latency streaming transcription at massive global scale.

Frequently Asked Questions

  • What is Voxtral Mini Transcribe?

    Voxtral Mini Transcribe is a speech-to-text model by Mistral optimized for fast, low-cost audio transcription via the LLM.API gateway.

  • What modalities does Voxtral Mini Transcribe support?

    Voxtral Mini Transcribe supports audio input and returns transcribed text output; it does not process images, video, or arbitrary text prompts directly.

  • How do I access Voxtral Mini Transcribe through LLM.API?

    You call the unified LLM.API endpoint with the model name 'mistral:voxtral-mini-transcribe' and provide your audio data and parameters in the request body.

  • What is Voxtral Mini Transcribe best suited for?

    Voxtral Mini Transcribe is best for real-time or batch transcription of spoken content such as meetings, calls, podcasts, and voice notes.

  • What is the typical latency of Voxtral Mini Transcribe on LLM.API?

    Typical end-to-end latency is a few seconds for short audio clips, depending on audio length, network conditions, and your region.

  • What context window or duration limits apply to Voxtral Mini Transcribe?

    Each request is capped at a maximum audio duration (about 120 minutes on LLM API, per the tables above), so longer recordings should be split into smaller segments client-side.

  • How is pricing for Voxtral Mini Transcribe handled on LLM.API?

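    Voxtral Mini Transcribe is billed by the amount of audio processed; the cost comparison above lists about $0.004 per audio minute on LLM API with no charge for output, and exact rates and billing granularity are listed on the LLM.API pricing page.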

  • How does Voxtral Mini Transcribe compare to larger transcription models?

    Compared to larger models, Voxtral Mini Transcribe generally offers lower cost and latency at the expense of slightly lower accuracy on challenging audio.

  • Does Voxtral Mini Transcribe support streaming transcription on LLM.API?

    If enabled by LLM.API, you can stream audio chunks to Voxtral Mini Transcribe and receive partial transcripts incrementally.

  • What limitations should I be aware of when using Voxtral Mini Transcribe?

    Voxtral Mini Transcribe may struggle with heavy background noise, strong accents, overlapping speakers, low-bitrate audio, or domain-specific jargon without adaptation.
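For recordings longer than the per-request limit, the FAQ recommends chunking client-side. A minimal sketch using only Python's standard `wave` module, assuming uncompressed PCM WAV input: it splits a file into consecutive fixed-length WAV chunks, each of which can be sent as its own request. The 600-second default is an arbitrary example chosen to stay well under the listed limits.

```python
import io
import wave


def split_wav(data: bytes, chunk_seconds: int = 600) -> list[bytes]:
    """Split a PCM WAV file into consecutive WAV chunks of at most chunk_seconds each."""
    chunks: list[bytes] = []
    with wave.open(io.BytesIO(data), "rb") as src:
        params = src.getparams()
        frames_per_chunk = src.getframerate() * chunk_seconds
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break  # end of audio
            buf = io.BytesIO()
            with wave.open(buf, "wb") as dst:
                dst.setparams(params)  # same channels, sample width, and rate
                dst.writeframes(frames)  # header frame count is patched on write
            chunks.append(buf.getvalue())
    return chunks
```

Splitting at fixed offsets can cut a word in half at a boundary; pipelines that need clean joins often add a small overlap between chunks or split on detected silence instead.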

Start in 2 lines of code
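A minimal request sketch using only Python's standard library. Only the model name comes from the FAQ above; the endpoint URL (an example.com placeholder), the JSON field names `model` and `audio`, base64 encoding of the audio, and Bearer-token authentication are all assumptions for illustration. Check the LLM.API documentation for the actual request format.

```python
import base64
import json
import urllib.request

# Hypothetical placeholder: substitute the real endpoint from the LLM.API docs.
API_URL = "https://api.example.com/v1/audio/transcriptions"
MODEL = "mistral:voxtral-mini-transcribe"  # model name from the FAQ


def build_transcription_request(audio: bytes, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying base64-encoded audio.

    The body fields and the Bearer auth scheme are assumptions, not a documented schema.
    """
    body = json.dumps({
        "model": MODEL,
        "audio": base64.b64encode(audio).decode("ascii"),
    }).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON reply (its shape is not documented here)."""
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())
```

Building the request separately from sending it keeps the payload easy to inspect or log before any network call is made.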
