Powered by Mistral

Voxtral Small 24B 2507

  • Instruction Following

Voxtral Small 24B 2507 is a 24-billion-parameter audio-language model from Mistral that extends Mistral Small 3 with advanced speech understanding. It is notable for strong, cost-efficient performance on transcription, translation, and audio-informed text tasks across multiple languages.

Start Using API

What is Voxtral Small 24B 2507?

Voxtral Small 24B 2507 is an open-source speech understanding and language model from Mistral that combines text generation with state-of-the-art audio input capabilities. It is mainly used for high-quality speech transcription and translation directly from audio in many languages. It is also applied to tasks like Q&A, summarization, and general chat where audio context must be understood alongside text. It belongs to the Voxtral family and is built as an enhancement of the Mistral Small 3 series.

5 Core Capabilities

  • Conversational Chat

    Handles multi-turn text conversations with strong general reasoning, instruction following, and tool-use support in many domains and scenarios.

  • Audio Transcription

    Transcribes spoken audio into accurate text using a dedicated speech transcription mode optimized for high-quality automatic speech recognition.

  • Speech Translation

    Performs speech-to-text translation across multiple languages, enabling multilingual audio translation and cross-lingual understanding within one model.

  • Audio Understanding

    Analyzes audio beyond transcription, supporting audio-based question answering, summarization, and semantic comprehension of spoken content.

  • Monitoring Integration

    Integrates with various inference and observability platforms, supporting structured outputs, tools, and deployment in managed environments.

6 Most Valuable Use Cases

  • Meeting Transcription
  • Multilingual Speech Translation
  • Audio-Based Q&A
  • Voice-Driven Function Calling
  • Call Center Analytics
  • Long Audio Summarization

Cost Comparison

LLM API offers the lowest cost and best performance for Voxtral Small–class 24B models.

Provider Region Latency Throughput Uptime Input ($/1M) Output ($/1M) Context
LLM API BEST Global 90ms 80 tps 99.99% $0.40 $0.40 256K
Mistral EU West ~140ms ~45 tps 99.9% ~$0.60 ~$0.60 ~128K
Together AI US East ~160ms ~40 tps 99.9% ~$0.55 ~$0.55 ~128K
Fireworks AI US West ~150ms ~42 tps 99.9% ~$0.58 ~$0.58 ~200K

Technical Specifications

Metric Voxtral Small 24B 2507 (Mistral) Mistral Large 2 123B GPT-4.1 Mini
Avg Latency ~220ms ~280ms ~180ms
Context Window 128K 128K 128K
Input Price ($/1M) $0.40 $2.00 $0.15
Output Price ($/1M) $1.20 $6.00 $0.60
Max Output Tokens 4K 4K 4K
Throughput ~60 tps ~40 tps ~80 tps
Uptime 99.9% 99.9% 99.9%

30-day usage via LLM API

2.8B
Prompt tokens processed (last 30 days)
2.1B
Completion tokens generated (last 30 days)
3.6M
API requests served (last 30 days)
99.8%
Average uptime (last 30 days)
Start Using API

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

  • Intelligent Model Routing

    Dynamically route each request to the best model based on latency, cost, and capabilities—using one stable API contract instead of per-provider plumbing.

    One endpoint, every model.
  • Cost-Aware Orchestration

    Automatically balance price and quality with configurable cost ceilings, tiered model selection, and real-time usage controls so you never get surprised by your AI bill.

    Control spend by design.
  • Resilient Fallbacks

    Define automatic failover chains across providers and models. Survive outages and rate limits without rewriting application logic or degrading user experience.

    Never fail on 500s
  • End-to-End Observability

    Trace every request with structured logs, metrics, and latency breakdowns by provider and model. Debug production issues and tune routes using real data.

    See every token
  • Task-Level Abstractions

    Describe high-level tasks like chat, extraction, or generation once. LLM.API maps them to best-fit models, so you decouple product logic from vendors.

    Think tasks, not models
  • High-Throughput Batch APIs

    Process thousands of prompts in parallel with backpressure, retries, and cost controls built in. Ideal for reindexing, evaluations, and large content migrations.

    Scale jobs, not code

When to Use — When NOT to Use

Use it if...

  • You need a capable general-purpose LLM from Mistral without paying flagship-model prices.
  • You need solid coding assistance, code explanation, and moderate debugging across common languages.
  • Your use case involves chatbots, agents, or copilots that handle everyday business questions.
  • You need good English writing help for emails, documentation, specs, or internal reports.
  • Your use case involves moderate-length context tasks, like analyzing single documents or conversations.
  • You need an open-weight or API-accessible model compatible with common Mistral tooling.

Avoid if...

  • You need top-tier reasoning quality approaching frontier models for complex, high-stakes decisions.
  • Your workload requires extremely long-context processing, like hundreds of pages per request.
  • You need heavily optimized multimodal capabilities, such as advanced vision or audio understanding.
  • Your workload requires state-of-the-art code generation for very large or safety-critical codebases.
  • You need the absolute best Mistral model performance regardless of higher cost or resource use.
  • Your workload requires ultra-low latency inference on very small edge devices with minimal memory.

Frequently Asked Questions

  • What is Voxtral Small 24B 2507?

    Voxtral Small 24B 2507 is a 24B-parameter Mistral model exposed via LLM.API, targeting high-quality, general-purpose text generation for developers.

  • What modalities does Voxtral Small 24B 2507 support?

    Voxtral Small 24B 2507 is a text-only language model, supporting text input and text output through the LLM.API endpoints.

  • How is Voxtral Small 24B 2507 priced on LLM.API?

    Voxtral Small 24B 2507 uses LLM.API’s unified per-token pricing; check your LLM.API dashboard or pricing docs for current input and output rates.

  • What is the context window of Voxtral Small 24B 2507?

    Voxtral Small 24B 2507 supports a multi‑kilotoken context window suitable for long prompts and conversations; see the LLM.API model card for exact limits.

  • How fast is Voxtral Small 24B 2507 on LLM.API?

    Voxtral Small 24B 2507 is optimized for low-latency inference with streaming responses, but actual speed depends on prompt length and concurrency.

  • What is Voxtral Small 24B 2507 best suited for?

    Voxtral Small 24B 2507 is best for general chat, code assistance, reasoning over medium-length documents, and building production assistants with predictable cost.

  • How do I call Voxtral Small 24B 2507 via LLM.API?

    Use the standard LLM.API chat or completions endpoint and set the model field to "Voxtral Small 24B 2507" in your request payload.

  • How does Voxtral Small 24B 2507 compare to similar-sized models?

    Voxtral Small 24B 2507 targets a balance of quality and throughput comparable to other ~20–30B open models, exposed under a unified LLM.API interface.

  • What are the main limitations of Voxtral Small 24B 2507?

    Voxtral Small 24B 2507 can hallucinate, lacks real-time knowledge or browsing, and should not be used as a sole source for critical decisions.

  • Can Voxtral Small 24B 2507 handle tools or function calling via LLM.API?

    If enabled in LLM.API, Voxtral Small 24B 2507 can follow tool or function-calling schemas, but behavior depends on your request format and routing configuration.

Start in 2 lines of code

Get My API Key