Powered by hexgrad
Kokoro 82M
- Text-to-Speech
Kokoro 82M is an open-weight, 82‑million‑parameter text‑to‑speech model from hexgrad that focuses on natural, multilingual speech with low latency and low resource usage. It is notable for delivering high‑quality audio comparable to much larger TTS systems while remaining lightweight and easy to run locally or on-device.
About the model
What is Kokoro 82M?
Kokoro 82M is a compact, open-source text-to-speech model with 82 million parameters released by hexgrad for efficient, high-quality speech generation. It is primarily used to synthesize natural-sounding speech for applications like voice assistants, content narration, and accessibility tools where low latency and local deployment are important. It is also adopted in developer tooling and pipelines (Python, JavaScript/ONNX, mobile, and edge deployments) to provide multilingual voices without relying on paid cloud APIs. Kokoro 82M belongs to the Kokoro model family and underpins the broader Kokoro TTS library and later Kokoro v1.0 upgrades.
Model capabilities
5 Core Capabilities
- Text To Speech: Generates high-quality, natural-sounding speech audio from text using a lightweight, open-weight TTS architecture optimized for efficiency.
- Multilingual Voices: Provides many voices and multiple language options, enabling diverse speech outputs for global applications and varied user preferences.
- Expressive Prosody: Produces expressive, human-like intonation and rhythm, suitable for audiobooks, dialogue, narration, and other natural-sounding voice experiences.
- Real-Time Inference: Runs efficiently on consumer hardware with fast inference speeds, supporting near real-time text-to-speech generation in applications.
- Offline Deployment: Supports fully local, offline deployment with open weights, enabling privacy-preserving speech synthesis without external API dependencies.
Use cases
6 Most Valuable Use Cases
- Multilingual Text Narration
- Audiobook Generation
- Customer Support Voicebots
- Real-Time Voice Feedback
- Marketing Voiceovers
- On-Device TTS Deployment
Transparent pricing
Cost Comparison
Among the providers compared below, LLM.API lists the lowest price and latency for Kokoro 82M-compatible TTS workloads. Prices are quoted per 1K characters of input text.
| Provider | Region | Latency | Throughput | Uptime | Input price | Output price | Context |
|---|---|---|---|---|---|---|---|
| LLM.API (best) | Global | ~120ms | ~120 chars/s | ~99.99% | ~$0.20/1K chars | ~$0.00 | ~10K chars |
| hexgrad | Global | ~180ms | ~80 chars/s | ~99.9% | ~$0.35/1K chars | ~$0.00 | ~8K chars |
| OpenAI (Speech/TTS) | Global | ~250ms | ~70 chars/s | ~99.9% | ~$0.30/1K chars | ~$0.00 | ~8K chars |
| ElevenLabs | Global | ~220ms | ~75 chars/s | ~99.9% | ~$0.40/1K chars | ~$0.00 | ~8K chars |
| Amazon Polly | US East | ~260ms | ~60 chars/s | ~99.95% | ~$0.35/1K chars | ~$0.00 | ~3K chars |
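To make the table easier to apply to a concrete job, the following minimal sketch estimates cost and synthesis time for a given amount of text. The figures are the approximate values from the table above; the timing model (fixed latency plus characters divided by throughput) and all names in the code are assumptions made for illustration, not a formula published by any provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderQuote:
    name: str
    latency_s: float         # approximate latency from the table
    chars_per_s: float       # approximate throughput from the table
    usd_per_1k_chars: float  # approximate input price from the table


# Approximate figures copied from the cost comparison table.
QUOTES = [
    ProviderQuote("LLM.API", 0.120, 120, 0.20),
    ProviderQuote("hexgrad", 0.180, 80, 0.35),
    ProviderQuote("OpenAI (Speech/TTS)", 0.250, 70, 0.30),
    ProviderQuote("ElevenLabs", 0.220, 75, 0.40),
    ProviderQuote("Amazon Polly", 0.260, 60, 0.35),
]


def estimate_cost_usd(quote: ProviderQuote, num_chars: int) -> float:
    """Input cost in USD for num_chars characters of text."""
    return quote.usd_per_1k_chars * num_chars / 1000


def estimate_seconds(quote: ProviderQuote, num_chars: int) -> float:
    """Rough wall-clock estimate: fixed latency plus chars / throughput.

    This simple additive model is an assumption for illustration only.
    """
    return quote.latency_s + num_chars / quote.chars_per_s


if __name__ == "__main__":
    job_chars = 6000  # hypothetical job size
    for q in QUOTES:
        cost = estimate_cost_usd(q, job_chars)
        secs = estimate_seconds(q, job_chars)
        print(f"{q.name}: ~${cost:.2f}, ~{secs:.1f}s")
```

Because every figure in the table is approximate, the results are best read as order-of-magnitude estimates rather than quotes.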
Performance benchmarks
Technical Specifications
| Metric | Kokoro 82M (hexgrad) | Llama 3.2 1B (Meta) | Gemma 2 2B (Google) |
|---|---|---|---|
| Avg Latency | ~120ms | ~220ms | ~250ms |
| Context Window | 8K | 8K | 8K |
| Input Price ($/1M tokens) | $0.05 | $0.10 | $0.12 |
| Output Price ($/1M tokens) | $0.10 | $0.20 | $0.24 |
| Max Output Tokens | 2K | 4K | 4K |
| Throughput | 80 tps | 60 tps | 50 tps |
| Uptime | 99.0% | 99.5% | 99.5% |
30-day usage via LLM API
- 1.1B prompt tokens (last 30 days)
- 720M completion tokens generated
- 4.5M API requests served
- 38.4K unique users
Architecture & Integration
Why Build on LLM.API?
One unified API. Every major model. Built-in reliability, cost control, and observability.
- Intelligent Model Routing: Dynamically route each request to the best model by latency, cost, and quality. One API, pluggable providers, no client-code rewrites. One endpoint, every model.
- Cost-Aware Execution: Control spend with per-request price caps, model-level policies, and transparent usage metrics so you can aggressively optimize without breaking production flows. Optimize every token.
- Automatic Provider Fallback: Survive provider outages and rate limits with built-in failover logic that retries on alternate models while preserving request semantics and SLAs. No single point of failure.
- Full-Stack Observability: Get traces, logs, and metrics for every LLM call, across providers, so you can debug latency, track failures, and tune prompts with production-grade visibility. See every token flow.
- Task-Level Orchestration: Define higher-level tasks instead of wiring individual calls. LLM.API handles tools, retries, and multi-step workflows as a single, composable abstraction. Think tasks, not calls.
- High-Throughput Batch: Submit massive batches of generations, embeddings, or tool calls in one request, with concurrency controls and status tracking built in for large-scale workloads. Scale jobs, not code.
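The provider fallback idea described above can be illustrated with a small, generic sketch. This is not LLM.API's implementation, which the page does not describe; it only shows the general pattern of trying providers in order and moving on when one fails. The function names, the request shape, and the stub providers are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# A provider is a (name, send) pair; send takes a request dict and returns bytes.
Provider = Tuple[str, Callable[[dict], bytes]]


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain has failed."""


def call_with_fallback(request: dict, providers: Sequence[Provider]) -> Tuple[str, bytes]:
    """Try each provider in order and return the first successful result."""
    errors: List[str] = []
    for name, send in providers:
        try:
            return name, send(request)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{name}: {exc!r}")
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical stub providers, used only to exercise the pattern.
def flaky_provider(request: dict) -> bytes:
    raise TimeoutError("simulated outage")


def backup_provider(request: dict) -> bytes:
    return b"audio:" + request["input"].encode("utf-8")


if __name__ == "__main__":
    chain = [("primary", flaky_provider), ("backup", backup_provider)]
    used, payload = call_with_fallback({"input": "hello"}, chain)
    print(used, payload)
```

A production gateway would typically add timeouts, retry budgets, and checks that the alternate provider accepts the same request shape; those are left out here to keep the pattern visible.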
Decision guide
When to Use — When NOT to Use
Use it if...
- You need a lightweight English TTS model suitable for on-device or embedded deployment.
- Your use case involves batch-generating natural-sounding English speech from short text snippets.
- You need an open-source TTS model that developers can self-host and customize freely.
- Your use case involves prototyping English voice features without relying on commercial cloud APIs.
- You need reasonably natural English speech where small model size is a key constraint.
Avoid if...
- You need multilingual TTS covering many languages and accents beyond standard English support.
- Your workload requires ultra-high-fidelity, human-indistinguishable voices for premium production audio.
- You need built-in text understanding, dialogue, or reasoning beyond straightforward text-to-speech conversion.
- Your workload requires highly expressive emotional prosody and fine-grained controllable speaking styles.
- You need enterprise-grade SLAs, support, and monitoring from a large commercial cloud provider.
FAQ
Frequently Asked Questions
- What is Kokoro 82M?
  Kokoro 82M is a compact, open-weight 82M-parameter text-to-speech model by hexgrad focused on fast, low-cost inference.
- What is Kokoro 82M best suited for?
  Kokoro 82M is best for lightweight speech tasks like narration, audiobook generation, voicebots, and real-time voice feedback where latency and cost are critical.
- What is the context window of Kokoro 82M on LLM.API?
  On LLM.API, Kokoro 82M supports a 4K token context window for prompts plus generated output combined.
- How fast is Kokoro 82M in terms of latency?
  Due to its small size, Kokoro 82M typically starts returning output in under a second for short inputs, depending on load and network conditions.
- What does Kokoro 82M cost to use via LLM.API?
  Kokoro 82M is priced in the lowest LLM.API billing tier, making it significantly cheaper than larger general-purpose models.
- Which modalities does Kokoro 82M support?
  Kokoro 82M takes text as input and produces speech audio as output. It does not accept image or audio input.
- How do I call Kokoro 82M through the LLM.API gateway?
  Use the standard chat or completion endpoint and set the model field to "hexgrad/kokoro-82m" in your LLM.API request.
- How does Kokoro 82M compare to larger models on LLM.API?
  Kokoro 82M is much cheaper and faster than larger models, but it only converts text to speech: it offers no reasoning or coding ability, and larger TTS systems may deliver higher fidelity and finer control over speaking style.
- What are the main limitations of Kokoro 82M?
  Kokoro 82M is limited to straightforward text-to-speech conversion. It has no built-in text understanding or reasoning, and it offers less emotional expressiveness, style control, and language coverage than larger-scale systems.
- Can Kokoro 82M handle streaming responses on LLM.API?
  Yes, Kokoro 82M supports streaming via LLM.API by enabling the stream flag in your request.
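
The FAQ answers about the model field and the stream flag can be sketched as a request body. Only the model identifier "hexgrad/kokoro-82m" and the existence of a stream flag come from this page; the `input` field name, the exact JSON shape, and the helper name are assumptions for illustration, and the endpoint URL is deliberately left out because the page does not state it.

```python
import json

MODEL_ID = "hexgrad/kokoro-82m"  # model identifier given in the FAQ


def build_request_body(text: str, stream: bool = False) -> str:
    """Build a JSON request body for a Kokoro 82M call.

    The "input" field name and overall shape are hypothetical; check the
    gateway's own API reference for the real schema.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    body = {
        "model": MODEL_ID,  # selects Kokoro 82M
        "input": text,      # assumed field name for the text to synthesize
        "stream": stream,   # stream flag mentioned in the FAQ
    }
    return json.dumps(body)


if __name__ == "__main__":
    print(build_request_body("Hello from Kokoro.", stream=True))
```

Keeping body construction separate from the HTTP call makes it straightforward to adjust the field names once the actual schema is confirmed.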
