Kokoro 82M

Text-to-Speech

Kokoro 82M is an open-weight, 82‑million‑parameter text‑to‑speech model from hexgrad that focuses on natural, multilingual speech with low latency and low resource usage. It is notable for delivering high‑quality audio comparable to much larger TTS systems while remaining lightweight and easy to run locally or on-device.

API Performance

Latency: ~0.5s avg generation time for short sentences on GPU/modern CPU
Context: ~60 s maximum practical audio duration per call
Input: $0 per 1M characters when self-hosted (open weights, Apache-2.0)
Output: $0 per hour of audio when self-hosted (open weights, Apache-2.0)
Uptime: 99%
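Because a single call is practical only up to about 60 s of audio, long text has to be split before synthesis. A minimal sketch that splits at sentence boundaries under a character budget follows. It assumes a rough speaking rate of 15 characters per second to turn the duration budget into characters; that figure is an assumption, not a Kokoro specification, and `chunk_text` is a hypothetical helper.

```python
import re

# ASSUMPTION: roughly 15 characters of text per second of speech.
# Measure this for your own voices and languages before relying on it.
CHARS_PER_SECOND = 15


def chunk_text(text: str, max_seconds: float = 60.0) -> list[str]:
    """Split text at sentence boundaries so each chunk stays under a rough
    per-call audio budget. A sentence longer than the budget is kept whole
    rather than cut mid-sentence."""
    max_chars = int(max_seconds * CHARS_PER_SECOND)
    # Split after ., ! or ? followed by whitespace; punctuation stays attached.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars or not current:
            current = candidate  # still fits, or first sentence of a chunk
        else:
            chunks.append(current)  # budget reached: close this chunk
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Splitting only at sentence ends keeps intonation natural where chunks are joined back together.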

About the model

What is Kokoro 82M?

Kokoro 82M is a compact, open-source text-to-speech model with 82 million parameters, released by hexgrad for efficient, high-quality speech generation. It is primarily used to synthesize natural-sounding speech for voice assistants, content narration, and accessibility tools, where low latency and local deployment matter. Developers also use it in tooling and pipelines (Python, JavaScript/ONNX, mobile, and edge deployments) to provide multilingual voices without relying on paid cloud APIs. It is the model behind the open-source Kokoro TTS inference library, and its v1.0 release expanded the available voices and languages.

Input / Output

Input

Text to be spoken

Output

Speech audio (e.g., WAV or MP3, depending on the API serving the model); the model itself generates 24 kHz audio
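If a local runtime or API hands back raw samples rather than an encoded file, they can be saved as WAV with Python's standard library alone. A minimal sketch, assuming mono float samples in the range [-1.0, 1.0] at Kokoro's 24 kHz output rate; `write_wav` is a hypothetical helper.

```python
import struct
import wave

SAMPLE_RATE = 24_000  # Kokoro generates 24 kHz audio


def write_wav(path: str, samples: list[float], sample_rate: int = SAMPLE_RATE) -> None:
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    # Clip out-of-range values, then scale to signed 16-bit integers.
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        # "<" = little-endian, as the WAV format requires.
        wav.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
```

Model output held in an array type can be converted with `list(...)` first; 16-bit PCM keeps the file playable almost anywhere, while MP3 would need an external encoder.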

Model capabilities

5 Core Capabilities

Text To Speech

Generates high-quality, natural-sounding speech audio from text using a lightweight, open-weight TTS architecture optimized for efficiency.
Multilingual Voices

Provides many voices and multiple language options, enabling diverse speech outputs for global applications and varied user preferences.
Expressive Prosody

Produces expressive, human-like intonation and rhythm, suitable for audiobooks, dialogue, narration, and other natural-sounding voice experiences.
Real-Time Inference

Runs efficiently on consumer hardware with fast inference speeds, supporting near real-time text-to-speech generation in applications.
Offline Deployment

Supports fully local, offline deployment with open weights, enabling privacy-preserving speech synthesis without external API dependencies.
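The multilingual voices are exposed as voice IDs whose two-letter prefix encodes language and gender (for example `af_heart`: American English, female). The sketch below decodes that prefix. The language letters are the `lang_code` values listed in the `kokoro` project's README; treat the naming convention as an assumption and check the project's current voice list. `describe_voice` is a hypothetical helper.

```python
# Language letters as listed for KPipeline(lang_code=...) in the kokoro README.
LANG_CODES = {
    "a": "American English",
    "b": "British English",
    "e": "Spanish",
    "f": "French",
    "h": "Hindi",
    "i": "Italian",
    "j": "Japanese",
    "p": "Brazilian Portuguese",
    "z": "Mandarin Chinese",
}
GENDERS = {"f": "female", "m": "male"}


def describe_voice(voice_id: str) -> tuple[str, str]:
    """Decode a voice ID such as 'af_heart' into (language, gender).

    ASSUMES the <lang><gender>_<name> convention of Kokoro's published
    voice list; raises ValueError for IDs that do not follow it.
    """
    prefix, sep, name = voice_id.partition("_")
    if not sep or not name or len(prefix) != 2:
        raise ValueError(f"unexpected voice id: {voice_id!r}")
    lang, gender = prefix[0], prefix[1]
    if lang not in LANG_CODES or gender not in GENDERS:
        raise ValueError(f"unknown language or gender code in {voice_id!r}")
    return LANG_CODES[lang], GENDERS[gender]
```

The language letter also tells you which pipeline a voice belongs to, since a voice is normally paired with the pipeline for its own language.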

Use cases

6 Most Valuable Use Cases

Multilingual Text Narration
Audiobook Generation
Customer Support Voicebots
Real-Time Voice Feedback
Marketing Voiceovers
On-Device TTS Deployment

Transparent pricing

Cost Comparison

In the comparison below, LLM API lists the lowest price and latency for Kokoro 82M-compatible TTS workloads.

Provider	Region	Latency	Throughput	Uptime	Input price	Output price	Max input
LLM API	Global	~120ms	~120 chars/s	~99.99%	~$0.20/1K chars	~$0.00	~10K chars
hexgrad	Global	~180ms	~80 chars/s	~99.9%	~$0.35/1K chars	~$0.00	~8K chars
OpenAI (Speech/TTS)	Global	~250ms	~70 chars/s	~99.9%	~$0.30/1K chars	~$0.00	~8K chars
ElevenLabs	Global	~220ms	~75 chars/s	~99.9%	~$0.40/1K chars	~$0.00	~8K chars
Amazon Polly	US East	~260ms	~60 chars/s	~99.95%	~$0.35/1K chars	~$0.00	~3K chars
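Since the prices above are quoted per 1,000 characters, estimating the cost of a job is simple arithmetic. A minimal sketch using the table's approximate input prices (which will drift over time); `estimate_cost` and `cheapest` are hypothetical helpers.

```python
# Approximate input prices (USD per 1,000 characters) from the table above.
PRICE_PER_1K_CHARS = {
    "LLM API": 0.20,
    "hexgrad": 0.35,
    "OpenAI (Speech/TTS)": 0.30,
    "ElevenLabs": 0.40,
    "Amazon Polly": 0.35,
}


def estimate_cost(num_chars: int, price_per_1k: float) -> float:
    """USD cost of synthesizing num_chars characters at a per-1K-char rate."""
    return num_chars / 1000 * price_per_1k


def cheapest(num_chars: int, prices: dict[str, float]) -> tuple[str, float]:
    """Return (provider, estimated cost) for the lowest-priced provider."""
    provider = min(prices, key=prices.get)
    return provider, estimate_cost(num_chars, prices[provider])
```

For instance, a 100,000-character manuscript at ~$0.20 per 1K characters comes to about $20, while self-hosting the open weights has no per-character fee.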

Performance benchmarks

Technical Specifications

Metric	Kokoro 82M (hexgrad)	Llama 3.2 1B (Meta)	Gemma 2 2B (Google)
Avg Latency	~120ms	~220ms	~250ms
Context Window	8K	8K	8K
Input Price ($/1M tokens)	$0.05	$0.10	$0.12
Output Price ($/1M tokens)	$0.10	$0.20	$0.24
Max Output Tokens	2K	4K	4K
Throughput	80 tps	60 tps	50 tps
Uptime	99.0%	99.5%	99.5%

30-day usage via LLM API

1.1B: Prompt tokens (last 30 days)
720M: Completion tokens generated
4.5M: API requests served
38.4K: Unique users

Architecture & Integration

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

Intelligent Model Routing

Dynamically route each request to the best model by latency, cost, and quality. One API, pluggable providers, no client-code rewrites.
One endpoint, every model
Cost-Aware Execution

Control spend with per-request price caps, model-level policies, and transparent usage metrics so you can aggressively optimize without breaking production flows.
Optimize every token
Automatic Provider Fallback

Survive provider outages and rate limits with built-in failover logic that retries on alternate models while preserving request semantics and SLAs.
No single point of failure
Full-Stack Observability

Get traces, logs, and metrics for every LLM call—across providers—so you can debug latency, track failures, and tune prompts with production-grade visibility.
See every token flow
Task-Level Orchestration

Define higher-level tasks instead of wiring individual calls. LLM.API handles tools, retries, and multi-step workflows as a single, composable abstraction.
Think tasks, not calls
High-Throughput Batch

Submit massive batches of generations, embeddings, or tool calls in one request, with concurrency controls and status tracking built in for large-scale workloads.
Scale jobs, not code
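Cost-aware routing and provider fallback are described above only at the product level. The general pattern can be sketched as: keep providers under a latency cap, try them from cheapest to most expensive, and move on to the next one on any error. This is a generic illustration, not LLM.API's implementation; `Provider`, `route_with_fallback`, and the `synthesize` callables are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Provider:
    name: str
    latency_ms: float                    # typical latency, e.g. from a comparison table
    price_per_1k_chars: float            # USD per 1,000 input characters
    synthesize: Callable[[str], bytes]   # HYPOTHETICAL client call returning audio


def route_with_fallback(text: str, providers: list[Provider],
                        max_latency_ms: float) -> tuple[str, bytes]:
    """Try providers within the latency cap, cheapest first; fall back on errors."""
    eligible = sorted(
        (p for p in providers if p.latency_ms <= max_latency_ms),
        key=lambda p: p.price_per_1k_chars,
    )
    errors: list[str] = []
    for provider in eligible:
        try:
            return provider.name, provider.synthesize(text)
        except Exception as exc:  # outage, rate limit, timeout, ...
            errors.append(f"{provider.name}: {exc}")
    if errors:
        raise RuntimeError("all providers failed: " + "; ".join(errors))
    raise RuntimeError("no provider meets the latency cap")
```

Sorting by price first means the cheapest healthy provider wins; a production router would also add timeouts, retry limits, and per-request price caps.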

Decision guide

When to Use — When NOT to Use

Use it if...

You need a lightweight TTS model suitable for on-device or embedded deployment.
Your use case involves batch-generating natural-sounding speech from short text snippets.
You need an open-source TTS model that developers can self-host and customize freely.
Your use case involves prototyping voice features without relying on commercial cloud APIs.
You need reasonably natural speech where small model size is a key constraint.

Avoid if...

You need TTS coverage for languages or accents beyond the limited set Kokoro's voices support.
Your workload requires ultra-high-fidelity, human-indistinguishable voices for premium production audio.
You need built-in text understanding, dialogue, or reasoning beyond straightforward text-to-speech conversion.
Your workload requires highly expressive emotional prosody and fine-grained controllable speaking styles.
You need enterprise-grade SLAs, support, and monitoring from a large commercial cloud provider.

FAQ

Frequently Asked Questions

What is Kokoro 82M?

Kokoro 82M is a compact, 82M-parameter open-weight text-to-speech model by hexgrad, focused on fast, low-cost speech synthesis in English and a handful of other languages.
What is Kokoro 82M best suited for?

Kokoro 82M is best for lightweight speech tasks such as narration, voice assistants, accessibility tools, and on-device TTS, where latency, cost, and local deployment are critical.
What is the maximum input length for Kokoro 82M on LLM.API?

On LLM.API, Kokoro 82M accepts roughly 4K tokens of input text per request. Because the output is audio rather than text, the limit applies to the input alone; split longer text into chunks.
How fast is Kokoro 82M in terms of latency?

Due to its small size, Kokoro 82M typically returns audio for a short sentence in under a second, depending on hardware, load, and network conditions.
What does Kokoro 82M cost to use via LLM.API?

Kokoro 82M is priced in LLM.API's lowest billing tier, making it significantly cheaper per 1,000 characters than larger commercial TTS models.
Which modalities does Kokoro 82M support?

Kokoro 82M takes text as input and produces speech audio as output; it does not accept image or audio input.
How do I call Kokoro 82M through the LLM.API gateway?

Send your text to the gateway's text-to-speech endpoint (not a chat or completion endpoint) and set the model field to "hexgrad/kokoro-82m" in your LLM.API request.
How does Kokoro 82M compare to larger models on LLM.API?

Kokoro 82M is much cheaper and faster than larger TTS systems, but those systems may offer more voices and languages, finer control over emotion and speaking style, or higher fidelity.
What are the main limitations of Kokoro 82M?

Kokoro 82M supports a limited set of languages and voices, offers little fine-grained control over emotion or speaking style, and needs long passages split into chunks for synthesis.
Can Kokoro 82M handle streaming responses on LLM.API?

Yes. Enable the stream flag in your LLM.API request to receive audio in chunks as it is generated rather than as one complete file.
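The FAQ gives the model ID `hexgrad/kokoro-82m` and a stream flag. A minimal sketch of building such a request with the standard library, and of reading a streamed response in chunks, is shown here. The endpoint URL, the `input`/`voice`/`stream` field names, and Bearer authentication are assumptions modeled on common speech APIs, not documented LLM.API details; `af_heart` is one of Kokoro's published voice IDs, and both helper functions are hypothetical.

```python
import json
import urllib.request
from typing import BinaryIO, Iterator

# HYPOTHETICAL endpoint; check the gateway's API reference for the real one.
API_URL = "https://api.example.com/v1/audio/speech"


def build_speech_request(text: str, api_key: str, voice: str = "af_heart",
                         stream: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST request for Kokoro 82M."""
    payload = {
        "model": "hexgrad/kokoro-82m",  # model ID from the FAQ
        "input": text,                  # ASSUMED field name
        "voice": voice,                 # a Kokoro voice ID (ASSUMED field name)
        "stream": stream,               # ASSUMED flag for chunked audio
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def iter_audio(response: BinaryIO, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield audio bytes as they arrive, e.g. to start playback early."""
    while chunk := response.read(chunk_size):
        yield chunk
```

To send it, pass the request to `urllib.request.urlopen` and iterate `iter_audio(response)`, writing each chunk to a file or audio player as it arrives.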
