Powered by hexgrad

Kokoro 82M

  • Text-to-Speech

Kokoro 82M is an open-weight, 82‑million‑parameter text‑to‑speech model from hexgrad that focuses on natural, multilingual speech with low latency and low resource usage. It is notable for delivering high‑quality audio comparable to much larger TTS systems while remaining lightweight and easy to run locally or on-device.

Start Using API

What is Kokoro 82M?

Kokoro 82M is a compact, open-source text-to-speech model with 82 million parameters released by hexgrad for efficient, high-quality speech generation. It is primarily used to synthesize natural-sounding speech for applications like voice assistants, content narration, and accessibility tools where low latency and local deployment are important. It is also adopted in developer tooling and pipelines (Python, JavaScript/ONNX, mobile, and edge deployments) to provide multilingual voices without relying on paid cloud APIs. Kokoro 82M belongs to the Kokoro model family and underpins the broader Kokoro TTS library and later Kokoro v1.0 upgrades.

5 Core Capabilities

  • Text To Speech

    Generates high-quality, natural-sounding speech audio from text using a lightweight, open-weight TTS architecture optimized for efficiency.

  • Multilingual Voices

    Provides many voices and multiple language options, enabling diverse speech outputs for global applications and varied user preferences.

  • Expressive Prosody

    Produces expressive, human-like intonation and rhythm, suitable for audiobooks, dialogue, narration, and other natural-sounding voice experiences.

  • Real-Time Inference

    Runs efficiently on consumer hardware with fast inference speeds, supporting near real-time text-to-speech generation in applications.

  • Offline Deployment

    Supports fully local, offline deployment with open weights, enabling privacy-preserving speech synthesis without external API dependencies.

6 Most Valuable Use Cases

  • Multilingual Text Narration
  • Audiobook Generation
  • Customer Support Voicebots
  • Real-Time Voice Feedback
  • Marketing Voiceovers
  • On-Device TTS Deployment

Cost Comparison

LLM API offers the lowest prices and latency for Kokoro 82M-compatible TTS workloads.

Provider Region Latency Throughput Uptime Input ($/1M) Output ($/1M) Context
LLM API BEST Global ~120ms ~120 chars/s ~99.99% ~$0.20/1K chars ~$0.00 ~10K chars
hexgrad Global ~180ms ~80 chars/s ~99.9% ~$0.35/1K chars ~$0.00 ~8K chars
OpenAI (Speech/TTS) Global ~250ms ~70 chars/s ~99.9% ~$0.30/1K chars ~$0.00 ~8K chars
ElevenLabs Global ~220ms ~75 chars/s ~99.9% ~$0.40/1K chars ~$0.00 ~8K chars
Amazon Polly US East ~260ms ~60 chars/s ~99.95% ~$0.35/1K chars ~$0.00 ~3K chars

Technical Specifications

Metric Kokoro 82M (hexgrad) Llama 3.2 1B (Meta) Gemma 2 2B (Google)
Avg Latency ~120ms ~220ms ~250ms
Context Window 8K 8K 8K
Input Price ($/1M tokens) $0.05 $0.10 $0.12
Output Price ($/1M tokens) $0.10 $0.20 $0.24
Max Output Tokens 2K 4K 4K
Throughput 80 tps 60 tps 50 tps
Uptime 99.0% 99.5% 99.5%

30-day usage via LLM API

1.1B
Prompt tokens (last 30 days)
720M
Completion tokens generated
4.5M
API requests served
38.4K
Unique users
Start Using API

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

  • Intelligent Model Routing

    Dynamically route each request to the best model by latency, cost, and quality. One API, pluggable providers, no client-code rewrites.

    One endpoint, every model
  • Cost-Aware Execution

    Control spend with per-request price caps, model-level policies, and transparent usage metrics so you can aggressively optimize without breaking production flows.

    Optimize every token
  • Automatic Provider Fallback

    Survive provider outages and rate limits with built-in failover logic that retries on alternate models while preserving request semantics and SLAs.

    No single point of failure
  • Full-Stack Observability

    Get traces, logs, and metrics for every LLM call—across providers—so you can debug latency, track failures, and tune prompts with production-grade visibility.

    See every token flow
  • Task-Level Orchestration

    Define higher-level tasks instead of wiring individual calls. LLM.API handles tools, retries, and multi-step workflows as a single, composable abstraction.

    Think tasks, not calls
  • High-Throughput Batch

    Submit massive batches of generations, embeddings, or tool calls in one request, with concurrency controls and status tracking built in for large-scale workloads.

    Scale jobs, not code

When to Use — When NOT to Use

Use it if...

  • You need a lightweight English TTS model suitable for on-device or embedded deployment.
  • Your use case involves batch-generating natural-sounding English speech from short text snippets.
  • You need an open-source TTS model that developers can self-host and customize freely.
  • Your use case involves prototyping English voice features without relying on commercial cloud APIs.
  • You need reasonably natural English speech where small model size is a key constraint.

Avoid if...

  • You need multilingual TTS covering many languages and accents beyond standard English support.
  • Your workload requires ultra-high-fidelity, human-indistinguishable voices for premium production audio.
  • You need built-in text understanding, dialogue, or reasoning beyond straightforward text-to-speech conversion.
  • Your workload requires highly expressive emotional prosody and fine-grained controllable speaking styles.
  • You need enterprise-grade SLAs, support, and monitoring from a large commercial cloud provider.

Frequently Asked Questions

  • What is Kokoro 82M?

    Kokoro 82M is a compact 82M-parameter text generation model by hexgrad focused on fast, low-cost inference for English-only tasks.

  • What is Kokoro 82M best suited for?

    Kokoro 82M is best for lightweight tasks like short-form generation, classification, routing, and simple agents where latency and cost are critical.

  • What is the context window of Kokoro 82M on LLM.API?

    On LLM.API, Kokoro 82M supports a 4K token context window for prompts plus generated output combined.

  • How fast is Kokoro 82M in terms of latency?

    Due to its small size, Kokoro 82M typically returns the first tokens in under a second for short prompts, depending on load and network conditions.

  • What does Kokoro 82M cost to use via LLM.API?

    Kokoro 82M is priced in the lowest LLM.API billing tier, making it significantly cheaper per 1,000 tokens than larger general-purpose models.

  • Which modalities does Kokoro 82M support?

    Kokoro 82M is a text-only model, supporting text input and text output without image, audio, or other modalities.

  • How do I call Kokoro 82M through the LLM.API gateway?

    Use the standard chat or completion endpoint and set the model field to "hexgrad/kokoro-82m" in your LLM.API request.

  • How does Kokoro 82M compare to larger models on LLM.API?

    Kokoro 82M is much cheaper and faster than larger models but generally provides weaker reasoning, coding, and long-form generation quality.

  • What are the main limitations of Kokoro 82M?

    Kokoro 82M struggles with complex reasoning, long-context coherence, detailed code synthesis, and creative writing compared to larger-scale models.

  • Can Kokoro 82M handle streaming responses on LLM.API?

    Yes, Kokoro 82M supports token streaming via LLM.API by enabling the stream flag in your request.

Start in 2 lines of code

Get My API Key