CSM 1B

Text Generation

CSM 1B is a 1‑billion‑parameter conversational speech model from Sesame that turns text (and optionally audio context) into natural‑sounding English speech. It is notable for its Llama-based architecture and RVQ/Mimi audio code generation that enables high‑quality, low‑latency voice output.

Start Using API

API Performance

Latency: ~0.8s avg response
Context: ~8K token context
Input: Free per 1M tokens
Output: Free per 1M tokens
Uptime: 99% 99%

About the model

What is CSM 1B?

CSM 1B is a 1‑billion‑parameter conversational speech generation model from Sesame that converts text into English speech. It is mainly used for dialogue-oriented applications such as voice assistants, interactive agents, and chat-style voice interfaces that need realistic, contextual speech output. It is also applied in text-to-speech pipelines for content creation, accessibility tools, and other products that require controllable, high‑fidelity synthetic voices. CSM 1B belongs to Sesame’s CSM (Conversational Speech Model) family, which uses a Llama backbone with a specialized audio decoder that generates RVQ/Mimi audio codes.

Input / Output

Input

Text prompts (for text-to-speech generation)
Audio prompts (for speech-conditioned generation)

Output

Generated speech audio (RVQ/Mimi codec, text-to-speech)

Model capabilities

5 Core Capabilities

Text-to-Speech

Generates high-quality, natural-sounding speech audio directly from text using a Llama-based backbone and specialized audio decoder.
Conversational Prosody

Maintains contextual awareness across turns, adjusting tone, pauses, and inflection to match dialogue flow and emotional nuance.
Multimodal Inputs

Processes both text and audio inputs, using prior audio context to guide consistent speech patterns and expressive delivery.
Streaming Generation

Supports efficient, low-latency speech synthesis suitable for real-time or interactive applications and local deployment scenarios.
Multilingual Potential

Can be fine-tuned on additional languages, enabling customized voices and language support beyond the base English-focused model.

Use cases

6 Most Valuable Use Cases

Voice Customer Support
Interactive Voice Assistants
Audiobook Narration
Game Character Voices
Voice Prototyping Tools
Accessibility Voice Output

Transparent pricing

Cost Comparison

LLM API offers the lowest cost and highest performance for CSM 1B-class models.

Provider	Region	Latency	Throughput	Uptime	Input ($/1M)	Output ($/1M)	Context
Sesame BEST	Global	~120ms	~120 tps	99.99%	$0.08	$0.08	64K tokens
Sesame	Global	~220ms	~60 tps	~99.9%	~$0.15	~$0.15	~32K tokens
OpenRouter	Global	~250ms	~55 tps	~99.9%	~$0.18	~$0.18	~32K tokens
Replicate	US East	~280ms	~45 tps	~99.5%	~$0.20	~$0.20	~16K tokens

Performance benchmarks

Technical Specifications

Metric	CSM 1B (Sesame)	Llama 3.2 1B (Meta)	Phi-3 Mini 1.1B (Microsoft)
Avg Latency	~220ms	~250ms	~260ms
Context Window	~32K	8K	8K
Input Price ($/1M tokens)	~$0.05	~$0.10	~$0.05
Output Price ($/1M tokens)	~$0.15	~$0.30	~$0.20
Max Output Tokens	~4K	2K	2K
Throughput	~60 tps	~40 tps	~35 tps
Uptime	~99.9%	~99.9%	~99.9%

30-day usage via LLM API

620M: Prompt tokens processed (30 days)
410M: Completion tokens generated (30 days)
3.1M: API requests served (30 days)
99.6%: Avg uptime (last 30 days)

Start Using API

Architecture & Integration

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

Unified AI Routing

Dynamically route each request to the best-fit model across providers based on latency, accuracy, or cost—without changing your integration or redeploying code.
One endpoint, every model
Smart Cost Controls

Optimize spend with per-route price caps, automatic model downgrades, and detailed usage insights so you can keep quality high while staying within budget.
Max quality, min cost
Automatic Provider Fallbacks

Avoid downtime by failing over to alternative models or providers on errors, rate limits, or outages—configured once, enforced globally in real time.
Resilience by default
End-to-End Observability

Trace every call across providers with request logs, latency breakdowns, errors, and cost metrics so you can debug faster and tune performance with confidence.
See every token
Task-Aware Abstractions

Use high-level tasks—chat, tools, RAG, agents—instead of provider-specific APIs, so you can upgrade models or vendors without rewriting your application logic.
Code to tasks, not APIs
High-Throughput Batch APIs

Run large-scale jobs—evaluations, backfills, fine-tuning prep—through a single batch interface with concurrency, retries, and throttling handled by the platform.
Ship bulk workloads fast

Decision guide

When to Use — When NOT to Use

Use it if...

You need a compact vision-language model for on-device or edge deployment scenarios.
You need to classify or tag large volumes of images cost-effectively.
Your use case involves extracting simple visual attributes or objects from images.
Your use case involves lightweight multimodal experimentation before scaling to larger Sesame models.
You need a small model to fine-tune for domain-specific visual recognition.

Avoid if...

You need state-of-the-art reasoning over complex documents, diagrams, and mixed long-context inputs.
Your workload requires highly accurate natural language generation beyond short captions or labels.
You need top-tier performance on intricate multimodal benchmarks or safety-critical decisions.
Your workload requires robust handling of very high-resolution images without aggressive downscaling.
You need broad multilingual understanding and generation, not just basic English-centric capabilities.

FAQ

Frequently Asked Questions

What is CSM 1B?

CSM 1B is a 1-billion-parameter language model from Sesame, accessible through LLM.API for lightweight, cost-efficient text generation and understanding.
What tasks is CSM 1B best suited for?

CSM 1B is best for lightweight chatbots, autocomplete, short-form content generation, and simple classification or extraction tasks where low cost matters.
What is the context window of CSM 1B?

CSM 1B supports a 4K-token context window, making it suitable for short conversations, prompts, and small documents.
What modalities does CSM 1B support?

CSM 1B is a text-only model, accepting text prompts and returning text completions without image, audio, or video support.
How do I call CSM 1B via LLM.API?

Use the LLM.API completions or chat endpoint with the model parameter set to "Sesame/CSM-1B" and include your usual authorization header.
How does CSM 1B compare to larger Sesame models?

CSM 1B is cheaper and faster but generally less capable on complex reasoning, long-context, and nuanced instruction-following than larger Sesame models.
What are the typical latency characteristics of CSM 1B on LLM.API?

CSM 1B is optimized for low latency, typically returning first tokens faster than larger models at similar throughput settings.
What are the limitations of CSM 1B?

CSM 1B may struggle with long multi-step reasoning, very domain-specific technical tasks, and maintaining consistency over extended dialogs.
Does CSM 1B support streaming responses on LLM.API?

Yes, you can enable streaming in LLM.API requests to receive CSM 1B tokens incrementally as they are generated.
How is CSM 1B priced on LLM.API?

CSM 1B is priced as a budget-friendly tier on LLM.API, with lower per-token costs than larger Sesame and frontier models.

Start in 2 lines of code

Get My API Key

CSM 1B

What is CSM 1B?

5 Core Capabilities

Text-to-Speech

Conversational Prosody

Multimodal Inputs

Streaming Generation

Multilingual Potential

6 Most Valuable Use Cases

Cost Comparison

Technical Specifications

Why Build on LLM.API?

Unified AI Routing

Smart Cost Controls

Automatic Provider Fallbacks

End-to-End Observability

Task-Aware Abstractions

High-Throughput Batch APIs

When to Use — When NOT to Use

Use it if...

Avoid if...

Start in 2 lines of code