Powered by Sesame
CSM 1B
- Text Generation
CSM 1B is a 1‑billion‑parameter conversational speech model from Sesame that turns text (and optionally audio context) into natural‑sounding English speech. It is notable for its Llama-based architecture and RVQ/Mimi audio code generation that enables high‑quality, low‑latency voice output.
About the model
What is CSM 1B?
CSM 1B is a 1‑billion‑parameter conversational speech generation model from Sesame that converts text into English speech. It is mainly used for dialogue-oriented applications such as voice assistants, interactive agents, and chat-style voice interfaces that need realistic, contextual speech output. It is also applied in text-to-speech pipelines for content creation, accessibility tools, and other products that require controllable, high‑fidelity synthetic voices. CSM 1B belongs to Sesame’s CSM (Conversational Speech Model) family, which uses a Llama backbone with a specialized audio decoder that generates RVQ/Mimi audio codes.
Model capabilities
5 Core Capabilities
-
Text-to-Speech
Generates high-quality, natural-sounding speech audio directly from text using a Llama-based backbone and specialized audio decoder.
-
Conversational Prosody
Maintains contextual awareness across turns, adjusting tone, pauses, and inflection to match dialogue flow and emotional nuance.
-
Multimodal Inputs
Processes both text and audio inputs, using prior audio context to guide consistent speech patterns and expressive delivery.
-
Streaming Generation
Supports efficient, low-latency speech synthesis suitable for real-time or interactive applications and local deployment scenarios.
-
Multilingual Potential
Can be fine-tuned on additional languages, enabling customized voices and language support beyond the base English-focused model.
Use cases
6 Most Valuable Use Cases
- Voice Customer Support
- Interactive Voice Assistants
- Audiobook Narration
- Game Character Voices
- Voice Prototyping Tools
- Accessibility Voice Output
Transparent pricing
Cost Comparison
LLM API offers the lowest cost and highest performance for CSM 1B-class models.
| Provider | Region | Latency | Throughput | Uptime | Input ($/1M) | Output ($/1M) | Context |
|---|---|---|---|---|---|---|---|
| Sesame BEST | Global | ~120ms | ~120 tps | 99.99% | $0.08 | $0.08 | 64K tokens |
| Sesame | Global | ~220ms | ~60 tps | ~99.9% | ~$0.15 | ~$0.15 | ~32K tokens |
| OpenRouter | Global | ~250ms | ~55 tps | ~99.9% | ~$0.18 | ~$0.18 | ~32K tokens |
| Replicate | US East | ~280ms | ~45 tps | ~99.5% | ~$0.20 | ~$0.20 | ~16K tokens |
Performance benchmarks
Technical Specifications
| Metric | CSM 1B (Sesame) | Llama 3.2 1B (Meta) | Phi-3 Mini 1.1B (Microsoft) |
|---|---|---|---|
| Avg Latency | ~220ms | ~250ms | ~260ms |
| Context Window | ~32K | 8K | 8K |
| Input Price ($/1M tokens) | ~$0.05 | ~$0.10 | ~$0.05 |
| Output Price ($/1M tokens) | ~$0.15 | ~$0.30 | ~$0.20 |
| Max Output Tokens | ~4K | 2K | 2K |
| Throughput | ~60 tps | ~40 tps | ~35 tps |
| Uptime | ~99.9% | ~99.9% | ~99.9% |
30-day usage via LLM API
- 620M
- Prompt tokens processed (30 days)
- 410M
- Completion tokens generated (30 days)
- 3.1M
- API requests served (30 days)
- 99.6%
- Avg uptime (last 30 days)
Architecture & Integration
Why Build on LLM.API?
One unified API. Every major model. Built-in reliability, cost control, and observability.
-
Unified AI Routing
Dynamically route each request to the best-fit model across providers based on latency, accuracy, or cost—without changing your integration or redeploying code.
One endpoint, every model -
Smart Cost Controls
Optimize spend with per-route price caps, automatic model downgrades, and detailed usage insights so you can keep quality high while staying within budget.
Max quality, min cost -
Automatic Provider Fallbacks
Avoid downtime by failing over to alternative models or providers on errors, rate limits, or outages—configured once, enforced globally in real time.
Resilience by default -
End-to-End Observability
Trace every call across providers with request logs, latency breakdowns, errors, and cost metrics so you can debug faster and tune performance with confidence.
See every token -
Task-Aware Abstractions
Use high-level tasks—chat, tools, RAG, agents—instead of provider-specific APIs, so you can upgrade models or vendors without rewriting your application logic.
Code to tasks, not APIs -
High-Throughput Batch APIs
Run large-scale jobs—evaluations, backfills, fine-tuning prep—through a single batch interface with concurrency, retries, and throttling handled by the platform.
Ship bulk workloads fast
Decision guide
When to Use — When NOT to Use
Use it if...
- You need a compact vision-language model for on-device or edge deployment scenarios.
- You need to classify or tag large volumes of images cost-effectively.
- Your use case involves extracting simple visual attributes or objects from images.
- Your use case involves lightweight multimodal experimentation before scaling to larger Sesame models.
- You need a small model to fine-tune for domain-specific visual recognition.
Avoid if...
- You need state-of-the-art reasoning over complex documents, diagrams, and mixed long-context inputs.
- Your workload requires highly accurate natural language generation beyond short captions or labels.
- You need top-tier performance on intricate multimodal benchmarks or safety-critical decisions.
- Your workload requires robust handling of very high-resolution images without aggressive downscaling.
- You need broad multilingual understanding and generation, not just basic English-centric capabilities.
FAQ
Frequently Asked Questions
-
What is CSM 1B?
CSM 1B is a 1-billion-parameter language model from Sesame, accessible through LLM.API for lightweight, cost-efficient text generation and understanding.
-
What tasks is CSM 1B best suited for?
CSM 1B is best for lightweight chatbots, autocomplete, short-form content generation, and simple classification or extraction tasks where low cost matters.
-
What is the context window of CSM 1B?
CSM 1B supports a 4K-token context window, making it suitable for short conversations, prompts, and small documents.
-
What modalities does CSM 1B support?
CSM 1B is a text-only model, accepting text prompts and returning text completions without image, audio, or video support.
-
How do I call CSM 1B via LLM.API?
Use the LLM.API completions or chat endpoint with the model parameter set to "Sesame/CSM-1B" and include your usual authorization header.
-
How does CSM 1B compare to larger Sesame models?
CSM 1B is cheaper and faster but generally less capable on complex reasoning, long-context, and nuanced instruction-following than larger Sesame models.
-
What are the typical latency characteristics of CSM 1B on LLM.API?
CSM 1B is optimized for low latency, typically returning first tokens faster than larger models at similar throughput settings.
-
What are the limitations of CSM 1B?
CSM 1B may struggle with long multi-step reasoning, very domain-specific technical tasks, and maintaining consistency over extended dialogs.
-
Does CSM 1B support streaming responses on LLM.API?
Yes, you can enable streaming in LLM.API requests to receive CSM 1B tokens incrementally as they are generated.
-
How is CSM 1B priced on LLM.API?
CSM 1B is priced as a budget-friendly tier on LLM.API, with lower per-token costs than larger Sesame and frontier models.
