Powered by Sesame

CSM 1B

  • Text Generation

CSM 1B is a 1‑billion‑parameter conversational speech model from Sesame that turns text (and optionally audio context) into natural‑sounding English speech. It is notable for its Llama-based architecture and RVQ/Mimi audio code generation that enables high‑quality, low‑latency voice output.

Start Using API

What is CSM 1B?

CSM 1B is a 1‑billion‑parameter conversational speech generation model from Sesame that converts text into English speech. It is mainly used for dialogue-oriented applications such as voice assistants, interactive agents, and chat-style voice interfaces that need realistic, contextual speech output. It is also applied in text-to-speech pipelines for content creation, accessibility tools, and other products that require controllable, high‑fidelity synthetic voices. CSM 1B belongs to Sesame’s CSM (Conversational Speech Model) family, which uses a Llama backbone with a specialized audio decoder that generates RVQ/Mimi audio codes.

5 Core Capabilities

  • Text-to-Speech

    Generates high-quality, natural-sounding speech audio directly from text using a Llama-based backbone and specialized audio decoder.

  • Conversational Prosody

    Maintains contextual awareness across turns, adjusting tone, pauses, and inflection to match dialogue flow and emotional nuance.

  • Multimodal Inputs

    Processes both text and audio inputs, using prior audio context to guide consistent speech patterns and expressive delivery.

  • Streaming Generation

    Supports efficient, low-latency speech synthesis suitable for real-time or interactive applications and local deployment scenarios.

  • Multilingual Potential

    Can be fine-tuned on additional languages, enabling customized voices and language support beyond the base English-focused model.

6 Most Valuable Use Cases

  • Voice Customer Support
  • Interactive Voice Assistants
  • Audiobook Narration
  • Game Character Voices
  • Voice Prototyping Tools
  • Accessibility Voice Output

Cost Comparison

LLM API offers the lowest cost and highest performance for CSM 1B-class models.

Provider Region Latency Throughput Uptime Input ($/1M) Output ($/1M) Context
Sesame BEST Global ~120ms ~120 tps 99.99% $0.08 $0.08 64K tokens
Sesame Global ~220ms ~60 tps ~99.9% ~$0.15 ~$0.15 ~32K tokens
OpenRouter Global ~250ms ~55 tps ~99.9% ~$0.18 ~$0.18 ~32K tokens
Replicate US East ~280ms ~45 tps ~99.5% ~$0.20 ~$0.20 ~16K tokens

Technical Specifications

Metric CSM 1B (Sesame) Llama 3.2 1B (Meta) Phi-3 Mini 1.1B (Microsoft)
Avg Latency ~220ms ~250ms ~260ms
Context Window ~32K 8K 8K
Input Price ($/1M tokens) ~$0.05 ~$0.10 ~$0.05
Output Price ($/1M tokens) ~$0.15 ~$0.30 ~$0.20
Max Output Tokens ~4K 2K 2K
Throughput ~60 tps ~40 tps ~35 tps
Uptime ~99.9% ~99.9% ~99.9%

30-day usage via LLM API

620M
Prompt tokens processed (30 days)
410M
Completion tokens generated (30 days)
3.1M
API requests served (30 days)
99.6%
Avg uptime (last 30 days)
Start Using API

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

  • Unified AI Routing

    Dynamically route each request to the best-fit model across providers based on latency, accuracy, or cost—without changing your integration or redeploying code.

    One endpoint, every model
  • Smart Cost Controls

    Optimize spend with per-route price caps, automatic model downgrades, and detailed usage insights so you can keep quality high while staying within budget.

    Max quality, min cost
  • Automatic Provider Fallbacks

    Avoid downtime by failing over to alternative models or providers on errors, rate limits, or outages—configured once, enforced globally in real time.

    Resilience by default
  • End-to-End Observability

    Trace every call across providers with request logs, latency breakdowns, errors, and cost metrics so you can debug faster and tune performance with confidence.

    See every token
  • Task-Aware Abstractions

    Use high-level tasks—chat, tools, RAG, agents—instead of provider-specific APIs, so you can upgrade models or vendors without rewriting your application logic.

    Code to tasks, not APIs
  • High-Throughput Batch APIs

    Run large-scale jobs—evaluations, backfills, fine-tuning prep—through a single batch interface with concurrency, retries, and throttling handled by the platform.

    Ship bulk workloads fast

When to Use — When NOT to Use

Use it if...

  • You need a compact vision-language model for on-device or edge deployment scenarios.
  • You need to classify or tag large volumes of images cost-effectively.
  • Your use case involves extracting simple visual attributes or objects from images.
  • Your use case involves lightweight multimodal experimentation before scaling to larger Sesame models.
  • You need a small model to fine-tune for domain-specific visual recognition.

Avoid if...

  • You need state-of-the-art reasoning over complex documents, diagrams, and mixed long-context inputs.
  • Your workload requires highly accurate natural language generation beyond short captions or labels.
  • You need top-tier performance on intricate multimodal benchmarks or safety-critical decisions.
  • Your workload requires robust handling of very high-resolution images without aggressive downscaling.
  • You need broad multilingual understanding and generation, not just basic English-centric capabilities.

Frequently Asked Questions

  • What is CSM 1B?

    CSM 1B is a 1-billion-parameter language model from Sesame, accessible through LLM.API for lightweight, cost-efficient text generation and understanding.

  • What tasks is CSM 1B best suited for?

    CSM 1B is best for lightweight chatbots, autocomplete, short-form content generation, and simple classification or extraction tasks where low cost matters.

  • What is the context window of CSM 1B?

    CSM 1B supports a 4K-token context window, making it suitable for short conversations, prompts, and small documents.

  • What modalities does CSM 1B support?

    CSM 1B is a text-only model, accepting text prompts and returning text completions without image, audio, or video support.

  • How do I call CSM 1B via LLM.API?

    Use the LLM.API completions or chat endpoint with the model parameter set to "Sesame/CSM-1B" and include your usual authorization header.

  • How does CSM 1B compare to larger Sesame models?

    CSM 1B is cheaper and faster but generally less capable on complex reasoning, long-context, and nuanced instruction-following than larger Sesame models.

  • What are the typical latency characteristics of CSM 1B on LLM.API?

    CSM 1B is optimized for low latency, typically returning first tokens faster than larger models at similar throughput settings.

  • What are the limitations of CSM 1B?

    CSM 1B may struggle with long multi-step reasoning, very domain-specific technical tasks, and maintaining consistency over extended dialogs.

  • Does CSM 1B support streaming responses on LLM.API?

    Yes, you can enable streaming in LLM.API requests to receive CSM 1B tokens incrementally as they are generated.

  • How is CSM 1B priced on LLM.API?

    CSM 1B is priced as a budget-friendly tier on LLM.API, with lower per-token costs than larger Sesame and frontier models.

Start in 2 lines of code

Get My API Key