Laguna XS.2 (free)

Text Generation

Laguna XS.2 (free) by Poolside is a compact, open‑weight agentic coding model optimized for fast, affordable software engineering workflows, available at no cost via selected providers. It combines a Mixture‑of‑Experts architecture with strong coding performance while remaining lightweight enough to run in more constrained environments.

Start Using API

API Performance

Latency: ~1.0s avg response
Context: 262K token context
Input: $0.00 per 1M tokens
Output: $0.00 per 1M tokens
Uptime: 99% 99%

About the model

What is Laguna XS.2 (free)?

Laguna XS.2 is Poolside’s open‑weight, 33B‑parameter Mixture‑of‑Experts language model with about 3B active parameters, designed primarily for agentic coding and software engineering tasks. It is used for building coding agents that can iteratively edit code, run tools, and solve multi‑step programming tasks, and for running private or on‑premise development assistants thanks to its relatively low resource requirements. It also supports general chat-style interactions for developers through APIs and integrations such as OpenRouter and third‑party platforms offering a free preview tier. Laguna XS.2 is part of Poolside’s Laguna family of models and is a second‑generation, XS‑class successor building on the training pipeline and lessons from the larger Laguna M.1 model.

Input / Output

Input

Text prompts (natural language or code)

Output

Text responses (natural language or code)
Code snippets and programming outputs

Model capabilities

5 Core Capabilities

Conversational Chat

Supports instruction-following, multi-turn chat, and reasoning-focused assistant interactions over long contexts using a text-to-text chat interface.
Code Generation

Optimized for software engineering and agentic coding, generating and editing code, fixing bugs, and handling multi-step programming tasks.
Long-Context Reasoning

Handles up to 262k-token contexts with sliding window and global attention, enabling long-horizon reasoning and document-spanning workflows.
Tool And Function Use

Natively supports tool use and function calling, reasoning before and between tool calls for automated workflows and coding agents.
Multilingual Text

Processes and generates text in multiple languages, enabling cross-lingual chat, documentation assistance, and programming help with non-English content.

Use cases

6 Most Valuable Use Cases

Local Coding Agents
Automated Bug Fixing
Codebase Refactoring
Tool-Assisted Debugging
Software Dev Assistants
Multi-Step Code Reasoning

Transparent pricing

Cost Comparison

LLM API offers the lowest cost and latency vs comparable Laguna-class APIs.

Provider	Region	Latency	Throughput	Uptime	Input ($/1M)	Output ($/1M)	Context
LLM API BEST	Global	120ms	80 tps	99.99%	$0.05	$0.10	128K
Poolside	Global	~220ms	~35 tps	~99.9%	$0.00	$0.00	~64K
OpenAI	US East	~250ms	~40 tps	99.9%	~$0.30	~$0.60	~128K
Anthropic	US West	~260ms	~30 tps	99.9%	~$0.35	~$0.80	~200K
Google Cloud	EU West	~280ms	~25 tps	99.9%	~$0.40	~$0.80	~128K

Performance benchmarks

Technical Specifications

Metric	Laguna XS.2 (free)	GPT-4o mini (OpenAI)	Claude 3.5 Haiku (Anthropic)
Avg Latency	~250ms	~300ms	~350ms
Context Window	128K	128K	200K
Input Price ($/1M)	$0.00	$0.15	$0.25
Output Price ($/1M)	$0.00	$0.60	$1.25
Max Output Tokens	4K	4K	4K
Throughput	~120 tps	~100 tps	~90 tps
Uptime	99.0%	99.9%	99.9%

30-day usage via LLM API

12.5B: Prompt tokens processed (last 30 days)
7.8B: Completion tokens generated (last 30 days)
9.3M: API requests served (last 30 days)
410K: Unique users (last 30 days)

Start Using API

Architecture & Integration

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

Intelligent AI Routing

Automatically route each request to the optimal model across providers based on latency, capability, and cost—without changing your integration.
One endpoint, any model
Cost-Aware Orchestration

Blend premium and budget models with per-request cost controls, guardrails, and policies so teams can ship richer AI features without runaway spend.
More AI, less spend
Resilient Fallback Flows

Define automatic cross-provider fallbacks, retries, and degradations so production workloads keep working even when individual models or regions fail.
Designed for failure
Full-Stack Observability

Trace every call across providers with logs, metrics, and structured spans to debug prompts, tune routing, and meet compliance requirements.
See every token
Task-Level Abstractions

Describe tasks like chat, tools, RAG, or classification once and let LLM.API handle provider-specific prompts, parameters, and response shaping.
Tasks, not prompts
High-Throughput Batch

Submit large batches through a unified API with provider-aware chunking, concurrency control, and retries to slash latency and infrastructure overhead.
Scale to millions

Decision guide

When to Use — When NOT to Use

Use it if...

You need a free model from Poolside for experimentation or early prototyping.
Your use case involves simple Q&A, proofreading, or light content rewriting tasks.
You need a backup or fallback model to reduce overall API spending.
Your use case involves building internal tools where occasional inaccuracies are acceptable.
You need to test Poolside platform integration before committing to paid tiers.
Your use case involves short-form text generation like summaries, captions, or brief replies.

Avoid if...

You need guaranteed top-tier reasoning performance comparable to the latest frontier models.
Your workload requires highly reliable code generation, debugging, and complex software design support.
You need enterprise-grade SLAs, dedicated support, or strict performance guarantees for production systems.
Your workload requires specialized capabilities like vision, audio, tools, or very long context windows.
You need state-of-the-art performance on complex math, formal logic, or multi-step planning.
Your workload requires tight latency guarantees for real-time or user-facing critical interactions.

FAQ

Frequently Asked Questions

What is Laguna XS.2 (free)?

Laguna XS.2 (free) is a lightweight Poolside language model accessible via LLM.API, intended for general-purpose text generation and experimentation at no charge.
What modalities does Laguna XS.2 (free) support?

Laguna XS.2 (free) supports text-only input and output, without native image, audio, or video understanding capabilities.
How is Laguna XS.2 (free) priced on LLM.API?

Laguna XS.2 (free) is offered with no per-token usage fees on LLM.API, subject to platform-specific free-tier and rate limits.
What is the context window of Laguna XS.2 (free)?

Laguna XS.2 (free) supports a 16K token context window for combined input and output on LLM.API.
How fast is Laguna XS.2 (free) in terms of latency and throughput?

Laguna XS.2 (free) is optimized for low-latency responses and higher throughput than larger models, though exact speeds depend on LLM.API load and client location.
How do I call Laguna XS.2 (free) through LLM.API?

You can call Laguna XS.2 (free) by selecting its model name in the LLM.API completion or chat endpoint and authenticating with your LLM.API key.
What types of tasks is Laguna XS.2 (free) best suited for?

Laguna XS.2 (free) works best for lightweight tasks like drafting text, basic coding help, quick data transformations, and prototyping chat-style assistants.
How does Laguna XS.2 (free) compare to larger Poolside or other premium models?

Laguna XS.2 (free) generally trades off some reasoning depth, coding precision, and factual accuracy for lower cost and faster responses.
What are the main limitations of Laguna XS.2 (free)?

Laguna XS.2 (free) can hallucinate, struggle with complex multi-step reasoning, lack up-to-date knowledge, and is not suitable for high-stakes or compliance-critical applications.
Does Laguna XS.2 (free) support streaming responses on LLM.API?

Yes, Laguna XS.2 (free) can be used with streaming responses if you enable streaming in your LLM.API request parameters.

Start in 2 lines of code

Get My API Key

Laguna XS.2 (free)

What is Laguna XS.2 (free)?

5 Core Capabilities

Conversational Chat

Code Generation

Long-Context Reasoning

Tool And Function Use

Multilingual Text

6 Most Valuable Use Cases

Cost Comparison

Technical Specifications

Why Build on LLM.API?

Intelligent AI Routing

Cost-Aware Orchestration

Resilient Fallback Flows

Full-Stack Observability

Task-Level Abstractions

High-Throughput Batch

When to Use — When NOT to Use

Use it if...

Avoid if...

Start in 2 lines of code