Powered by Poolside
Laguna XS.2 (free)
- Text Generation
Laguna XS.2 (free) by Poolside is a compact, open‑weight agentic coding model optimized for fast, affordable software engineering workflows, available at no cost via selected providers. It combines a Mixture‑of‑Experts architecture with strong coding performance while remaining lightweight enough to run in more constrained environments.
About the model
What is Laguna XS.2 (free)?
Laguna XS.2 is Poolside’s open‑weight, 33B‑parameter Mixture‑of‑Experts language model with about 3B active parameters, designed primarily for agentic coding and software engineering tasks. It is used for building coding agents that can iteratively edit code, run tools, and solve multi‑step programming tasks, and for running private or on‑premise development assistants thanks to its relatively low resource requirements. It also supports general chat-style interactions for developers through APIs and integrations such as OpenRouter and third‑party platforms offering a free preview tier. Laguna XS.2 is part of Poolside’s Laguna family of models and is a second‑generation, XS‑class successor building on the training pipeline and lessons from the larger Laguna M.1 model.
Model capabilities
5 Core Capabilities
-
Conversational Chat
Supports instruction-following, multi-turn chat, and reasoning-focused assistant interactions over long contexts using a text-to-text chat interface.
-
Code Generation
Optimized for software engineering and agentic coding, generating and editing code, fixing bugs, and handling multi-step programming tasks.
-
Long-Context Reasoning
Handles up to 262k-token contexts with sliding window and global attention, enabling long-horizon reasoning and document-spanning workflows.
-
Tool And Function Use
Natively supports tool use and function calling, reasoning before and between tool calls for automated workflows and coding agents.
-
Multilingual Text
Processes and generates text in multiple languages, enabling cross-lingual chat, documentation assistance, and programming help with non-English content.
Use cases
6 Most Valuable Use Cases
- Local Coding Agents
- Automated Bug Fixing
- Codebase Refactoring
- Tool-Assisted Debugging
- Software Dev Assistants
- Multi-Step Code Reasoning
Transparent pricing
Cost Comparison
LLM API offers the lowest cost and latency vs comparable Laguna-class APIs.
| Provider | Region | Latency | Throughput | Uptime | Input ($/1M) | Output ($/1M) | Context |
|---|---|---|---|---|---|---|---|
| LLM API BEST | Global | 120ms | 80 tps | 99.99% | $0.05 | $0.10 | 128K |
| Poolside | Global | ~220ms | ~35 tps | ~99.9% | $0.00 | $0.00 | ~64K |
| OpenAI | US East | ~250ms | ~40 tps | 99.9% | ~$0.30 | ~$0.60 | ~128K |
| Anthropic | US West | ~260ms | ~30 tps | 99.9% | ~$0.35 | ~$0.80 | ~200K |
| Google Cloud | EU West | ~280ms | ~25 tps | 99.9% | ~$0.40 | ~$0.80 | ~128K |
Performance benchmarks
Technical Specifications
| Metric | Laguna XS.2 (free) | GPT-4o mini (OpenAI) | Claude 3.5 Haiku (Anthropic) |
|---|---|---|---|
| Avg Latency | ~250ms | ~300ms | ~350ms |
| Context Window | 128K | 128K | 200K |
| Input Price ($/1M) | $0.00 | $0.15 | $0.25 |
| Output Price ($/1M) | $0.00 | $0.60 | $1.25 |
| Max Output Tokens | 4K | 4K | 4K |
| Throughput | ~120 tps | ~100 tps | ~90 tps |
| Uptime | 99.0% | 99.9% | 99.9% |
30-day usage via LLM API
- 12.5B
- Prompt tokens processed (last 30 days)
- 7.8B
- Completion tokens generated (last 30 days)
- 9.3M
- API requests served (last 30 days)
- 410K
- Unique users (last 30 days)
Architecture & Integration
Why Build on LLM.API?
One unified API. Every major model. Built-in reliability, cost control, and observability.
-
Intelligent AI Routing
Automatically route each request to the optimal model across providers based on latency, capability, and cost—without changing your integration.
One endpoint, any model -
Cost-Aware Orchestration
Blend premium and budget models with per-request cost controls, guardrails, and policies so teams can ship richer AI features without runaway spend.
More AI, less spend -
Resilient Fallback Flows
Define automatic cross-provider fallbacks, retries, and degradations so production workloads keep working even when individual models or regions fail.
Designed for failure -
Full-Stack Observability
Trace every call across providers with logs, metrics, and structured spans to debug prompts, tune routing, and meet compliance requirements.
See every token -
Task-Level Abstractions
Describe tasks like chat, tools, RAG, or classification once and let LLM.API handle provider-specific prompts, parameters, and response shaping.
Tasks, not prompts -
High-Throughput Batch
Submit large batches through a unified API with provider-aware chunking, concurrency control, and retries to slash latency and infrastructure overhead.
Scale to millions
Decision guide
When to Use — When NOT to Use
Use it if...
- You need a free model from Poolside for experimentation or early prototyping.
- Your use case involves simple Q&A, proofreading, or light content rewriting tasks.
- You need a backup or fallback model to reduce overall API spending.
- Your use case involves building internal tools where occasional inaccuracies are acceptable.
- You need to test Poolside platform integration before committing to paid tiers.
- Your use case involves short-form text generation like summaries, captions, or brief replies.
Avoid if...
- You need guaranteed top-tier reasoning performance comparable to the latest frontier models.
- Your workload requires highly reliable code generation, debugging, and complex software design support.
- You need enterprise-grade SLAs, dedicated support, or strict performance guarantees for production systems.
- Your workload requires specialized capabilities like vision, audio, tools, or very long context windows.
- You need state-of-the-art performance on complex math, formal logic, or multi-step planning.
- Your workload requires tight latency guarantees for real-time or user-facing critical interactions.
FAQ
Frequently Asked Questions
-
What is Laguna XS.2 (free)?
Laguna XS.2 (free) is a lightweight Poolside language model accessible via LLM.API, intended for general-purpose text generation and experimentation at no charge.
-
What modalities does Laguna XS.2 (free) support?
Laguna XS.2 (free) supports text-only input and output, without native image, audio, or video understanding capabilities.
-
How is Laguna XS.2 (free) priced on LLM.API?
Laguna XS.2 (free) is offered with no per-token usage fees on LLM.API, subject to platform-specific free-tier and rate limits.
-
What is the context window of Laguna XS.2 (free)?
Laguna XS.2 (free) supports a 16K token context window for combined input and output on LLM.API.
-
How fast is Laguna XS.2 (free) in terms of latency and throughput?
Laguna XS.2 (free) is optimized for low-latency responses and higher throughput than larger models, though exact speeds depend on LLM.API load and client location.
-
How do I call Laguna XS.2 (free) through LLM.API?
You can call Laguna XS.2 (free) by selecting its model name in the LLM.API completion or chat endpoint and authenticating with your LLM.API key.
-
What types of tasks is Laguna XS.2 (free) best suited for?
Laguna XS.2 (free) works best for lightweight tasks like drafting text, basic coding help, quick data transformations, and prototyping chat-style assistants.
-
How does Laguna XS.2 (free) compare to larger Poolside or other premium models?
Laguna XS.2 (free) generally trades off some reasoning depth, coding precision, and factual accuracy for lower cost and faster responses.
-
What are the main limitations of Laguna XS.2 (free)?
Laguna XS.2 (free) can hallucinate, struggle with complex multi-step reasoning, lack up-to-date knowledge, and is not suitable for high-stakes or compliance-critical applications.
-
Does Laguna XS.2 (free) support streaming responses on LLM.API?
Yes, Laguna XS.2 (free) can be used with streaming responses if you enable streaming in your LLM.API request parameters.
