Powered by LiquidAI
LFM2-24B-A2B
- Text Generation
LFM2-24B-A2B is LiquidAI’s largest LFM2-series hybrid Mixture-of-Experts language model, designed to deliver high-quality text generation while remaining efficient enough to run on consumer hardware.
About the model
What is LFM2-24B-A2B?
LFM2-24B-A2B is a 24B-parameter sparse Mixture-of-Experts hybrid language model from LiquidAI, with about 2B active parameters per token and a context window of around 128K tokens. It is primarily used for general-purpose text generation tasks such as drafting, summarization, and chat-style assistance, with a focus on low-cost inference. It is also positioned for on-device and edge deployments, enabling local agent-style workflows on laptops and AI PCs. It belongs to the LFM2 family of models, extending the series from smaller variants (e.g., LFM2-350M and mid-sized LFM2 models) up to this largest 24B configuration.
Model capabilities
5 Core Capabilities
-
Conversational Chat
Engages in multi-turn dialogue, answering questions, following instructions, and adapting responses to user context and intent.
-
Image Interpretation
Analyzes images to identify objects, scenes, and relationships, enabling visual question answering and descriptive explanations.
-
Text Translation
Translates written content between multiple languages while preserving meaning, tone, and stylistic nuance as closely as possible.
-
Document OCR
Extracts machine-readable text from documents and images, enabling downstream search, summarization, and content analysis workflows.
-
System Monitoring
Supports monitoring-style tasks such as interpreting logs, alerts, and metrics to assist with diagnostics and incident summaries.
Use cases
6 Most Valuable Use Cases
- On-device Chat Assistant
- Local Document Summarization
- Privacy-first Case Notes
- System Log Monitoring
- Edge Productivity Copilot
- CPU-only Text Generation
Transparent pricing
Cost Comparison
LLM API offers the lowest cost and highest performance for LFM2-24B-A2B-class models.
| Provider | Region | Latency | Throughput | Uptime | Input ($/1M) | Output ($/1M) | Context |
|---|---|---|---|---|---|---|---|
| LLM API BEST | Global | 80ms | 120 tps | 99.99% | $0.40 | $0.80 | 256K |
| LiquidAI | US East | ~140ms | ~70 tps | ~99.9% | ~$0.65 | ~$1.30 | ~128K |
| OpenAI (comparable 20–30B model) | Global | ~200ms | ~60 tps | ~99.9% | ~$1.00 | ~$2.00 | ~128K |
| Anthropic (comparable 20–30B model) | US West | ~190ms | ~55 tps | ~99.9% | ~$1.10 | ~$2.20 | ~200K |
| Azure AI (LiquidAI-compatible deployment) | EU West | ~210ms | ~50 tps | ~99.95% | ~$0.90 | ~$1.80 | ~128K |
Performance benchmarks
Technical Specifications
| Metric | LFM2-24B-A2B (LiquidAI) | GPT-4.1-mini (OpenAI) | Claude 3.5 Haiku (Anthropic) |
|---|---|---|---|
| Avg Latency | ~180ms | ~220ms | ~250ms |
| Context Window | 128K | 128K | 200K |
| Input Price ($/1M) | $0.10 | $0.15 | $0.25 |
| Output Price ($/1M) | $0.40 | $0.60 | $0.80 |
| Max Output Tokens | 8K | 4K | 8K |
| Throughput | 120 tps | 100 tps | 90 tps |
| Uptime | 99.9% | 99.9% | 99.9% |
30-day usage via LLM API
- 11.4B
- Prompt tokens processed (last 30 days)
- 7.8B
- Completion tokens generated (last 30 days)
- 4.6M
- API requests served (last 30 days)
- 99.6%
- Average uptime (last 30 days)
Architecture & Integration
Why Build on LLM.API?
One unified API. Every major model. Built-in reliability, cost control, and observability.
-
Smart Model Routing
Dynamically route each request across providers by latency, price, and quality. One endpoint abstracts vendor lock-in and keeps workloads on the best option automatically.
One endpoint, every model -
Cost-Aware Orchestration
Automatically balance quality and spend with per-request cost controls, usage caps, and cheaper alternates. Ship rich AI features without blowing your infrastructure budget.
Optimize cost per token -
Resilient Fallback Flows
Define provider and model fallbacks that trigger instantly on timeouts, rate limits, or errors. Keep user-facing experiences stable even when vendors fail.
Failures auto-rerouted -
End-to-End Observability
Trace every request across models and providers with logs, metrics, and latency breakdowns. Debug production issues fast and tune routing using real traffic data.
See every token hop -
Task-Level Abstractions
Describe tasks—not models—and let LLM.API pick the right tools, prompts, and providers. Standardize patterns like chat, tools, and RAG behind one API.
Program tasks, not models -
High-Throughput Batching
Send large batches of requests in a single call with concurrency controls and retry policies. Maximize throughput and minimize overhead for heavy workloads.
Scale up without thrash
Decision guide
When to Use — When NOT to Use
Use it if...
- You need a general-purpose 24B model for balanced reasoning, coding, and writing.
- You need strong performance on English-centric tasks without requiring frontier-level reasoning ability.
- You need a relatively large open-weight model deployable on your own infrastructure.
- Your use case involves batch offline inference where slightly higher latency is acceptable.
- Your use case involves fine-tuning a mid-sized model for a specific domain.
- You need good performance on common benchmarks but not absolute state-of-the-art scores.
- Your use case involves multi-turn assistants where context windows are moderate, not extreme.
Avoid if...
- You need cutting-edge frontier performance on complex reasoning, planning, or tool orchestration.
- Your workload requires extremely low latency responses for interactive, high-traffic consumer applications.
- You need highly optimized multimodal capabilities like advanced vision, audio, or video understanding.
- Your workload requires handling extremely long contexts, such as millions of tokens, reliably.
- You need strict enterprise guarantees around support SLAs, compliance certifications, and uptime contracts.
- You need ultra-small edge deployment where memory and compute budgets are very constrained.
- Your workload requires native support for many low-resource languages with high accuracy and safety.
FAQ
Frequently Asked Questions
-
What is LFM2-24B-A2B?
LFM2-24B-A2B is a 24B-parameter LiquidAI language model available through LLM.API, designed for high-quality text generation and reasoning tasks.
-
What is LFM2-24B-A2B best suited for?
LFM2-24B-A2B is best for complex code generation, multi-step reasoning, data transformation, and longer-form content where quality matters more than minimal latency.
-
What modalities does LFM2-24B-A2B support?
LFM2-24B-A2B is a text-only model that accepts text prompts and returns text completions.
-
What context window does LFM2-24B-A2B support on LLM.API?
LFM2-24B-A2B supports up to a 32K-token context window via LLM.API, including input and output tokens combined.
-
How does LFM2-24B-A2B compare to similar 20–30B parameter models?
LFM2-24B-A2B targets stronger reasoning and coding quality than typical 7–14B models, with higher cost but better performance on complex tasks.
-
How fast is LFM2-24B-A2B in terms of latency and throughput?
LFM2-24B-A2B has moderate first-token latency typical of 20–30B models, but streams tokens quickly enough for interactive applications.
-
How is LFM2-24B-A2B priced on LLM.API?
LFM2-24B-A2B uses a per-token pricing model on LLM.API, with separate input and output token rates defined in the LLM.API pricing page.
-
How do I call LFM2-24B-A2B through the LLM.API gateway?
Specify the model ID "LFM2-24B-A2B" in your LLM.API completion or chat endpoint request, along with your API key and usual parameters.
-
Does LFM2-24B-A2B support function calling or structured tool outputs?
LFM2-24B-A2B can be prompted to emit structured JSON, but native function-calling semantics depend on LLM.API’s tooling layer, not the model itself.
-
What are the main limitations of LFM2-24B-A2B?
LFM2-24B-A2B can hallucinate facts, lacks real-time knowledge, and may struggle with highly specialized domain data without careful prompting or retrieval.
