LFM2-24B-A2B

Text Generation

LFM2-24B-A2B is LiquidAI’s largest LFM2-series hybrid Mixture-of-Experts language model, designed to deliver high-quality text generation while remaining efficient enough to run on consumer hardware.

Start Using API

API Performance

Latency: ~0.7s time to first token
Context: ~32K token context
Input: Free per 1M tokens
Output: Free per 1M tokens
Uptime: 99% 99%

About the model

What is LFM2-24B-A2B?

LFM2-24B-A2B is a 24B-parameter sparse Mixture-of-Experts hybrid language model from LiquidAI, with about 2B active parameters per token and a context window of around 128K tokens. It is primarily used for general-purpose text generation tasks such as drafting, summarization, and chat-style assistance, with a focus on low-cost inference. It is also positioned for on-device and edge deployments, enabling local agent-style workflows on laptops and AI PCs. It belongs to the LFM2 family of models, extending the series from smaller variants (e.g., LFM2-350M and mid-sized LFM2 models) up to this largest 24B configuration.

Input / Output

Input

Text prompts (natural language, code, instructions)

Output

Generated text responses (chat-style completions)
Generated source code snippets and programming outputs

Model capabilities

5 Core Capabilities

Conversational Chat

Engages in multi-turn dialogue, answering questions, following instructions, and adapting responses to user context and intent.
Image Interpretation

Analyzes images to identify objects, scenes, and relationships, enabling visual question answering and descriptive explanations.
Text Translation

Translates written content between multiple languages while preserving meaning, tone, and stylistic nuance as closely as possible.
Document OCR

Extracts machine-readable text from documents and images, enabling downstream search, summarization, and content analysis workflows.
System Monitoring

Supports monitoring-style tasks such as interpreting logs, alerts, and metrics to assist with diagnostics and incident summaries.

Use cases

6 Most Valuable Use Cases

On-device Chat Assistant
Local Document Summarization
Privacy-first Case Notes
System Log Monitoring
Edge Productivity Copilot
CPU-only Text Generation

Transparent pricing

Cost Comparison

LLM API offers the lowest cost and highest performance for LFM2-24B-A2B-class models.

Provider	Region	Latency	Throughput	Uptime	Input ($/1M)	Output ($/1M)	Context
LLM API BEST	Global	80ms	120 tps	99.99%	$0.40	$0.80	256K
LiquidAI	US East	~140ms	~70 tps	~99.9%	~$0.65	~$1.30	~128K
OpenAI (comparable 20–30B model)	Global	~200ms	~60 tps	~99.9%	~$1.00	~$2.00	~128K
Anthropic (comparable 20–30B model)	US West	~190ms	~55 tps	~99.9%	~$1.10	~$2.20	~200K
Azure AI (LiquidAI-compatible deployment)	EU West	~210ms	~50 tps	~99.95%	~$0.90	~$1.80	~128K

Performance benchmarks

Technical Specifications

Metric	LFM2-24B-A2B (LiquidAI)	GPT-4.1-mini (OpenAI)	Claude 3.5 Haiku (Anthropic)
Avg Latency	~180ms	~220ms	~250ms
Context Window	128K	128K	200K
Input Price ($/1M)	$0.10	$0.15	$0.25
Output Price ($/1M)	$0.40	$0.60	$0.80
Max Output Tokens	8K	4K	8K
Throughput	120 tps	100 tps	90 tps
Uptime	99.9%	99.9%	99.9%

30-day usage via LLM API

11.4B: Prompt tokens processed (last 30 days)
7.8B: Completion tokens generated (last 30 days)
4.6M: API requests served (last 30 days)
99.6%: Average uptime (last 30 days)

Start Using API

Architecture & Integration

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

Smart Model Routing

Dynamically route each request across providers by latency, price, and quality. One endpoint abstracts vendor lock-in and keeps workloads on the best option automatically.
One endpoint, every model
Cost-Aware Orchestration

Automatically balance quality and spend with per-request cost controls, usage caps, and cheaper alternates. Ship rich AI features without blowing your infrastructure budget.
Optimize cost per token
Resilient Fallback Flows

Define provider and model fallbacks that trigger instantly on timeouts, rate limits, or errors. Keep user-facing experiences stable even when vendors fail.
Failures auto-rerouted
End-to-End Observability

Trace every request across models and providers with logs, metrics, and latency breakdowns. Debug production issues fast and tune routing using real traffic data.
See every token hop
Task-Level Abstractions

Describe tasks—not models—and let LLM.API pick the right tools, prompts, and providers. Standardize patterns like chat, tools, and RAG behind one API.
Program tasks, not models
High-Throughput Batching

Send large batches of requests in a single call with concurrency controls and retry policies. Maximize throughput and minimize overhead for heavy workloads.
Scale up without thrash

Decision guide

When to Use — When NOT to Use

Use it if...

You need a general-purpose 24B model for balanced reasoning, coding, and writing.
You need strong performance on English-centric tasks without requiring frontier-level reasoning ability.
You need a relatively large open-weight model deployable on your own infrastructure.
Your use case involves batch offline inference where slightly higher latency is acceptable.
Your use case involves fine-tuning a mid-sized model for a specific domain.
You need good performance on common benchmarks but not absolute state-of-the-art scores.
Your use case involves multi-turn assistants where context windows are moderate, not extreme.

Avoid if...

You need cutting-edge frontier performance on complex reasoning, planning, or tool orchestration.
Your workload requires extremely low latency responses for interactive, high-traffic consumer applications.
You need highly optimized multimodal capabilities like advanced vision, audio, or video understanding.
Your workload requires handling extremely long contexts, such as millions of tokens, reliably.
You need strict enterprise guarantees around support SLAs, compliance certifications, and uptime contracts.
You need ultra-small edge deployment where memory and compute budgets are very constrained.
Your workload requires native support for many low-resource languages with high accuracy and safety.

FAQ

Frequently Asked Questions

What is LFM2-24B-A2B?

LFM2-24B-A2B is a 24B-parameter LiquidAI language model available through LLM.API, designed for high-quality text generation and reasoning tasks.
What is LFM2-24B-A2B best suited for?

LFM2-24B-A2B is best for complex code generation, multi-step reasoning, data transformation, and longer-form content where quality matters more than minimal latency.
What modalities does LFM2-24B-A2B support?

LFM2-24B-A2B is a text-only model that accepts text prompts and returns text completions.
What context window does LFM2-24B-A2B support on LLM.API?

LFM2-24B-A2B supports up to a 32K-token context window via LLM.API, including input and output tokens combined.
How does LFM2-24B-A2B compare to similar 20–30B parameter models?

LFM2-24B-A2B targets stronger reasoning and coding quality than typical 7–14B models, with higher cost but better performance on complex tasks.
How fast is LFM2-24B-A2B in terms of latency and throughput?

LFM2-24B-A2B has moderate first-token latency typical of 20–30B models, but streams tokens quickly enough for interactive applications.
How is LFM2-24B-A2B priced on LLM.API?

LFM2-24B-A2B uses a per-token pricing model on LLM.API, with separate input and output token rates defined in the LLM.API pricing page.
How do I call LFM2-24B-A2B through the LLM.API gateway?

Specify the model ID "LFM2-24B-A2B" in your LLM.API completion or chat endpoint request, along with your API key and usual parameters.
Does LFM2-24B-A2B support function calling or structured tool outputs?

LFM2-24B-A2B can be prompted to emit structured JSON, but native function-calling semantics depend on LLM.API’s tooling layer, not the model itself.
What are the main limitations of LFM2-24B-A2B?

LFM2-24B-A2B can hallucinate facts, lacks real-time knowledge, and may struggle with highly specialized domain data without careful prompting or retrieval.

Start in 2 lines of code

Get My API Key

LFM2-24B-A2B

What is LFM2-24B-A2B?

5 Core Capabilities

Conversational Chat

Image Interpretation

Text Translation

Document OCR

System Monitoring

6 Most Valuable Use Cases

Cost Comparison

Technical Specifications

Why Build on LLM.API?

Smart Model Routing

Cost-Aware Orchestration

Resilient Fallback Flows

End-to-End Observability

Task-Level Abstractions

High-Throughput Batching

When to Use — When NOT to Use

Use it if...

Avoid if...

Start in 2 lines of code