Powered by LiquidAI

LFM2-24B-A2B

  • Text Generation

LFM2-24B-A2B is LiquidAI’s largest LFM2-series hybrid Mixture-of-Experts language model, designed to deliver high-quality text generation while remaining efficient enough to run on consumer hardware.

Start Using API

What is LFM2-24B-A2B?

LFM2-24B-A2B is a 24B-parameter sparse Mixture-of-Experts hybrid language model from LiquidAI, with about 2B active parameters per token and a context window of around 128K tokens. It is primarily used for general-purpose text generation tasks such as drafting, summarization, and chat-style assistance, with a focus on low-cost inference. It is also positioned for on-device and edge deployments, enabling local agent-style workflows on laptops and AI PCs. It belongs to the LFM2 family of models, extending the series from smaller variants (e.g., LFM2-350M and mid-sized LFM2 models) up to this largest 24B configuration.

5 Core Capabilities

  • Conversational Chat

    Engages in multi-turn dialogue, answering questions, following instructions, and adapting responses to user context and intent.

  • Image Interpretation

    Analyzes images to identify objects, scenes, and relationships, enabling visual question answering and descriptive explanations.

  • Text Translation

    Translates written content between multiple languages while preserving meaning, tone, and stylistic nuance as closely as possible.

  • Document OCR

    Extracts machine-readable text from documents and images, enabling downstream search, summarization, and content analysis workflows.

  • System Monitoring

    Supports monitoring-style tasks such as interpreting logs, alerts, and metrics to assist with diagnostics and incident summaries.

6 Most Valuable Use Cases

  • On-device Chat Assistant
  • Local Document Summarization
  • Privacy-first Case Notes
  • System Log Monitoring
  • Edge Productivity Copilot
  • CPU-only Text Generation

Cost Comparison

LLM API offers the lowest cost and highest performance for LFM2-24B-A2B-class models.

Provider Region Latency Throughput Uptime Input ($/1M) Output ($/1M) Context
LLM API BEST Global 80ms 120 tps 99.99% $0.40 $0.80 256K
LiquidAI US East ~140ms ~70 tps ~99.9% ~$0.65 ~$1.30 ~128K
OpenAI (comparable 20–30B model) Global ~200ms ~60 tps ~99.9% ~$1.00 ~$2.00 ~128K
Anthropic (comparable 20–30B model) US West ~190ms ~55 tps ~99.9% ~$1.10 ~$2.20 ~200K
Azure AI (LiquidAI-compatible deployment) EU West ~210ms ~50 tps ~99.95% ~$0.90 ~$1.80 ~128K

Technical Specifications

Metric LFM2-24B-A2B (LiquidAI) GPT-4.1-mini (OpenAI) Claude 3.5 Haiku (Anthropic)
Avg Latency ~180ms ~220ms ~250ms
Context Window 128K 128K 200K
Input Price ($/1M) $0.10 $0.15 $0.25
Output Price ($/1M) $0.40 $0.60 $0.80
Max Output Tokens 8K 4K 8K
Throughput 120 tps 100 tps 90 tps
Uptime 99.9% 99.9% 99.9%

30-day usage via LLM API

11.4B
Prompt tokens processed (last 30 days)
7.8B
Completion tokens generated (last 30 days)
4.6M
API requests served (last 30 days)
99.6%
Average uptime (last 30 days)
Start Using API

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

  • Smart Model Routing

    Dynamically route each request across providers by latency, price, and quality. One endpoint abstracts vendor lock-in and keeps workloads on the best option automatically.

    One endpoint, every model
  • Cost-Aware Orchestration

    Automatically balance quality and spend with per-request cost controls, usage caps, and cheaper alternates. Ship rich AI features without blowing your infrastructure budget.

    Optimize cost per token
  • Resilient Fallback Flows

    Define provider and model fallbacks that trigger instantly on timeouts, rate limits, or errors. Keep user-facing experiences stable even when vendors fail.

    Failures auto-rerouted
  • End-to-End Observability

    Trace every request across models and providers with logs, metrics, and latency breakdowns. Debug production issues fast and tune routing using real traffic data.

    See every token hop
  • Task-Level Abstractions

    Describe tasks—not models—and let LLM.API pick the right tools, prompts, and providers. Standardize patterns like chat, tools, and RAG behind one API.

    Program tasks, not models
  • High-Throughput Batching

    Send large batches of requests in a single call with concurrency controls and retry policies. Maximize throughput and minimize overhead for heavy workloads.

    Scale up without thrash

When to Use — When NOT to Use

Use it if...

  • You need a general-purpose 24B model for balanced reasoning, coding, and writing.
  • You need strong performance on English-centric tasks without requiring frontier-level reasoning ability.
  • You need a relatively large open-weight model deployable on your own infrastructure.
  • Your use case involves batch offline inference where slightly higher latency is acceptable.
  • Your use case involves fine-tuning a mid-sized model for a specific domain.
  • You need good performance on common benchmarks but not absolute state-of-the-art scores.
  • Your use case involves multi-turn assistants where context windows are moderate, not extreme.

Avoid if...

  • You need cutting-edge frontier performance on complex reasoning, planning, or tool orchestration.
  • Your workload requires extremely low latency responses for interactive, high-traffic consumer applications.
  • You need highly optimized multimodal capabilities like advanced vision, audio, or video understanding.
  • Your workload requires handling extremely long contexts, such as millions of tokens, reliably.
  • You need strict enterprise guarantees around support SLAs, compliance certifications, and uptime contracts.
  • You need ultra-small edge deployment where memory and compute budgets are very constrained.
  • Your workload requires native support for many low-resource languages with high accuracy and safety.

Frequently Asked Questions

  • What is LFM2-24B-A2B?

    LFM2-24B-A2B is a 24B-parameter LiquidAI language model available through LLM.API, designed for high-quality text generation and reasoning tasks.

  • What is LFM2-24B-A2B best suited for?

    LFM2-24B-A2B is best for complex code generation, multi-step reasoning, data transformation, and longer-form content where quality matters more than minimal latency.

  • What modalities does LFM2-24B-A2B support?

    LFM2-24B-A2B is a text-only model that accepts text prompts and returns text completions.

  • What context window does LFM2-24B-A2B support on LLM.API?

    LFM2-24B-A2B supports up to a 32K-token context window via LLM.API, including input and output tokens combined.

  • How does LFM2-24B-A2B compare to similar 20–30B parameter models?

    LFM2-24B-A2B targets stronger reasoning and coding quality than typical 7–14B models, with higher cost but better performance on complex tasks.

  • How fast is LFM2-24B-A2B in terms of latency and throughput?

    LFM2-24B-A2B has moderate first-token latency typical of 20–30B models, but streams tokens quickly enough for interactive applications.

  • How is LFM2-24B-A2B priced on LLM.API?

    LFM2-24B-A2B uses a per-token pricing model on LLM.API, with separate input and output token rates defined in the LLM.API pricing page.

  • How do I call LFM2-24B-A2B through the LLM.API gateway?

    Specify the model ID "LFM2-24B-A2B" in your LLM.API completion or chat endpoint request, along with your API key and usual parameters.

  • Does LFM2-24B-A2B support function calling or structured tool outputs?

    LFM2-24B-A2B can be prompted to emit structured JSON, but native function-calling semantics depend on LLM.API’s tooling layer, not the model itself.

  • What are the main limitations of LFM2-24B-A2B?

    LFM2-24B-A2B can hallucinate facts, lacks real-time knowledge, and may struggle with highly specialized domain data without careful prompting or retrieval.

Start in 2 lines of code

Get My API Key