Powered by LiquidAI
LFM2.5-1.2B-Instruct (free)
- Instruction Following
LFM2.5-1.2B-Instruct (free) is a 1.2B-parameter, instruction-tuned hybrid language model from LiquidAI, optimized for fast, on-device inference with a ~32k token context window. It offers general-purpose conversational and task-oriented capabilities while running efficiently on edge hardware.
About the model
What is LFM2.5-1.2B-Instruct (free)?
LFM2.5-1.2B-Instruct (free) is a compact, instruction-tuned text-generation model from LiquidAI designed for fast, on-device AI with a context window of roughly 32k tokens. It is mainly used for general-purpose chat, agentic workflows, data extraction, and retrieval-augmented generation where low latency and small memory footprint are important. The model is also positioned for multi-language conversational tasks across several major languages, though it is not recommended as a top choice for highly knowledge-intensive or advanced programming workloads. It belongs to the LFM2.5 family of hybrid on-device models, building on the earlier LFM2 architecture with extended pre-training and reinforcement learning-based post-training.
Model capabilities
5 Core Capabilities
-
Conversational Chat
Instruction-tuned chat model supporting multi-turn dialogue, general assistance, and natural conversation with strong instruction-following behavior.
-
Text Generation
Generates coherent, context-aware text for prompts, explanations, and open-ended tasks using a 1.2B-parameter on-device-optimized architecture.
-
Multilingual Support
Understands and generates text in multiple languages, including English, Arabic, Chinese, and several others, for diverse global use cases.
-
Tool and Function Use
Supports structured outputs, function calling, and tool use, enabling integration into agentic pipelines and automation workflows.
-
Edge Deployment
Designed for fast, low-memory inference on CPUs and NPUs, enabling on-device AI experiences on laptops, mobiles, and IoT hardware.
Use cases
6 Most Valuable Use Cases
- On-device AI Chat
- Mobile Task Assistance
- Edge Data Extraction
- Lightweight Text Analysis
- RAG Answer Generation
- CPU-Optimized Inference
Transparent pricing
Cost Comparison
LLM API offers the lowest per-token cost and best performance for LFM2.5-class instruct models.
| Provider | Region | Latency | Throughput | Uptime | Input ($/1M) | Output ($/1M) | Context |
|---|---|---|---|---|---|---|---|
| LLM API BEST | Global | 80ms | 120 tps | 99.99% | $0.02 | $0.02 | 64K tokens |
| LiquidAI | Global | ~180ms | ~40 tps | ~99.9% | $0.00 | $0.00 | ~32K tokens |
| OpenAI (GPT-4o-mini-equivalent) | Global | ~220ms | ~60 tps | 99.9% | ~$0.15 | ~$0.60 | 128K tokens |
| Anthropic (Claude 3 Haiku-equivalent) | US East | ~250ms | ~50 tps | 99.9% | ~$0.20 | ~$0.80 | 200K tokens |
| Google (Gemini 1.5 Flash-equivalent) | Global | ~210ms | ~70 tps | 99.9% | ~$0.12 | ~$0.48 | 1M tokens |
Performance benchmarks
Technical Specifications
| Metric | LFM2.5-1.2B-Instruct (free) | Llama 3.2 1B Instruct | Gemma 2 2B Instruct |
|---|---|---|---|
| Avg Latency | ~220ms | ~250ms | ~260ms |
| Context Window | 16K | 16K | 8K |
| Input Price ($/1M) | $0.00 | $0.10 | $0.09 |
| Output Price ($/1M) | $0.00 | $0.15 | $0.12 |
| Max Output Tokens | 4K | 4K | 4K |
| Throughput | ~60 tps | ~55 tps | ~50 tps |
| Uptime | 99.5% | 99.9% | 99.9% |
30-day usage via LLM API
- 1.8B
- Prompt tokens processed (last 30 days)
- 320M
- Completion tokens generated (last 30 days)
- 4.6M
- API requests served (last 30 days)
- 410K
- Unique users (last 30 days)
Architecture & Integration
Why Build on LLM.API?
One unified API. Every major model. Built-in reliability, cost control, and observability.
-
Unified AI Routing
Automatically route each request to the best model across providers based on latency, cost, and quality—without touching your app code.
One endpoint, every model -
Cost-Aware Optimization
Dynamically pick cheaper equivalent models, control spend with policy-based limits, and monitor per-project usage so you never get surprised by your AI bill.
Cut spend, keep quality -
Resilient Fallbacks
Configure automatic failover to backup models and providers when requests fail or time out, keeping your AI features online even during provider outages.
No single point of failure -
Deep Observability
Get full visibility into every call—latency, errors, tokens, and model choices—with logs and traces that plug into your existing monitoring stack.
See every token and trace -
Task-Level Abstractions
Define high-level tasks—chat, classification, extraction, tools—once and let LLM.API pick and orchestrate the right models and prompts for each job.
Code to tasks, not models -
High-Throughput Batch
Process millions of inputs efficiently with optimized batching, concurrency controls, and retry semantics tailored for large-scale offline and backfill workloads.
Scale from 10 to millions
Decision guide
When to Use — When NOT to Use
Use it if...
- You need a free, small-footprint instruct model for light-weight experimentation and prototyping.
- Your use case involves simple Q&A, definitions, or short factual clarifications on common topics.
- You need a compact model suitable for on-device or low-resource server deployments.
- Your use case involves generating short emails, messages, or template-based business text.
- You need a model to assist with basic code snippets or minor refactoring tasks.
- Your use case involves educational examples or demos where cutting-edge capability is unnecessary.
- You need a backup or fallback model when larger, paid models are unavailable.
Avoid if...
- You need state-of-the-art reasoning, planning, or complex multi-step chain-of-thought solutions.
- Your workload requires handling very long documents, transcripts, or multi-document context windows.
- You need highly reliable, domain-expert outputs for medical, legal, or financial decisions.
- Your workload requires advanced coding assistance across large repositories and complex software architectures.
- You need high-quality creative writing, nuanced style control, or sophisticated story generation.
- Your workload requires robust tool-use, API orchestration, or complex multi-agent system coordination.
- You need strong multilingual performance or translation quality across many low-resource languages.
FAQ
Frequently Asked Questions
-
What is LFM2.5-1.2B-Instruct (free)?
LFM2.5-1.2B-Instruct (free) is a 1.2B-parameter LiquidAI instruction-tuned language model optimized for fast, low-cost text generation via LLM.API.
-
What is LFM2.5-1.2B-Instruct (free) best suited for?
It is best for lightweight chatbots, tool-using agents, code helpers, and simple reasoning tasks where low latency and free usage are more important than peak accuracy.
-
How is LFM2.5-1.2B-Instruct (free) priced on LLM.API?
The model is available in a free tier on LLM.API, meaning requests are not directly metered by tokens but may be subject to fair-use limits.
-
What is the context window of LFM2.5-1.2B-Instruct (free)?
LFM2.5-1.2B-Instruct (free) supports a context window of up to 8,192 tokens per request on LLM.API.
-
What modalities does LFM2.5-1.2B-Instruct (free) support?
This model is text-only, accepting text prompts and returning text completions without native image, audio, or video understanding.
-
How fast is LFM2.5-1.2B-Instruct (free) on LLM.API?
Being a 1.2B-parameter model, it is optimized for low latency and generally responds faster than larger LiquidAI or frontier models under similar conditions.
-
How do I call LFM2.5-1.2B-Instruct (free) through LLM.API?
Specify the model name "liquidai/lfm2.5-1.2b-instruct-free" (or the documented identifier) in your LLM.API completion or chat endpoint request.
-
How does LFM2.5-1.2B-Instruct (free) compare to larger LiquidAI or frontier models?
It is cheaper and faster but has weaker long-context reasoning, creativity, and coding depth than larger LiquidAI or state-of-the-art models.
-
Does LFM2.5-1.2B-Instruct (free) support tools or function calling via LLM.API?
You can use it with LLM.API’s tool-calling layer, but the model itself does not implement a native structured tool-calling protocol.
-
What are the main limitations of LFM2.5-1.2B-Instruct (free)?
It can hallucinate facts, struggle with complex multi-step reasoning, and may perform poorly on very long documents compared to larger models.
