LFM2.5-1.2B-Instruct (free)

Instruction Following

LFM2.5-1.2B-Instruct (free) is a 1.2B-parameter, instruction-tuned hybrid language model from LiquidAI, optimized for fast, on-device inference with a ~32k token context window. It offers general-purpose conversational and task-oriented capabilities while running efficiently on edge hardware.

Start Using API

API Performance

Latency: ~0.8s time to first token on consumer GPU/CPU
Context: 32K token context
Input: Free per 1M tokens (open-weight, local inference)
Output: Free per 1M tokens (open-weight, local inference)
Uptime: 99% 99%

About the model

What is LFM2.5-1.2B-Instruct (free)?

LFM2.5-1.2B-Instruct (free) is a compact, instruction-tuned text-generation model from LiquidAI designed for fast, on-device AI with a context window of roughly 32k tokens. It is mainly used for general-purpose chat, agentic workflows, data extraction, and retrieval-augmented generation where low latency and small memory footprint are important. The model is also positioned for multi-language conversational tasks across several major languages, though it is not recommended as a top choice for highly knowledge-intensive or advanced programming workloads. It belongs to the LFM2.5 family of hybrid on-device models, building on the earlier LFM2 architecture with extended pre-training and reinforcement learning-based post-training.

Input / Output

Input

Text prompts (natural language instructions and messages)

Output

Chat-style natural language responses

Model capabilities

5 Core Capabilities

Conversational Chat

Instruction-tuned chat model supporting multi-turn dialogue, general assistance, and natural conversation with strong instruction-following behavior.
Text Generation

Generates coherent, context-aware text for prompts, explanations, and open-ended tasks using a 1.2B-parameter on-device-optimized architecture.
Multilingual Support

Understands and generates text in multiple languages, including English, Arabic, Chinese, and several others, for diverse global use cases.
Tool and Function Use

Supports structured outputs, function calling, and tool use, enabling integration into agentic pipelines and automation workflows.
Edge Deployment

Designed for fast, low-memory inference on CPUs and NPUs, enabling on-device AI experiences on laptops, mobiles, and IoT hardware.

Use cases

6 Most Valuable Use Cases

On-device AI Chat
Mobile Task Assistance
Edge Data Extraction
Lightweight Text Analysis
RAG Answer Generation
CPU-Optimized Inference

Transparent pricing

Cost Comparison

LLM API offers the lowest per-token cost and best performance for LFM2.5-class instruct models.

Provider	Region	Latency	Throughput	Uptime	Input ($/1M)	Output ($/1M)	Context
LLM API BEST	Global	80ms	120 tps	99.99%	$0.02	$0.02	64K tokens
LiquidAI	Global	~180ms	~40 tps	~99.9%	$0.00	$0.00	~32K tokens
OpenAI (GPT-4o-mini-equivalent)	Global	~220ms	~60 tps	99.9%	~$0.15	~$0.60	128K tokens
Anthropic (Claude 3 Haiku-equivalent)	US East	~250ms	~50 tps	99.9%	~$0.20	~$0.80	200K tokens
Google (Gemini 1.5 Flash-equivalent)	Global	~210ms	~70 tps	99.9%	~$0.12	~$0.48	1M tokens

Performance benchmarks

Technical Specifications

Metric	LFM2.5-1.2B-Instruct (free)	Llama 3.2 1B Instruct	Gemma 2 2B Instruct
Avg Latency	~220ms	~250ms	~260ms
Context Window	16K	16K	8K
Input Price ($/1M)	$0.00	$0.10	$0.09
Output Price ($/1M)	$0.00	$0.15	$0.12
Max Output Tokens	4K	4K	4K
Throughput	~60 tps	~55 tps	~50 tps
Uptime	99.5%	99.9%	99.9%

30-day usage via LLM API

1.8B: Prompt tokens processed (last 30 days)
320M: Completion tokens generated (last 30 days)
4.6M: API requests served (last 30 days)
410K: Unique users (last 30 days)

Start Using API

Architecture & Integration

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

Unified AI Routing

Automatically route each request to the best model across providers based on latency, cost, and quality—without touching your app code.
One endpoint, every model
Cost-Aware Optimization

Dynamically pick cheaper equivalent models, control spend with policy-based limits, and monitor per-project usage so you never get surprised by your AI bill.
Cut spend, keep quality
Resilient Fallbacks

Configure automatic failover to backup models and providers when requests fail or time out, keeping your AI features online even during provider outages.
No single point of failure
Deep Observability

Get full visibility into every call—latency, errors, tokens, and model choices—with logs and traces that plug into your existing monitoring stack.
See every token and trace
Task-Level Abstractions

Define high-level tasks—chat, classification, extraction, tools—once and let LLM.API pick and orchestrate the right models and prompts for each job.
Code to tasks, not models
High-Throughput Batch

Process millions of inputs efficiently with optimized batching, concurrency controls, and retry semantics tailored for large-scale offline and backfill workloads.
Scale from 10 to millions

Decision guide

When to Use — When NOT to Use

Use it if...

You need a free, small-footprint instruct model for light-weight experimentation and prototyping.
Your use case involves simple Q&A, definitions, or short factual clarifications on common topics.
You need a compact model suitable for on-device or low-resource server deployments.
Your use case involves generating short emails, messages, or template-based business text.
You need a model to assist with basic code snippets or minor refactoring tasks.
Your use case involves educational examples or demos where cutting-edge capability is unnecessary.
You need a backup or fallback model when larger, paid models are unavailable.

Avoid if...

You need state-of-the-art reasoning, planning, or complex multi-step chain-of-thought solutions.
Your workload requires handling very long documents, transcripts, or multi-document context windows.
You need highly reliable, domain-expert outputs for medical, legal, or financial decisions.
Your workload requires advanced coding assistance across large repositories and complex software architectures.
You need high-quality creative writing, nuanced style control, or sophisticated story generation.
Your workload requires robust tool-use, API orchestration, or complex multi-agent system coordination.
You need strong multilingual performance or translation quality across many low-resource languages.

FAQ

Frequently Asked Questions

What is LFM2.5-1.2B-Instruct (free)?

LFM2.5-1.2B-Instruct (free) is a 1.2B-parameter LiquidAI instruction-tuned language model optimized for fast, low-cost text generation via LLM.API.
What is LFM2.5-1.2B-Instruct (free) best suited for?

It is best for lightweight chatbots, tool-using agents, code helpers, and simple reasoning tasks where low latency and free usage are more important than peak accuracy.
How is LFM2.5-1.2B-Instruct (free) priced on LLM.API?

The model is available in a free tier on LLM.API, meaning requests are not directly metered by tokens but may be subject to fair-use limits.
What is the context window of LFM2.5-1.2B-Instruct (free)?

LFM2.5-1.2B-Instruct (free) supports a context window of up to 8,192 tokens per request on LLM.API.
What modalities does LFM2.5-1.2B-Instruct (free) support?

This model is text-only, accepting text prompts and returning text completions without native image, audio, or video understanding.
How fast is LFM2.5-1.2B-Instruct (free) on LLM.API?

Being a 1.2B-parameter model, it is optimized for low latency and generally responds faster than larger LiquidAI or frontier models under similar conditions.
How do I call LFM2.5-1.2B-Instruct (free) through LLM.API?

Specify the model name "liquidai/lfm2.5-1.2b-instruct-free" (or the documented identifier) in your LLM.API completion or chat endpoint request.
How does LFM2.5-1.2B-Instruct (free) compare to larger LiquidAI or frontier models?

It is cheaper and faster but has weaker long-context reasoning, creativity, and coding depth than larger LiquidAI or state-of-the-art models.
Does LFM2.5-1.2B-Instruct (free) support tools or function calling via LLM.API?

You can use it with LLM.API’s tool-calling layer, but the model itself does not implement a native structured tool-calling protocol.
What are the main limitations of LFM2.5-1.2B-Instruct (free)?

It can hallucinate facts, struggle with complex multi-step reasoning, and may perform poorly on very long documents compared to larger models.

Start in 2 lines of code

Get My API Key

LFM2.5-1.2B-Instruct (free)

What is LFM2.5-1.2B-Instruct (free)?

5 Core Capabilities

Conversational Chat

Text Generation

Multilingual Support

Tool and Function Use

Edge Deployment

6 Most Valuable Use Cases

Cost Comparison

Technical Specifications

Why Build on LLM.API?

Unified AI Routing

Cost-Aware Optimization

Resilient Fallbacks

Deep Observability

Task-Level Abstractions

High-Throughput Batch

When to Use — When NOT to Use

Use it if...

Avoid if...

Start in 2 lines of code