LFM2.5-1.2B-Thinking (free)

Text Generation

LFM2.5-1.2B-Thinking (free) is LiquidAI’s 1.2B-parameter, open-weight reasoning model optimized to run entirely on-device under roughly 1 GB of memory. It focuses on chain-of-thought style “thinking” before answers, bringing lightweight yet capable reasoning to phones, laptops, and edge hardware.

Start Using API

API Performance

Latency: ~0.4s avg time to first token on mobile/edge
Context: 32K token context
Input: Free per 1M tokens (open-weight, free-tier via providers)
Output: Free per 1M tokens (open-weight, free-tier via providers)
Uptime: 99% 99%

About the model

What is LFM2.5-1.2B-Thinking (free)?

LFM2.5-1.2B-Thinking is a 1.2 billion parameter open‑weight reasoning model from LiquidAI designed to run fully on local devices with under 1 GB of memory. It is mainly used for on-device conversational assistants, lightweight analysis, and instructional tasks that benefit from explicit intermediate reasoning traces while remaining offline-capable. It is also used in edge and embedded scenarios—such as mobile apps, IoT, and in-browser WebGPU demos—where fast, low-cost chain-of-thought reasoning is needed without relying on cloud compute. The model belongs to the LFM2.5 family of hybrid on-device foundation models, which extends the earlier LFM2 architecture with additional pretraining and reinforcement learning for improved reasoning quality.

Input / Output

Input

Text prompts (chat-style or plain text)

Output

Natural language responses (generated text)
Source code snippets in text form

Model capabilities

5 Core Capabilities

On-device Reasoning

Performs chain-of-thought style reasoning entirely on local hardware, generating intermediate thoughts before providing final answers.
Multilingual Chat

Supports conversational text generation across multiple languages, enabling interactive dialogue and lightweight assistant-style behaviors on edge devices.
Low-latency Inference

Optimized for fast token generation under 1GB memory, suitable for real-time use on phones, laptops, and embedded systems.
Text-only Processing

Handles purely textual inputs and outputs, focusing on language understanding and generation without built-in vision or tool use.
Multilingual Understanding

Understands and generates text in several languages, including English, Chinese, Arabic, and others, for globally distributed applications.

Use cases

6 Most Valuable Use Cases

On-device Chat Assistant
Lightweight Text Generation
Edge Reasoning Demos
Mobile AI Prototyping
Cost-free API Experiments
Local Inference Benchmarking

Transparent pricing

Cost Comparison

LLM API offers the lowest cost and highest performance access to LFM2.5-class reasoning models.

Provider	Region	Latency	Throughput	Uptime	Input ($/1M)	Output ($/1M)	Context
LLM API BEST	Global	~120ms	~120 tps	~99.99%	$0.00	$0.00	~128K tokens
LiquidAI	Global	~180ms	~80 tps	~99.9%	$0.00	$0.00	~64K tokens
OpenRouter	Global	~220ms	~60 tps	~99.9%	~$0.20 per 1M tokens	~$0.40 per 1M tokens	~64K tokens
Together AI	US East	~210ms	~75 tps	~99.9%	~$0.18 per 1M tokens	~$0.36 per 1M tokens	~128K tokens
DeepInfra	EU West	~230ms	~55 tps	~99.5%	~$0.22 per 1M tokens	~$0.44 per 1M tokens	~32K tokens

Performance benchmarks

Technical Specifications

Metric	LFM2.5-1.2B-Thinking (free)	GPT-4o mini	Claude 3 Haiku
Avg Latency	~220ms	~250ms	~300ms
Context Window	128K	128K	200K
Input Price ($/1M)	$0.00	$0.15	$0.25
Output Price ($/1M)	$0.00	$0.60	$1.25
Max Output Tokens	4K	8K	8K
Throughput	~45 tps	~40 tps	~35 tps
Uptime	99.0%	99.9%	99.9%

30-day usage via LLM API

7.4B: Prompt tokens processed (last 30 days)
11.8M: API requests served (last 30 days)
9.1B: Completion tokens generated (last 30 days)
680K: Unique users on free tier (last 30 days)

Start Using API

Architecture & Integration

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

Intelligent Model Routing

Automatically route each request to the optimal model and provider based on cost, latency, and quality—so you ship faster without hard-coding vendor logic.
One API, any model
Cost-Aware Orchestration

Mix premium and budget models with configurable policies, caps, and guardrails—minimizing spend while preserving SLA and output quality at scale.
Control spend by design
Resilient Fallback Flows

Define multi-provider, multi-model fallback chains that automatically retry or downgrade when vendors fail—no more outages or 500s from upstream instability.
Reliability by default
Deep LLM Observability

Trace every call across providers with logs, metrics, and structured events—making it easy to debug prompts, tune routing, and prove performance to stakeholders.
See every token
Task-Level Abstractions

Describe tasks like chat, classify, extract, or embed once, and let LLM.API pick and adapt models—so your app logic stays provider-agnostic.
Code to tasks, not models
High-Throughput Batch

Process millions of inputs via optimized batch pipelines with concurrency controls, backoff, and deduplication—maximizing throughput while respecting provider limits.
Scale workloads effortlessly

Decision guide

When to Use — When NOT to Use

Use it if...

You need a free reasoning-focused model for prototyping thought-intensive prompts and workflows.
Your use case involves experimenting with chain-of-thought style prompting on small tasks.
You need lightweight analytical assistance for short texts, like summaries or quick classifications.
Your use case involves educational demos of reasoning models without incurring usage costs.
You need a compact model suitable for low-traffic tools, bots, or assistants.
Your use case involves batch-running modest reasoning jobs where throughput matters more than perfection.
You need a reasoning helper embedded in internal tools where occasional mistakes are acceptable.

Avoid if...

You need frontier-level reasoning quality on complex, multi-hop tasks across large documents.
Your workload requires handling very long contexts, such as entire books or codebases.
You need strong performance on nuanced enterprise tasks like legal, medical, or financial analysis.
Your workload requires state-of-the-art coding assistance, debugging, or large-application refactoring.
You need highly reliable outputs for safety-critical systems with strict accuracy guarantees.
Your workload requires strong tool-use orchestration across many APIs and complex workflows.
You need enterprise-grade reliability, SLAs, and vendor support for mission-critical systems.

FAQ

Frequently Asked Questions

What is LFM2.5-1.2B-Thinking (free)?

LFM2.5-1.2B-Thinking (free) is a 1.2B-parameter LiquidAI language model focused on lightweight reasoning and general text generation, accessible via LLM.API.
What is LFM2.5-1.2B-Thinking (free) best suited for?

It is best for low-cost reasoning, code helpers, lightweight agents, and fast iterative text generation where small-model efficiency matters.
How is LFM2.5-1.2B-Thinking (free) priced on LLM.API?

The model is available in a free tier on LLM.API, with zero per-token charge but potential rate limits and quotas.
What is the context window of LFM2.5-1.2B-Thinking (free)?

LFM2.5-1.2B-Thinking (free) should be treated as supporting a short to medium context window suitable for typical chat and tooling prompts.
How fast is LFM2.5-1.2B-Thinking (free) in terms of latency and throughput?

As a 1.2B-parameter model, it generally offers low latency and high throughput compared to larger LLMs on LLM.API.
Which modalities does LFM2.5-1.2B-Thinking (free) support?

This model supports text-only input and output; it does not natively process images, audio, or other modalities.
How do I call LFM2.5-1.2B-Thinking (free) through LLM.API?

Specify the model name "LFM2.5-1.2B-Thinking" in your LLM.API completion or chat endpoint request using your LLM.API key.
How does LFM2.5-1.2B-Thinking (free) compare to larger reasoning models on LLM.API?

It is cheaper and faster but offers weaker long-context reasoning, nuanced understanding, and complex coding support than larger frontier models.
What are the main limitations of LFM2.5-1.2B-Thinking (free)?

Limitations include shallower reasoning on complex tasks, a smaller effective context window, and potentially less robust performance on niche or highly technical domains.
Are there usage limits or rate caps for LFM2.5-1.2B-Thinking (free) on LLM.API?

Yes, the free tier may enforce request-per-minute and daily token caps, which can vary by LLM.API account and plan.

Start in 2 lines of code

Get My API Key

LFM2.5-1.2B-Thinking (free)

What is LFM2.5-1.2B-Thinking (free)?

5 Core Capabilities

On-device Reasoning

Multilingual Chat

Low-latency Inference

Text-only Processing

Multilingual Understanding

6 Most Valuable Use Cases

Cost Comparison

Technical Specifications

Why Build on LLM.API?

Intelligent Model Routing

Cost-Aware Orchestration

Resilient Fallback Flows

Deep LLM Observability

Task-Level Abstractions

High-Throughput Batch

When to Use — When NOT to Use

Use it if...

Avoid if...

Start in 2 lines of code