Powered by LiquidAI
LFM2.5-1.2B-Thinking (free)
- Text Generation
LFM2.5-1.2B-Thinking (free) is LiquidAI’s 1.2B-parameter, open-weight reasoning model optimized to run entirely on-device under roughly 1 GB of memory. It focuses on chain-of-thought style “thinking” before answers, bringing lightweight yet capable reasoning to phones, laptops, and edge hardware.
About the model
What is LFM2.5-1.2B-Thinking (free)?
LFM2.5-1.2B-Thinking is a 1.2 billion parameter open‑weight reasoning model from LiquidAI designed to run fully on local devices with under 1 GB of memory. It is mainly used for on-device conversational assistants, lightweight analysis, and instructional tasks that benefit from explicit intermediate reasoning traces while remaining offline-capable. It is also used in edge and embedded scenarios—such as mobile apps, IoT, and in-browser WebGPU demos—where fast, low-cost chain-of-thought reasoning is needed without relying on cloud compute. The model belongs to the LFM2.5 family of hybrid on-device foundation models, which extends the earlier LFM2 architecture with additional pretraining and reinforcement learning for improved reasoning quality.
Model capabilities
5 Core Capabilities
-
On-device Reasoning
Performs chain-of-thought style reasoning entirely on local hardware, generating intermediate thoughts before providing final answers.
-
Multilingual Chat
Supports conversational text generation across multiple languages, enabling interactive dialogue and lightweight assistant-style behaviors on edge devices.
-
Low-latency Inference
Optimized for fast token generation under 1GB memory, suitable for real-time use on phones, laptops, and embedded systems.
-
Text-only Processing
Handles purely textual inputs and outputs, focusing on language understanding and generation without built-in vision or tool use.
-
Multilingual Understanding
Understands and generates text in several languages, including English, Chinese, Arabic, and others, for globally distributed applications.
Use cases
6 Most Valuable Use Cases
- On-device Chat Assistant
- Lightweight Text Generation
- Edge Reasoning Demos
- Mobile AI Prototyping
- Cost-free API Experiments
- Local Inference Benchmarking
Transparent pricing
Cost Comparison
LLM API offers the lowest cost and highest performance access to LFM2.5-class reasoning models.
| Provider | Region | Latency | Throughput | Uptime | Input ($/1M) | Output ($/1M) | Context |
|---|---|---|---|---|---|---|---|
| LLM API BEST | Global | ~120ms | ~120 tps | ~99.99% | $0.00 | $0.00 | ~128K tokens |
| LiquidAI | Global | ~180ms | ~80 tps | ~99.9% | $0.00 | $0.00 | ~64K tokens |
| OpenRouter | Global | ~220ms | ~60 tps | ~99.9% | ~$0.20 per 1M tokens | ~$0.40 per 1M tokens | ~64K tokens |
| Together AI | US East | ~210ms | ~75 tps | ~99.9% | ~$0.18 per 1M tokens | ~$0.36 per 1M tokens | ~128K tokens |
| DeepInfra | EU West | ~230ms | ~55 tps | ~99.5% | ~$0.22 per 1M tokens | ~$0.44 per 1M tokens | ~32K tokens |
Performance benchmarks
Technical Specifications
| Metric | LFM2.5-1.2B-Thinking (free) | GPT-4o mini | Claude 3 Haiku |
|---|---|---|---|
| Avg Latency | ~220ms | ~250ms | ~300ms |
| Context Window | 128K | 128K | 200K |
| Input Price ($/1M) | $0.00 | $0.15 | $0.25 |
| Output Price ($/1M) | $0.00 | $0.60 | $1.25 |
| Max Output Tokens | 4K | 8K | 8K |
| Throughput | ~45 tps | ~40 tps | ~35 tps |
| Uptime | 99.0% | 99.9% | 99.9% |
30-day usage via LLM API
- 7.4B
- Prompt tokens processed (last 30 days)
- 11.8M
- API requests served (last 30 days)
- 9.1B
- Completion tokens generated (last 30 days)
- 680K
- Unique users on free tier (last 30 days)
Architecture & Integration
Why Build on LLM.API?
One unified API. Every major model. Built-in reliability, cost control, and observability.
-
Intelligent Model Routing
Automatically route each request to the optimal model and provider based on cost, latency, and quality—so you ship faster without hard-coding vendor logic.
One API, any model -
Cost-Aware Orchestration
Mix premium and budget models with configurable policies, caps, and guardrails—minimizing spend while preserving SLA and output quality at scale.
Control spend by design -
Resilient Fallback Flows
Define multi-provider, multi-model fallback chains that automatically retry or downgrade when vendors fail—no more outages or 500s from upstream instability.
Reliability by default -
Deep LLM Observability
Trace every call across providers with logs, metrics, and structured events—making it easy to debug prompts, tune routing, and prove performance to stakeholders.
See every token -
Task-Level Abstractions
Describe tasks like chat, classify, extract, or embed once, and let LLM.API pick and adapt models—so your app logic stays provider-agnostic.
Code to tasks, not models -
High-Throughput Batch
Process millions of inputs via optimized batch pipelines with concurrency controls, backoff, and deduplication—maximizing throughput while respecting provider limits.
Scale workloads effortlessly
Decision guide
When to Use — When NOT to Use
Use it if...
- You need a free reasoning-focused model for prototyping thought-intensive prompts and workflows.
- Your use case involves experimenting with chain-of-thought style prompting on small tasks.
- You need lightweight analytical assistance for short texts, like summaries or quick classifications.
- Your use case involves educational demos of reasoning models without incurring usage costs.
- You need a compact model suitable for low-traffic tools, bots, or assistants.
- Your use case involves batch-running modest reasoning jobs where throughput matters more than perfection.
- You need a reasoning helper embedded in internal tools where occasional mistakes are acceptable.
Avoid if...
- You need frontier-level reasoning quality on complex, multi-hop tasks across large documents.
- Your workload requires handling very long contexts, such as entire books or codebases.
- You need strong performance on nuanced enterprise tasks like legal, medical, or financial analysis.
- Your workload requires state-of-the-art coding assistance, debugging, or large-application refactoring.
- You need highly reliable outputs for safety-critical systems with strict accuracy guarantees.
- Your workload requires strong tool-use orchestration across many APIs and complex workflows.
- You need enterprise-grade reliability, SLAs, and vendor support for mission-critical systems.
FAQ
Frequently Asked Questions
-
What is LFM2.5-1.2B-Thinking (free)?
LFM2.5-1.2B-Thinking (free) is a 1.2B-parameter LiquidAI language model focused on lightweight reasoning and general text generation, accessible via LLM.API.
-
What is LFM2.5-1.2B-Thinking (free) best suited for?
It is best for low-cost reasoning, code helpers, lightweight agents, and fast iterative text generation where small-model efficiency matters.
-
How is LFM2.5-1.2B-Thinking (free) priced on LLM.API?
The model is available in a free tier on LLM.API, with zero per-token charge but potential rate limits and quotas.
-
What is the context window of LFM2.5-1.2B-Thinking (free)?
LFM2.5-1.2B-Thinking (free) should be treated as supporting a short to medium context window suitable for typical chat and tooling prompts.
-
How fast is LFM2.5-1.2B-Thinking (free) in terms of latency and throughput?
As a 1.2B-parameter model, it generally offers low latency and high throughput compared to larger LLMs on LLM.API.
-
Which modalities does LFM2.5-1.2B-Thinking (free) support?
This model supports text-only input and output; it does not natively process images, audio, or other modalities.
-
How do I call LFM2.5-1.2B-Thinking (free) through LLM.API?
Specify the model name "LFM2.5-1.2B-Thinking" in your LLM.API completion or chat endpoint request using your LLM.API key.
-
How does LFM2.5-1.2B-Thinking (free) compare to larger reasoning models on LLM.API?
It is cheaper and faster but offers weaker long-context reasoning, nuanced understanding, and complex coding support than larger frontier models.
-
What are the main limitations of LFM2.5-1.2B-Thinking (free)?
Limitations include shallower reasoning on complex tasks, a smaller effective context window, and potentially less robust performance on niche or highly technical domains.
-
Are there usage limits or rate caps for LFM2.5-1.2B-Thinking (free) on LLM.API?
Yes, the free tier may enforce request-per-minute and daily token caps, which can vary by LLM.API account and plan.
