Powered by LiquidAI

LFM2.5-1.2B-Instruct (free)

  • Instruction Following

LFM2.5-1.2B-Instruct (free) is a 1.2B-parameter, instruction-tuned hybrid language model from LiquidAI, optimized for fast, on-device inference with a ~32k token context window. It offers general-purpose conversational and task-oriented capabilities while running efficiently on edge hardware.

Start Using API

What is LFM2.5-1.2B-Instruct (free)?

LFM2.5-1.2B-Instruct (free) is a compact, instruction-tuned text-generation model from LiquidAI designed for fast, on-device AI with a context window of roughly 32k tokens. It is mainly used for general-purpose chat, agentic workflows, data extraction, and retrieval-augmented generation where low latency and small memory footprint are important. The model is also positioned for multi-language conversational tasks across several major languages, though it is not recommended as a top choice for highly knowledge-intensive or advanced programming workloads. It belongs to the LFM2.5 family of hybrid on-device models, building on the earlier LFM2 architecture with extended pre-training and reinforcement learning-based post-training.

5 Core Capabilities

  • Conversational Chat

    Instruction-tuned chat model supporting multi-turn dialogue, general assistance, and natural conversation with strong instruction-following behavior.

  • Text Generation

    Generates coherent, context-aware text for prompts, explanations, and open-ended tasks using a 1.2B-parameter on-device-optimized architecture.

  • Multilingual Support

    Understands and generates text in multiple languages, including English, Arabic, Chinese, and several others, for diverse global use cases.

  • Tool and Function Use

    Supports structured outputs, function calling, and tool use, enabling integration into agentic pipelines and automation workflows.

  • Edge Deployment

    Designed for fast, low-memory inference on CPUs and NPUs, enabling on-device AI experiences on laptops, mobiles, and IoT hardware.

6 Most Valuable Use Cases

  • On-device AI Chat
  • Mobile Task Assistance
  • Edge Data Extraction
  • Lightweight Text Analysis
  • RAG Answer Generation
  • CPU-Optimized Inference

Cost Comparison

LLM API offers the lowest per-token cost and best performance for LFM2.5-class instruct models.

Provider Region Latency Throughput Uptime Input ($/1M) Output ($/1M) Context
LLM API BEST Global 80ms 120 tps 99.99% $0.02 $0.02 64K tokens
LiquidAI Global ~180ms ~40 tps ~99.9% $0.00 $0.00 ~32K tokens
OpenAI (GPT-4o-mini-equivalent) Global ~220ms ~60 tps 99.9% ~$0.15 ~$0.60 128K tokens
Anthropic (Claude 3 Haiku-equivalent) US East ~250ms ~50 tps 99.9% ~$0.20 ~$0.80 200K tokens
Google (Gemini 1.5 Flash-equivalent) Global ~210ms ~70 tps 99.9% ~$0.12 ~$0.48 1M tokens

Technical Specifications

Metric LFM2.5-1.2B-Instruct (free) Llama 3.2 1B Instruct Gemma 2 2B Instruct
Avg Latency ~220ms ~250ms ~260ms
Context Window 16K 16K 8K
Input Price ($/1M) $0.00 $0.10 $0.09
Output Price ($/1M) $0.00 $0.15 $0.12
Max Output Tokens 4K 4K 4K
Throughput ~60 tps ~55 tps ~50 tps
Uptime 99.5% 99.9% 99.9%

30-day usage via LLM API

1.8B
Prompt tokens processed (last 30 days)
320M
Completion tokens generated (last 30 days)
4.6M
API requests served (last 30 days)
410K
Unique users (last 30 days)
Start Using API

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

  • Unified AI Routing

    Automatically route each request to the best model across providers based on latency, cost, and quality—without touching your app code.

    One endpoint, every model
  • Cost-Aware Optimization

    Dynamically pick cheaper equivalent models, control spend with policy-based limits, and monitor per-project usage so you never get surprised by your AI bill.

    Cut spend, keep quality
  • Resilient Fallbacks

    Configure automatic failover to backup models and providers when requests fail or time out, keeping your AI features online even during provider outages.

    No single point of failure
  • Deep Observability

    Get full visibility into every call—latency, errors, tokens, and model choices—with logs and traces that plug into your existing monitoring stack.

    See every token and trace
  • Task-Level Abstractions

    Define high-level tasks—chat, classification, extraction, tools—once and let LLM.API pick and orchestrate the right models and prompts for each job.

    Code to tasks, not models
  • High-Throughput Batch

    Process millions of inputs efficiently with optimized batching, concurrency controls, and retry semantics tailored for large-scale offline and backfill workloads.

    Scale from 10 to millions

When to Use — When NOT to Use

Use it if...

  • You need a free, small-footprint instruct model for light-weight experimentation and prototyping.
  • Your use case involves simple Q&A, definitions, or short factual clarifications on common topics.
  • You need a compact model suitable for on-device or low-resource server deployments.
  • Your use case involves generating short emails, messages, or template-based business text.
  • You need a model to assist with basic code snippets or minor refactoring tasks.
  • Your use case involves educational examples or demos where cutting-edge capability is unnecessary.
  • You need a backup or fallback model when larger, paid models are unavailable.

Avoid if...

  • You need state-of-the-art reasoning, planning, or complex multi-step chain-of-thought solutions.
  • Your workload requires handling very long documents, transcripts, or multi-document context windows.
  • You need highly reliable, domain-expert outputs for medical, legal, or financial decisions.
  • Your workload requires advanced coding assistance across large repositories and complex software architectures.
  • You need high-quality creative writing, nuanced style control, or sophisticated story generation.
  • Your workload requires robust tool-use, API orchestration, or complex multi-agent system coordination.
  • You need strong multilingual performance or translation quality across many low-resource languages.

Frequently Asked Questions

  • What is LFM2.5-1.2B-Instruct (free)?

    LFM2.5-1.2B-Instruct (free) is a 1.2B-parameter LiquidAI instruction-tuned language model optimized for fast, low-cost text generation via LLM.API.

  • What is LFM2.5-1.2B-Instruct (free) best suited for?

    It is best for lightweight chatbots, tool-using agents, code helpers, and simple reasoning tasks where low latency and free usage are more important than peak accuracy.

  • How is LFM2.5-1.2B-Instruct (free) priced on LLM.API?

    The model is available in a free tier on LLM.API, meaning requests are not directly metered by tokens but may be subject to fair-use limits.

  • What is the context window of LFM2.5-1.2B-Instruct (free)?

    LFM2.5-1.2B-Instruct (free) supports a context window of up to 8,192 tokens per request on LLM.API.

  • What modalities does LFM2.5-1.2B-Instruct (free) support?

    This model is text-only, accepting text prompts and returning text completions without native image, audio, or video understanding.

  • How fast is LFM2.5-1.2B-Instruct (free) on LLM.API?

    Being a 1.2B-parameter model, it is optimized for low latency and generally responds faster than larger LiquidAI or frontier models under similar conditions.

  • How do I call LFM2.5-1.2B-Instruct (free) through LLM.API?

    Specify the model name "liquidai/lfm2.5-1.2b-instruct-free" (or the documented identifier) in your LLM.API completion or chat endpoint request.

  • How does LFM2.5-1.2B-Instruct (free) compare to larger LiquidAI or frontier models?

    It is cheaper and faster but has weaker long-context reasoning, creativity, and coding depth than larger LiquidAI or state-of-the-art models.

  • Does LFM2.5-1.2B-Instruct (free) support tools or function calling via LLM.API?

    You can use it with LLM.API’s tool-calling layer, but the model itself does not implement a native structured tool-calling protocol.

  • What are the main limitations of LFM2.5-1.2B-Instruct (free)?

    It can hallucinate facts, struggle with complex multi-step reasoning, and may perform poorly on very long documents compared to larger models.

Start in 2 lines of code

Get My API Key