FLUX.2 Klein 4B

Image Generation

FLUX.2 Klein 4B is a compact, 4‑billion‑parameter image generation and editing model from Black Forest Labs, optimized for fast, sub‑second inference on consumer GPUs. It delivers high‑quality visual outputs while unifying text‑to‑image and image‑editing capabilities in a single architecture.

API Performance

Latency: ~6.0s avg image generation time
Context: 1440px max resolution (longest side)
Input: Free per image prompt
Output: Free per generated image
Uptime: 99%
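
The 1440px limit applies to the longest side, so a request for a larger canvas has to be scaled down while keeping its aspect ratio. The sketch below shows one way to do that on the client before sending a request. The function name is illustrative, and whether the API rejects or silently resizes oversized requests is not stated on this page.

```python
MAX_LONGEST_SIDE = 1440  # taken from the "Context" line above


def fit_to_limit(width: int, height: int, limit: int = MAX_LONGEST_SIDE) -> tuple[int, int]:
    """Scale (width, height) down so the longest side is at most `limit`.

    Sizes already within the limit are returned unchanged.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    longest = max(width, height)
    if longest <= limit:
        return width, height
    scale = limit / longest
    # Round to the nearest pixel, but never below 1.
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    print(fit_to_limit(1920, 1080))  # a 16:9 request larger than the limit
    print(fit_to_limit(1024, 1024))  # already within the limit
```

A 1920x1080 request comes out as 1440x810 with this approach, since both sides are scaled by 0.75.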

About the model

What is FLUX.2 Klein 4B?

FLUX.2 Klein 4B is a 4B-parameter rectified-flow transformer model by Black Forest Labs for high-quality, low-latency image generation and editing on consumer hardware. It is mainly used for text-to-image creation in interactive applications where sub-second response and good visual fidelity are important. It is also widely used for single- and multi-reference image editing workflows, including LoRA-based personalization and fine-tuning-friendly setups. The model is part of the FLUX.2 [klein] family, a fast, compact branch of the broader FLUX.2 image-generation and editing models.

Input / Output

Input

Text prompts for image generation and editing
Input images for image-to-image editing and multi-reference conditioning

Output

Generated or edited images (e.g. JPEG, PNG, WebP)
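
Because the output may arrive as JPEG, PNG, or WebP, client code often needs to check which format it actually received before saving or decoding it. The sketch below identifies the three formats from their standard file signatures. It is a generic helper, not part of any LLM.API SDK, and it assumes the image bytes have already been downloaded or decoded from the response.

```python
def detect_image_format(data: bytes) -> str:
    """Return 'png', 'jpeg', 'webp', or 'unknown' based on the file signature."""
    # PNG files start with a fixed 8-byte signature.
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    # JPEG files start with the SOI marker FF D8, followed by another FF marker.
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    # WebP is a RIFF container: 'RIFF', a 4-byte size, then 'WEBP'.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return "unknown"


if __name__ == "__main__":
    sample = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16
    print(detect_image_format(sample))
```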

Model capabilities

5 Core Capabilities

Text-to-image

Generates high-quality images from natural language prompts using a compact 4B-parameter rectified-flow transformer architecture.

Image Editing

Edits existing images based on text instructions, enabling transformations, enhancements, and content modifications in a unified pipeline.

Multi-reference Editing

Combines multiple reference images with text prompts to guide style, composition, or subject while preserving visual consistency.

Real-time Inference

Optimized for sub-second image generation and editing on consumer GPUs, supporting interactive and high-volume visual workflows.

Fine-tuning Support

Supports fine-tuning and LoRA-based customization through its base variants, enabling domain-specific or style-specialized image models.

Use cases

6 Most Valuable Use Cases

Real-time ad creatives
Interactive concept art
Fast product mockups
High-volume thumbnailing
Image editing workflows
Multi-reference generation

Transparent pricing

Cost Comparison

In the comparison below, LLM API has the lowest per-image price, the lowest latency, and the highest throughput for FLUX.2-class 4B models.

Provider	Region	Latency	Throughput	Uptime	Input ($/img)	Output ($/img)	Context
LLM API (best)	Global	~350ms	~120 img/min	99.99%	$0.0006/img	$0.0000/img	1 img up to 1024x1024
Black Forest Labs (Direct)	EU West	~550ms	~60 img/min	~99.9%	~$0.0012/img	$0.0000/img	~1 img up to 1024x1024
Replicate	Global	~700ms	~40 img/min	~99.5%	~$0.0015/img	$0.0000/img	~1 img up to 1024x1024
Together AI	US East	~600ms	~70 img/min	~99.9%	~$0.0013/img	$0.0000/img	~1 img up to 1024x1024
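
To see what the per-image differences mean at volume, the sketch below multiplies the input prices from the table by a daily image count. The prices are copied from the table as listed (several are approximate). The 10,000 images per day workload and the function name are assumptions chosen only for illustration.

```python
# Input prices in USD per image, copied from the comparison table above.
PRICE_PER_IMAGE_USD = {
    "LLM API": 0.0006,
    "Black Forest Labs (Direct)": 0.0012,
    "Replicate": 0.0015,
    "Together AI": 0.0013,
}


def monthly_cost(images_per_day: int, price_per_image: float, days: int = 30) -> float:
    """Estimated spend for a steady daily volume over `days` days."""
    if images_per_day < 0 or days < 0:
        raise ValueError("images_per_day and days must be non-negative")
    return images_per_day * days * price_per_image


if __name__ == "__main__":
    for provider, price in PRICE_PER_IMAGE_USD.items():
        print(f"{provider}: ${monthly_cost(10_000, price):,.2f} per 30 days")
```

At 10,000 images per day (300,000 images over 30 days), the listed prices work out to about $180.00 on LLM API and about $450.00 on Replicate.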

Performance benchmarks

Technical Specifications

Metric	FLUX.2 Klein 4B	Stable Diffusion 3.5 Medium	DALL·E 3 (standard)
Latency per Image	~900ms	~1.1s	~1.3s
Throughput	~40 img/s	~35 img/s	~30 img/s
Max Resolution	1536x1536	1536x1536	1792x1024
Price per Image	$0.020	$0.018	$0.040
Supported Formats	PNG, JPG	PNG, JPG	PNG, JPG
Uptime	99.5%	99.9%	99.9%
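
The latency and throughput rows describe different things: latency is the time for one image, while throughput is a fleet-wide rate that depends on how many requests run in parallel. Little's law (average requests in flight = throughput x latency) links the two. The sketch below applies it to the figures in the table, assuming they describe the same steady-state workload, which this page does not confirm.

```python
def implied_concurrency(throughput_per_s: float, latency_s: float) -> float:
    """Little's law: average number of requests in flight at steady state."""
    if throughput_per_s < 0 or latency_s < 0:
        raise ValueError("throughput and latency must be non-negative")
    return throughput_per_s * latency_s


# (throughput in img/s, latency in seconds), copied from the table above.
SPECS = {
    "FLUX.2 Klein 4B": (40, 0.9),
    "Stable Diffusion 3.5 Medium": (35, 1.1),
    "DALL-E 3 (standard)": (30, 1.3),
}

if __name__ == "__main__":
    for model, (throughput, latency) in SPECS.items():
        print(f"{model}: ~{implied_concurrency(throughput, latency):.1f} requests in flight")
```

Under that assumption, ~40 img/s at ~900ms per image implies roughly 36 images being generated concurrently, so the throughput figure reflects parallel capacity rather than the speed of a single request.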

30-day usage via LLM API

620M: API requests (30 days)
2.9T: Prompt tokens processed (30 days)
3.4T: Completion tokens generated (30 days)
99.8%: Avg API uptime (30 days)

Architecture & Integration

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

Unified AI Routing

Automatically route each request to the optimal model across providers based on latency, cost, or quality, without changing your integration.
One endpoint, every model.

Cost-Aware Orchestration

Control spend with smart tiering, quotas, and policy-based model selection so you always use the cheapest model that still meets requirements.
Optimize every token.

Automatic Provider Fallback

Stay online when a model or provider fails with built-in health checks and seamless failover, no extra logic in your app.
Resilient by default.

End-to-End Observability

Trace every request across models and providers with logs, metrics, and event streams that plug into your existing monitoring stack.
See every token flow.

Task-Level Abstractions

Call high-level tasks like chat, tools, or rerank instead of model-specific APIs, so you can swap models without refactoring.
Code to tasks, not models.

High-Throughput Batch

Process millions of inferences efficiently with batch endpoints that maximize provider throughput while handling retries and rate limits for you.
Scale inference, not ops.
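
To make the routing and fallback ideas above concrete, here is a minimal conceptual sketch of policy-based provider selection with failover. It is not LLM.API's implementation: the platform is described as doing this server-side, and the class, function, and policy names below are hypothetical. The provider figures are copied from the cost comparison table on this page, and `call` stands in for whatever function actually sends the request.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Provider:
    name: str
    latency_ms: float       # approximate, from the cost comparison table
    price_per_image: float  # USD per image, same source


PROVIDERS = [
    Provider("LLM API", 350, 0.0006),
    Provider("Black Forest Labs (Direct)", 550, 0.0012),
    Provider("Replicate", 700, 0.0015),
    Provider("Together AI", 600, 0.0013),
]


def rank_providers(providers: Sequence[Provider], policy: str) -> list[Provider]:
    """Order providers by the chosen policy: 'cost' or 'latency'."""
    if policy == "cost":
        return sorted(providers, key=lambda p: (p.price_per_image, p.latency_ms))
    if policy == "latency":
        return sorted(providers, key=lambda p: (p.latency_ms, p.price_per_image))
    raise ValueError(f"unknown policy: {policy!r}")


def generate_with_fallback(
    prompt: str,
    call: Callable[[Provider, str], object],
    providers: Sequence[Provider] = PROVIDERS,
    policy: str = "cost",
) -> tuple[str, object]:
    """Try providers in policy order; return (name, result) from the first success."""
    errors = []
    for provider in rank_providers(providers, policy):
        try:
            return provider.name, call(provider, prompt)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{provider.name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

With a gateway that handles this for you, application code only makes the single call; the sketch shows the kind of logic that would otherwise live in each client.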

Decision guide

When to Use — When NOT to Use

Use it if...

You need a lightweight, 4B-parameter vision model for cost-efficient image generation.
You need reasonably high-quality images but must stay within tight GPU memory limits.
Your use case involves rapid iteration on image concepts rather than photoreal perfection.
Your use case involves deployment on modest on-premise hardware or edge GPU devices.
You need a compact model for fine-tuning or domain adaptation on limited data.
Your use case involves batch image generation where throughput matters more than peak fidelity.

Avoid if...

You need a large, general-purpose language model for text understanding or generation tasks.
Your workload requires state-of-the-art photorealism rivaling the largest diffusion or video models.
You need robust performance on highly diverse, long-horizon multimodal reasoning or planning tasks.
Your workload requires extremely detailed, high-resolution images for print-grade commercial media.
You need fine-grained control using complex textual instructions, compositional prompts, or scene logic.
Your workload requires integrated text, code, or tool-calling alongside image modeling in one system.

FAQ

Frequently Asked Questions

What is FLUX.2 Klein 4B?

FLUX.2 Klein 4B is a 4B-parameter image generation model from Black Forest Labs optimized for fast, efficient, high-quality image synthesis.

What modalities does FLUX.2 Klein 4B support via LLM.API?

FLUX.2 Klein 4B supports text-to-image generation and image-to-image transformation through the LLM.API image generation endpoints.

What is FLUX.2 Klein 4B best suited for?

FLUX.2 Klein 4B is best for rapid, low-cost image generation where lightweight deployment, iteration speed, and decent visual quality are priorities.

How is FLUX.2 Klein 4B priced on LLM.API?

On LLM.API, FLUX.2 Klein 4B is billed per generated image or image step, with exact pricing defined in the LLM.API model catalog.

How do I access FLUX.2 Klein 4B through the LLM.API?

Call the LLM.API image generation endpoint with the FLUX.2 Klein 4B model identifier and your API key in the Authorization header.

What is the typical latency of FLUX.2 Klein 4B on LLM.API?

Typical text-to-image requests return in a few seconds, depending on resolution, step count, and current LLM.API load.

Does FLUX.2 Klein 4B have a context window like text models?

FLUX.2 Klein 4B does not use a token-based context window; it consumes prompts as text strings and conditioning inputs for image generation.

How does FLUX.2 Klein 4B compare to larger FLUX.2 models?

FLUX.2 Klein 4B generally trades some visual fidelity and detail for significantly lower compute cost, faster responses, and easier deployment.

Are there safety or content limitations when using FLUX.2 Klein 4B on LLM.API?

Yes, FLUX.2 Klein 4B usage is subject to LLM.API safety filters and content policies, which may block disallowed or unsafe generations.

What are key limitations of FLUX.2 Klein 4B?

FLUX.2 Klein 4B may struggle with very fine text rendering, complex multi-object scenes, and ultra-photorealism compared to larger image models.

Start in 2 lines of code
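
As described in the FAQ, a request needs the image generation endpoint, the FLUX.2 Klein 4B model identifier, and your API key in the Authorization header. The sketch below builds such a request with the Python standard library. The endpoint URL, the model identifier string, the JSON field names, and the `Bearer` scheme are all placeholders assumed for illustration; take the real values from the LLM.API documentation and model catalog.

```python
import json
import urllib.request

# Placeholder values, NOT confirmed by this page: replace with the real
# endpoint and model identifier from the LLM.API docs and model catalog.
API_URL = "https://api.example.com/v1/images/generations"
MODEL_ID = "flux-2-klein-4b"


def build_image_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a text-to-image POST request."""
    # The body field names "model" and "prompt" are assumed.
    body = json.dumps({"model": MODEL_ID, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",  # auth scheme is assumed
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_image_request("a lighthouse at dusk, watercolor", "YOUR_API_KEY")
    # Sending is left commented out because the URL above is a placeholder.
    # with urllib.request.urlopen(request, timeout=60) as response:
    #     result = json.load(response)
    print(request.get_method(), request.full_url)
```

Keeping request construction separate from sending makes it easy to inspect the headers and body before pointing the code at a live endpoint.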
