Grok 4.20

Text Generation

Grok 4.20 is xAI’s flagship large language model designed for high-speed inference, low hallucination rates, and strong agentic tool-calling for complex tasks.

Start Using API

API Performance

Latency: ~1.3s avg response
Context: ~128K token context
Input: ~$2.50 per 1M tokens
Output: ~$7.50 per 1M tokens
Uptime: 99% 99%

About the model

What is Grok 4.20?

Grok 4.20 is a flagship large language model from xAI focused on fast, reliable reasoning with multiple internal agents to improve answer quality. It is primarily used for advanced chat-based assistance, complex reasoning tasks, and agentic workflows where it can orchestrate tools and APIs. It is also deployed in enterprise and developer platforms via APIs and partner integrations for building applications that need structured output, function calling, and multimodal (text and image) understanding. It succeeds earlier Grok 4-series models and builds on the broader Grok family of xAI language models.

Input / Output

Input

Text prompts (natural language, code, instructions)
Images (for multimodal understanding and reasoning)

Output

Text responses (answers, explanations, reasoning traces)
Code snippets and programming outputs

Model capabilities

5 Core Capabilities

Conversational Chat

Engages in multi-turn dialogue, answering questions and following instructions with contextual awareness and controllable tone and style.
Code and Tools

Understands and generates code snippets, and can reason about using external tools or APIs when appropriately integrated.
Image Reasoning

Interprets images to identify objects and visual patterns, supporting question answering and basic visual understanding tasks.
Text Translation

Translates between multiple major languages while maintaining meaning and style across diverse informal and formal text inputs.
Text Extraction

Extracts readable text and structured information from documents or images, enabling downstream processing and analysis.

Use cases

6 Most Valuable Use Cases

Enterprise AI Assistance
Customer Support Chatbots
Code Generation & Debugging
Multimodal Content Analysis
Tool-Using AI Agents
Knowledge Base Creation

Transparent pricing

Cost Comparison

LLM API offers the lowest cost and best performance for Grok‑class models.

Provider	Region	Latency	Throughput	Uptime	Input ($/1M)	Output ($/1M)	Context
LLM API BEST	Global	120ms	80 tps	99.99%	$0.30	$0.60	128K
xAI	Global	~450ms	~35 tps	~99.9%	~$0.80	~$1.60	~128K
OpenAI	Global	~400ms	~40 tps	~99.9%	~$0.75	~$1.50	~128K
Anthropic	US East	~420ms	~30 tps	~99.9%	~$0.85	~$1.70	~200K

Performance benchmarks

Technical Specifications

Metric	Grok 4.20 (xAI)	GPT-4.1 (OpenAI)	Claude 3.5 Sonnet (Anthropic)
Avg Latency	~700ms	~900ms	~850ms
Context Window	128K	128K	200K
Input Price ($/1M)	$2.00	$5.00	$3.00
Output Price ($/1M)	$5.00	$15.00	$15.00
Max Output Tokens	8K	4K	4K
Throughput	40 tps	30 tps	25 tps
Uptime	99.5%	99.9%	99.9%

30-day usage via LLM API

62B: Prompt tokens processed (last 30 days)
41B: Completion tokens generated (last 30 days)
11.4M: API requests served (30 days)
99.6%: Avg uptime over last 30 days

Start Using API

Architecture & Integration

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

Intelligent Model Routing

Automatically route each request to the best model across providers based on latency, cost, and quality—without changing your integration or application code.
One endpoint, all models
Cost-Aware Orchestration

Enforce budgets, compare provider pricing, and downgrade to cheaper models when quality thresholds are met so you never overspend on inference again.
Control spend by design
Resilient Fallback Flows

Define failover chains so requests automatically retry on alternative models or providers, preventing downtime and degraded UX when a single vendor has issues.
No single point of failure
End-to-End Observability

Capture structured logs, metrics, and traces for every call across providers, making it easy to debug failures, tune prompts, and optimize performance in production.
See every token, everywhere
Task-Level Abstractions

Describe tasks like chat, completion, tools, or rerank once and let LLM.API pick the right models and parameters for each use case automatically.
Think in tasks, not models
High-Throughput Batch Jobs

Run massive, parallel LLM workloads with built-in queuing, rate-limit handling, and retries so you can process millions of items reliably and cost-effectively.
Scale jobs without glue code

Decision guide

When to Use — When NOT to Use

Use it if...

You need cutting-edge reasoning and coding from a frontier model by xAI.
You need strong performance on complex analytical tasks, including math, logic, and troubleshooting.
You need a general-purpose assistant for chat, drafting, and summarization across many domains.
Your use case involves exploratory research where up-to-date web-connected intelligence is beneficial.
Your use case involves building developer tools or agents that rely on advanced reasoning.

Avoid if...

You need strict, audited enterprise compliance guarantees that xAI has not formally documented.
You need a model with long-standing production track record and mature enterprise support.
You need specialized vision, audio, or multimodal capabilities beyond standard text-based interactions.
Your workload requires deterministic, reproducible outputs guaranteed by stable, version-locked APIs.
Your workload requires guarantees around jurisdiction-specific data residency and regional processing controls.

FAQ

Frequently Asked Questions

What is Grok 4.20?

Grok 4.20 is an xAI large language model accessible via LLM.API, designed for fast, general-purpose code, chat, and analysis workloads.
What is Grok 4.20 best suited for?

Grok 4.20 is best for conversational agents, code assistance, data analysis, and iterative reasoning where low latency and strong general capabilities matter.
What is the context window of Grok 4.20 on LLM.API?

Grok 4.20 supports up to a 128K token context window when accessed through LLM.API.
How is Grok 4.20 priced on LLM.API?

Grok 4.20 pricing is set by LLM.API and may differ from xAI direct pricing; check your LLM.API dashboard for current per-token rates.
How fast is Grok 4.20 in terms of latency and throughput?

Grok 4.20 is optimized on LLM.API for low p95 latency and streaming responses suitable for interactive applications, subject to your chosen deployment region.
Which modalities does Grok 4.20 support via LLM.API?

Grok 4.20 supports text input and text output only when used through LLM.API.
How do I call Grok 4.20 through the LLM.API gateway?

Use the LLM.API chat or completions endpoint with the model identifier "grok-4.20" and your LLM.API key in the Authorization header.
How does Grok 4.20 compare to similar frontier models?

Grok 4.20 targets competitive reasoning and coding quality at generally lower cost and latency than many flagship general-purpose models on LLM.API.
What are the main limitations of Grok 4.20?

Grok 4.20 can hallucinate facts, lacks real-time knowledge, and should not be solely relied on for safety-critical, legal, or medical decisions.
Does Grok 4.20 support tools or function calling on LLM.API?

Yes, Grok 4.20 can use LLM.API’s tool or function-calling interface when you define tools in the request schema.
Can I use Grok 4.20 for long-running batch jobs?

Yes, Grok 4.20 can be used for batch processing through LLM.API, but you must respect rate limits and maximum tokens per request.

Start in 2 lines of code

Get My API Key

Grok 4.20

What is Grok 4.20?

5 Core Capabilities

Conversational Chat

Code and Tools

Image Reasoning

Text Translation

Text Extraction

6 Most Valuable Use Cases

Cost Comparison

Technical Specifications

Why Build on LLM.API?

Intelligent Model Routing

Cost-Aware Orchestration

Resilient Fallback Flows

End-to-End Observability

Task-Level Abstractions

High-Throughput Batch Jobs

When to Use — When NOT to Use

Use it if...

Avoid if...

Start in 2 lines of code