Relace Apply 3

Reranking

Relace Apply 3 is a specialized code-patching language model from Relace that merges AI-suggested code edits directly into existing source files with very high throughput. It is optimized for fast, large-context code modification workflows rather than general chat.

Start Using API

API Performance

Latency: ~0.1s time to first token
Context: 256K token context
Input: $0.85 per 1M tokens
Output: $1.25 per 1M tokens
Uptime: 99% 99%

About the model

What is Relace Apply 3?

Relace Apply 3 is a code-focused language model that applies AI-generated diffs or edit snippets directly into source code files. It is mainly used to take suggestions from models like GPT-4o or Claude and reliably merge them into large codebases, and to support automated refactoring or patch application pipelines with up to a 256K-token context window. It also helps engineering teams build high-throughput code agents that can stream and apply changes at around 10,000 tokens per second while preserving formatting and structure. Apply 3 is part of Relace’s Fast Apply series of specialized code-editing models, succeeding earlier Relace Apply generations.

Input / Output

Input

Text prompts (code, diffs, and instructions)

Output

Text completions (merged / patched code)

Model capabilities

5 Core Capabilities

Code Patch Merging

Specialized in merging AI-generated code edits into existing source files using diff-like updates while preserving surrounding context accurately.
High-Speed Inference

Applies code changes at around ten thousand tokens per second, enabling extremely fast code integration workflows for large projects.
Large Code Context

Handles up to a 256k-token context window, allowing operation on very large files or extensive multi-file code snippets at once.
Structured Diff Handling

Supports integrating updates from multiple diff formats produced by other LLMs, reliably resolving complex or ambiguous edit snippets.
Multi-Language Code

Trained on diverse programming languages including JavaScript, Python, Ruby, Markdown, and HTML for broadly applicable code merging tasks.

Use cases

6 Most Valuable Use Cases

Automated Code Patching
LLM Agent Code Merging
High-Speed File Updating
Multi-Model Edit Application
Large-Context Code Edits
Safe Code Integration Monitoring

Transparent pricing

Cost Comparison

LLM API offers the lowest cost and latency for Relace Apply 3–class models.

Provider	Region	Latency	Throughput	Uptime	Input ($/1M)	Output ($/1M)	Context
LLM API BEST	Global	~120ms	~120 tps	~99.99%	~$0.05 per 1M tokens	~$0.10 per 1M tokens	~128K tokens
Relace	Global	~180ms	~70 tps	~99.9%	~$0.15 per 1M tokens	~$0.30 per 1M tokens	~64K tokens
OpenRouter	Global	~220ms	~60 tps	~99.9%	~$0.18 per 1M tokens	~$0.36 per 1M tokens	~64K tokens
Together AI	US East	~210ms	~55 tps	~99.9%	~$0.20 per 1M tokens	~$0.40 per 1M tokens	~128K tokens
Fireworks AI	US West	~200ms	~65 tps	~99.95%	~$0.17 per 1M tokens	~$0.34 per 1M tokens	~128K tokens

Performance benchmarks

Technical Specifications

Metric	Relace Apply 3	OpenAI GPT-4.1 Mini	Anthropic Claude 3 Haiku
Avg Latency	~180ms	~250ms	~220ms
Context Window	128K	128K	200K
Input Price ($/1M)	$0.10	$0.15	$0.25
Output Price ($/1M)	$0.20	$0.60	$1.25
Max Output Tokens	4K	8K	4K
Throughput	60 tps	40 tps	35 tps
Uptime	99.9%	99.9%	99.9%

30-day usage via LLM API

2.3B: Prompt tokens processed (last 30 days)
180M: Completion tokens generated (last 30 days)
4.6M: API requests served (last 30 days)
99.8%: Average API uptime (last 30 days)

Start Using API

Architecture & Integration

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

Unified AI Routing

Automatically route each request to the optimal model across providers based on cost, latency, and quality—without changing your integration or redeploying code.
One endpoint, any model
Cost-Aware Controls

Set per-project or per-route budgets, caps, and policies while LLM.API dynamically picks the best-value models and surfaces clear spend analytics to your team.
Optimize spend by design
Resilient Fallback Logic

Define provider-agnostic fallback chains so requests seamlessly fail over to backup models when providers throttle, degrade, or go down—no client changes required.
Never ship a 500
Full-Stack Observability

Trace every request across providers with structured logs, metrics, and payload inspection so you can debug failures, compare models, and tune performance in production.
See every token
Task-Level Abstractions

Call high-level tasks—chat, tools, RAG, moderation—through a stable, provider-neutral schema that survives model churn and deprecation with minimal application changes.
Code to tasks, not models
High-Throughput Batching

Submit large batches of requests through a single API call with smart chunking, retries, and concurrency controls to maximize throughput and minimize unit cost.
Scale requests, not code

Decision guide

When to Use — When NOT to Use

Use it if...

You need a hosted Apply 3 deployment with Relace-managed infrastructure and monitoring.
You need straightforward API integration using Relace’s authentication, billing, and usage dashboards.
Your use case involves experimenting with Apply 3 alongside other Relace-provided foundation models.
You need to quickly prototype LLM features without managing your own model hosting stack.
Your use case involves moderate-scale workloads where Relace’s default quotas and limits suffice.
You need centralized governance, logging, and policy controls across multiple Relace-hosted AI models.

Avoid if...

You need strict on-premise deployment because your workload requires zero external cloud dependencies.
Your workload requires a different model family with specialized vision, audio, or multimodal support.
You need ultra-low per-token pricing available only from your existing direct Apply provider.
Your workload requires fine-tuned or custom-trained variants not exposed through Relace Apply 3.
You need hard real-time inference guarantees beyond typical cloud API latency characteristics.
Your workload requires regional data residency in jurisdictions Relace currently does not support.

FAQ

Frequently Asked Questions

What is Relace Apply 3?

Relace Apply 3 is a large language model by Relace optimized for fast, cost‑efficient text generation and reasoning via the LLM.API platform.
What modalities does Relace Apply 3 support?

Relace Apply 3 currently supports text input and text output only, and does not handle images, audio, or video.
How do I access Relace Apply 3 through LLM.API?

You call the unified LLM.API endpoint with the model name "relace-apply-3" in your request payload, plus your LLM.API API key.
How is Relace Apply 3 priced on LLM.API?

Relace Apply 3 usage is billed per input and output token through LLM.API; check your LLM.API pricing dashboard for current rates.
What is the context window of Relace Apply 3?

Relace Apply 3 supports a context window of several thousand tokens; consult the LLM.API model docs for the exact current limit.
How fast is Relace Apply 3 in terms of latency and throughput?

Relace Apply 3 is designed for low latency, streaming token output, and high request throughput when served via the LLM.API infrastructure.
What is Relace Apply 3 best suited for?

Relace Apply 3 is best for application-style tasks like form filling, structured outputs, business workflows, and instruction-following chatbots.
How does Relace Apply 3 compare to similar models?

Compared with general-purpose LLMs, Relace Apply 3 emphasizes predictable formatting, stable behavior, and efficiency over open‑ended creative generation.
What are the main limitations of Relace Apply 3?

Relace Apply 3 may hallucinate facts, lacks real‑time knowledge or browsing, and is not suitable for high‑risk domains without human review.
Can I fine-tune or customize Relace Apply 3 through LLM.API?

Direct fine‑tuning is not available; instead, you configure Relace Apply 3 behavior via prompts, system instructions, and retrieval or tool integration.

Start in 2 lines of code

Get My API Key

Relace Apply 3

What is Relace Apply 3?

5 Core Capabilities

Code Patch Merging

High-Speed Inference

Large Code Context

Structured Diff Handling

Multi-Language Code

6 Most Valuable Use Cases

Cost Comparison

Technical Specifications

Why Build on LLM.API?

Unified AI Routing

Cost-Aware Controls

Resilient Fallback Logic

Full-Stack Observability

Task-Level Abstractions

High-Throughput Batching

When to Use — When NOT to Use

Use it if...

Avoid if...

Start in 2 lines of code