Powered by Relace
Relace Apply 3
Relace Apply 3 is a specialized code-patching language model from Relace that merges AI-suggested code edits directly into existing source files with very high throughput. It is optimized for fast, large-context code modification workflows rather than general chat.
About the model
What is Relace Apply 3?
Relace Apply 3 is a code-focused language model that applies AI-generated diffs or edit snippets to source code files. It is mainly used to take suggestions from models like GPT-4o or Claude and merge them reliably into large codebases, and to support automated refactoring or patch-application pipelines with a context window of up to 256K tokens. It also helps engineering teams build high-throughput code agents that stream and apply changes at around 10,000 tokens per second while preserving formatting and structure. Apply 3 belongs to Relace’s Fast Apply series of specialized code-editing models and succeeds earlier Relace Apply generations.
Model capabilities
5 Core Capabilities
- Code Patch Merging: Specialized in merging AI-generated code edits into existing source files using diff-like updates while preserving surrounding context accurately.
- High-Speed Inference: Applies code changes at around ten thousand tokens per second, enabling extremely fast code integration workflows for large projects.
- Large Code Context: Handles up to a 256K-token context window, allowing operation on very large files or extensive multi-file code snippets at once.
- Structured Diff Handling: Supports integrating updates from multiple diff formats produced by other LLMs, reliably resolving complex or ambiguous edit snippets.
- Multi-Language Code: Trained on diverse programming languages including JavaScript, Python, Ruby, Markdown, and HTML for broadly applicable code merging tasks.
Use cases
6 Most Valuable Use Cases
- Automated Code Patching
- LLM Agent Code Merging
- High-Speed File Updating
- Multi-Model Edit Application
- Large-Context Code Edits
- Safe Code Integration Monitoring
Transparent pricing
Cost Comparison
Among the providers listed below, LLM.API shows the lowest listed price and latency for Relace Apply 3.
| Provider | Region | Latency | Throughput | Uptime | Input ($/1M tokens) | Output ($/1M tokens) | Context (tokens) |
|---|---|---|---|---|---|---|---|
| LLM.API (best) | Global | ~120ms | ~120 tps | ~99.99% | ~$0.05 | ~$0.10 | ~128K |
| Relace | Global | ~180ms | ~70 tps | ~99.9% | ~$0.15 | ~$0.30 | ~64K |
| OpenRouter | Global | ~220ms | ~60 tps | ~99.9% | ~$0.18 | ~$0.36 | ~64K |
| Together AI | US East | ~210ms | ~55 tps | ~99.9% | ~$0.20 | ~$0.40 | ~128K |
| Fireworks AI | US West | ~200ms | ~65 tps | ~99.95% | ~$0.17 | ~$0.34 | ~128K |
Performance benchmarks
Technical Specifications
| Metric | Relace Apply 3 | OpenAI GPT-4.1 Mini | Anthropic Claude 3 Haiku |
|---|---|---|---|
| Avg Latency | ~180ms | ~250ms | ~220ms |
| Context Window | 128K | 128K | 200K |
| Input Price ($/1M) | $0.10 | $0.15 | $0.25 |
| Output Price ($/1M) | $0.20 | $0.60 | $1.25 |
| Max Output Tokens | 4K | 8K | 4K |
| Throughput | 60 tps | 40 tps | 35 tps |
| Uptime | 99.9% | 99.9% | 99.9% |
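As a rough illustration of how the per-token prices in the table translate into spend, the sketch below estimates the cost of a workload at the Relace Apply 3 column's listed rates ($0.10 input, $0.20 output per 1M tokens). The function name and the sample workload are hypothetical, and actual billing may differ from these listed figures.

```python
# Prices in USD per 1M tokens, copied from the "Relace Apply 3" column of the table above.
INPUT_PRICE_PER_M = 0.10
OUTPUT_PRICE_PER_M = 0.20


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a workload at the listed prices."""
    input_cost = (input_tokens / 1_000_000) * INPUT_PRICE_PER_M
    output_cost = (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


# Hypothetical workload: 2M input tokens and 1M output tokens.
print(f"${estimate_cost(2_000_000, 1_000_000):.2f}")  # $0.40
```

Swapping in another column's prices gives a like-for-like comparison for the same workload.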
30-day usage via LLM.API
- 2.3B prompt tokens processed (last 30 days)
- 180M completion tokens generated (last 30 days)
- 4.6M API requests served (last 30 days)
- 99.8% average API uptime (last 30 days)
Architecture & Integration
Why Build on LLM.API?
One unified API. Every major model. Built-in reliability, cost control, and observability.
- Unified AI Routing (One endpoint, any model): Automatically route each request to the optimal model across providers based on cost, latency, and quality, without changing your integration or redeploying code.
- Cost-Aware Controls (Optimize spend by design): Set per-project or per-route budgets, caps, and policies while LLM.API dynamically picks the best-value models and surfaces clear spend analytics to your team.
- Resilient Fallback Logic (Never ship a 500): Define provider-agnostic fallback chains so requests seamlessly fail over to backup models when providers throttle, degrade, or go down, with no client changes required.
- Full-Stack Observability (See every token): Trace every request across providers with structured logs, metrics, and payload inspection so you can debug failures, compare models, and tune performance in production.
- Task-Level Abstractions (Code to tasks, not models): Call high-level tasks (chat, tools, RAG, moderation) through a stable, provider-neutral schema that survives model churn and deprecation with minimal application changes.
- High-Throughput Batching (Scale requests, not code): Submit large batches of requests through a single API call with smart chunking, retries, and concurrency controls to maximize throughput and minimize unit cost.
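The fallback behavior described in the list above runs inside LLM.API, so clients do not implement it themselves. Purely to illustrate the idea of a fallback chain, here is a minimal sketch: try each provider in order and return the first success. Every name in it (`ProviderError`, `call_with_fallback`, `primary`, `backup`) is hypothetical and not part of any LLM.API interface.

```python
from typing import Callable, Sequence


class ProviderError(Exception):
    """Raised by a (hypothetical) provider call when it is throttled or down."""


def call_with_fallback(providers: Sequence[Callable[[str], str]], prompt: str) -> str:
    """Try each provider in order and return the first successful response."""
    errors = []
    for provider in providers:
        try:
            return provider(prompt)
        except ProviderError as exc:
            errors.append(exc)  # remember the failure and move on to the next provider
    raise ProviderError(f"all {len(providers)} providers failed: {errors}")


# Stand-in providers for demonstration only; real ones would make network calls.
def primary(prompt: str) -> str:
    raise ProviderError("primary is throttled")


def backup(prompt: str) -> str:
    return f"backup handled: {prompt}"


print(call_with_fallback([primary, backup], "apply edit"))  # backup handled: apply edit
```

A hosted router would likely add retries, timeouts, and health checks on top of this ordering logic; the sketch only shows the ordered failover itself.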
Decision guide
When to Use — When NOT to Use
Use it if...
- You need a hosted Apply 3 deployment with Relace-managed infrastructure and monitoring.
- You need straightforward API integration using Relace’s authentication, billing, and usage dashboards.
- Your use case involves experimenting with Apply 3 alongside other Relace-provided foundation models.
- You need to quickly prototype LLM features without managing your own model hosting stack.
- Your use case involves moderate-scale workloads where Relace’s default quotas and limits suffice.
- You need centralized governance, logging, and policy controls across multiple Relace-hosted AI models.
Avoid if...
- You need strict on-premise deployment because your workload requires zero external cloud dependencies.
- Your workload requires a different model family with specialized vision, audio, or multimodal support.
- You need the lowest possible per-token pricing and already get it from a direct Apply provider.
- Your workload requires fine-tuned or custom-trained variants not exposed through Relace Apply 3.
- You need hard real-time inference guarantees beyond typical cloud API latency characteristics.
- Your workload requires regional data residency in jurisdictions Relace currently does not support.
FAQ
Frequently Asked Questions
- What is Relace Apply 3?
  Relace Apply 3 is a code-patching language model by Relace that merges AI-suggested code edits into existing source files; it is available via the LLM.API platform.
- What modalities does Relace Apply 3 support?
  Relace Apply 3 currently supports text input and text output only, and does not handle images, audio, or video.
- How do I access Relace Apply 3 through LLM.API?
  You call the unified LLM.API endpoint with the model name "relace-apply-3" in your request payload, plus your LLM.API API key.
- How is Relace Apply 3 priced on LLM.API?
  Relace Apply 3 usage is billed per input and output token through LLM.API; check your LLM.API pricing dashboard for current rates.
- What is the context window of Relace Apply 3?
  The model overview on this page cites a context window of up to 256K tokens, while the comparison tables list 64K to 128K depending on the provider; consult the LLM.API model docs for the exact current limit.
- How fast is Relace Apply 3 in terms of latency and throughput?
  Relace Apply 3 is designed for low latency, streaming token output, and high request throughput when served via the LLM.API infrastructure.
- What is Relace Apply 3 best suited for?
  Relace Apply 3 is best suited for merging AI-generated edits into existing code: automated code patching, code merging for LLM agents, and large-context file updates.
- How does Relace Apply 3 compare to similar models?
  Compared with general-purpose LLMs, Relace Apply 3 emphasizes predictable formatting, stable behavior, and efficiency over open-ended creative generation.
- What are the main limitations of Relace Apply 3?
  Relace Apply 3 may hallucinate facts, lacks real-time knowledge or browsing, and is not suitable for high-risk domains without human review.
- Can I fine-tune or customize Relace Apply 3 through LLM.API?
  Direct fine-tuning is not available; instead, you configure Relace Apply 3 behavior via prompts, system instructions, and retrieval or tool integration.
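To make the access pattern from the FAQ concrete, the sketch below builds (but does not send) an HTTP request that carries the model name "relace-apply-3" and an API key. Only the model name comes from this page. The URL is a placeholder, and the `initial_code` and `edit_snippet` field names, the Bearer authorization scheme, and the `build_apply_request` helper are all assumptions for illustration; check the LLM.API docs for the real endpoint and payload schema.

```python
import json
import urllib.request

API_URL = "https://api.example.com/v1/apply"  # placeholder URL, not a real endpoint


def build_apply_request(api_key: str, initial_code: str, edit_snippet: str) -> urllib.request.Request:
    """Build a POST request asking the model to merge an edit into existing code."""
    payload = {
        "model": "relace-apply-3",      # model name given in the FAQ
        "initial_code": initial_code,   # hypothetical field name
        "edit_snippet": edit_snippet,   # hypothetical field name
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_apply_request(
    "YOUR_API_KEY",
    "def add(a, b):\n    return a - b\n",
    "def add(a, b):\n    return a + b\n",
)
print(request.get_method(), request.full_url)
print(json.loads(request.data)["model"])
```

Sending the request would be a matter of passing it to `urllib.request.urlopen` once the real endpoint and schema are known.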
