Powered by Relace

Relace Apply 3

  • Reranking

Relace Apply 3 is a specialized code-patching language model from Relace that merges AI-suggested code edits directly into existing source files with very high throughput. It is optimized for fast, large-context code modification workflows rather than general chat.

Start Using API

What is Relace Apply 3?

Relace Apply 3 is a code-focused language model that applies AI-generated diffs or edit snippets directly into source code files. It is mainly used to take suggestions from models like GPT-4o or Claude and reliably merge them into large codebases, and to support automated refactoring or patch application pipelines with up to a 256K-token context window. It also helps engineering teams build high-throughput code agents that can stream and apply changes at around 10,000 tokens per second while preserving formatting and structure. Apply 3 is part of Relace’s Fast Apply series of specialized code-editing models, succeeding earlier Relace Apply generations.

5 Core Capabilities

  • Code Patch Merging

    Specialized in merging AI-generated code edits into existing source files using diff-like updates while preserving surrounding context accurately.

  • High-Speed Inference

    Applies code changes at around ten thousand tokens per second, enabling extremely fast code integration workflows for large projects.

  • Large Code Context

    Handles up to a 256k-token context window, allowing operation on very large files or extensive multi-file code snippets at once.

  • Structured Diff Handling

    Supports integrating updates from multiple diff formats produced by other LLMs, reliably resolving complex or ambiguous edit snippets.

  • Multi-Language Code

    Trained on diverse programming languages including JavaScript, Python, Ruby, Markdown, and HTML for broadly applicable code merging tasks.

6 Most Valuable Use Cases

  • Automated Code Patching
  • LLM Agent Code Merging
  • High-Speed File Updating
  • Multi-Model Edit Application
  • Large-Context Code Edits
  • Safe Code Integration Monitoring

Cost Comparison

LLM API offers the lowest cost and latency for Relace Apply 3–class models.

Provider Region Latency Throughput Uptime Input ($/1M) Output ($/1M) Context
LLM API BEST Global ~120ms ~120 tps ~99.99% ~$0.05 per 1M tokens ~$0.10 per 1M tokens ~128K tokens
Relace Global ~180ms ~70 tps ~99.9% ~$0.15 per 1M tokens ~$0.30 per 1M tokens ~64K tokens
OpenRouter Global ~220ms ~60 tps ~99.9% ~$0.18 per 1M tokens ~$0.36 per 1M tokens ~64K tokens
Together AI US East ~210ms ~55 tps ~99.9% ~$0.20 per 1M tokens ~$0.40 per 1M tokens ~128K tokens
Fireworks AI US West ~200ms ~65 tps ~99.95% ~$0.17 per 1M tokens ~$0.34 per 1M tokens ~128K tokens

Technical Specifications

Metric Relace Apply 3 OpenAI GPT-4.1 Mini Anthropic Claude 3 Haiku
Avg Latency ~180ms ~250ms ~220ms
Context Window 128K 128K 200K
Input Price ($/1M) $0.10 $0.15 $0.25
Output Price ($/1M) $0.20 $0.60 $1.25
Max Output Tokens 4K 8K 4K
Throughput 60 tps 40 tps 35 tps
Uptime 99.9% 99.9% 99.9%

30-day usage via LLM API

2.3B
Prompt tokens processed (last 30 days)
180M
Completion tokens generated (last 30 days)
4.6M
API requests served (last 30 days)
99.8%
Average API uptime (last 30 days)
Start Using API

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

  • Unified AI Routing

    Automatically route each request to the optimal model across providers based on cost, latency, and quality—without changing your integration or redeploying code.

    One endpoint, any model
  • Cost-Aware Controls

    Set per-project or per-route budgets, caps, and policies while LLM.API dynamically picks the best-value models and surfaces clear spend analytics to your team.

    Optimize spend by design
  • Resilient Fallback Logic

    Define provider-agnostic fallback chains so requests seamlessly fail over to backup models when providers throttle, degrade, or go down—no client changes required.

    Never ship a 500
  • Full-Stack Observability

    Trace every request across providers with structured logs, metrics, and payload inspection so you can debug failures, compare models, and tune performance in production.

    See every token
  • Task-Level Abstractions

    Call high-level tasks—chat, tools, RAG, moderation—through a stable, provider-neutral schema that survives model churn and deprecation with minimal application changes.

    Code to tasks, not models
  • High-Throughput Batching

    Submit large batches of requests through a single API call with smart chunking, retries, and concurrency controls to maximize throughput and minimize unit cost.

    Scale requests, not code

When to Use — When NOT to Use

Use it if...

  • You need a hosted Apply 3 deployment with Relace-managed infrastructure and monitoring.
  • You need straightforward API integration using Relace’s authentication, billing, and usage dashboards.
  • Your use case involves experimenting with Apply 3 alongside other Relace-provided foundation models.
  • You need to quickly prototype LLM features without managing your own model hosting stack.
  • Your use case involves moderate-scale workloads where Relace’s default quotas and limits suffice.
  • You need centralized governance, logging, and policy controls across multiple Relace-hosted AI models.

Avoid if...

  • You need strict on-premise deployment because your workload requires zero external cloud dependencies.
  • Your workload requires a different model family with specialized vision, audio, or multimodal support.
  • You need ultra-low per-token pricing available only from your existing direct Apply provider.
  • Your workload requires fine-tuned or custom-trained variants not exposed through Relace Apply 3.
  • You need hard real-time inference guarantees beyond typical cloud API latency characteristics.
  • Your workload requires regional data residency in jurisdictions Relace currently does not support.

Frequently Asked Questions

  • What is Relace Apply 3?

    Relace Apply 3 is a large language model by Relace optimized for fast, cost‑efficient text generation and reasoning via the LLM.API platform.

  • What modalities does Relace Apply 3 support?

    Relace Apply 3 currently supports text input and text output only, and does not handle images, audio, or video.

  • How do I access Relace Apply 3 through LLM.API?

    You call the unified LLM.API endpoint with the model name "relace-apply-3" in your request payload, plus your LLM.API API key.

  • How is Relace Apply 3 priced on LLM.API?

    Relace Apply 3 usage is billed per input and output token through LLM.API; check your LLM.API pricing dashboard for current rates.

  • What is the context window of Relace Apply 3?

    Relace Apply 3 supports a context window of several thousand tokens; consult the LLM.API model docs for the exact current limit.

  • How fast is Relace Apply 3 in terms of latency and throughput?

    Relace Apply 3 is designed for low latency, streaming token output, and high request throughput when served via the LLM.API infrastructure.

  • What is Relace Apply 3 best suited for?

    Relace Apply 3 is best for application-style tasks like form filling, structured outputs, business workflows, and instruction-following chatbots.

  • How does Relace Apply 3 compare to similar models?

    Compared with general-purpose LLMs, Relace Apply 3 emphasizes predictable formatting, stable behavior, and efficiency over open‑ended creative generation.

  • What are the main limitations of Relace Apply 3?

    Relace Apply 3 may hallucinate facts, lacks real‑time knowledge or browsing, and is not suitable for high‑risk domains without human review.

  • Can I fine-tune or customize Relace Apply 3 through LLM.API?

    Direct fine‑tuning is not available; instead, you configure Relace Apply 3 behavior via prompts, system instructions, and retrieval or tool integration.

Start in 2 lines of code

Get My API Key