Relace Search

Text Generation

Relace Search is a text-only large language model from Relace optimized for agentic multi-step search over large codebases, using parallel file-inspection tools to return highly relevant files quickly.

Start Using API

API Performance

Latency: ~0.8s avg response
Context: ~8K token context
Input: Free per 1M tokens
Output: Free per 1M tokens
Uptime: 99% 99%

About the model

What is Relace Search?

Relace Search is a Relace large language model designed to explore codebases using 4–12 parallel `view_file` and `grep`-style tools and return the most relevant files to a query. It is mainly used as a subagent in autonomous coding systems to perform high-precision, multi-step search across large repositories, and to feed its findings into an orchestrating “oracle” coding agent. It is also useful for document-heavy workflows that benefit from its 256K-token context window and support for tool use and function calling. Relace Search belongs to Relace’s family of small, fast models built specifically as tools for coding agents and large-codebase retrieval.

Input / Output

Input

Text (prompts, code, natural language)

Output

Text responses (search results over codebases)

Model capabilities

5 Core Capabilities

Agentic Code Search

Uses multiple parallel view_file and grep tools to explore large codebases and return precisely targeted relevant files.
Tool-Aware Orchestration

Acts as a subagent that coordinates with external tools and hands structured findings to a downstream oracle coding agent.
Long-Context Retrieval

Handles up to 256K-token contexts, enabling semantic search across extensive repositories, documentation, and multi-file projects.
Structured Function Calling

Supports tool use and function calling, enabling programmatic integration into automated pipelines and custom developer workflows.
Text-Only Interface

Provides text input and output completions via OpenAI-compatible chat endpoints, without native image or multimodal support.

Use cases

6 Most Valuable Use Cases

Codebase File Retrieval
Agentic Code Search
Refactor Impact Scoping
Bug Context Gathering
Monorepo Navigation Assistant
Tool-Augmented Code Browsing

Transparent pricing

Cost Comparison

LLM API offers the lowest cost and latency for Relace Search–class models.

Provider	Region	Latency	Throughput	Uptime	Input ($/1M)	Output ($/1M)	Context
LLM API BEST	Global	80ms	120 qps	99.99%	$0.05 per 1M queries	$0.00	Search over ~10M documents
Relace	Global	~120ms	~60 qps	~99.9%	~$0.10 per 1M queries	$0.00	~5M documents per index
Pinecone (similar vector search)	US East	~150ms	~40 qps	99.9%	~$0.20 per 1M queries	$0.00	~10M vectors per index
Weaviate Cloud (similar vector search)	EU West	~160ms	~35 qps	~99.9%	~$0.18 per 1M queries	$0.00	~8M objects per cluster
Qdrant Cloud (similar vector search)	Global	~170ms	~30 qps	~99.9%	~$0.16 per 1M queries	$0.00	~10M vectors per collection

Performance benchmarks

Technical Specifications

Metric	Relace Search	Perplexity Search	You.com Search
Avg Latency	~800ms	~900ms	~1.1s
Context Window	~32K	~32K	~16K
Input Price ($/1M tokens)	~$0.80	~$1.00	~$1.20
Output Price ($/1M tokens)	~$1.20	~$1.50	~$1.80
Max Output Tokens	~4K	~4K	~2K
Throughput	~40 rps	~35 rps	~30 rps
Uptime	~99.5%	~99.0%	~99.0%

30-day usage via LLM API

6.8B: Prompt tokens processed (last 30 days)
9.5M: API requests served (last 30 days)
4.1B: Completion tokens generated (last 30 days)
99.8%: Avg uptime (last 30 days)

Start Using API

Architecture & Integration

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

Unified AI Routing

Automatically route each request to the best model across providers based on latency, capabilities, or custom rules—without changing your code or client integration.
One endpoint, every model
Cost-Aware Execution

Set cost ceilings and optimization policies so LLM.API chooses the most cost-effective model per call while preserving quality and performance at scale.
Lower spend, same quality
Resilient Fallbacks

Configure provider and model failover in the platform, not your app. If a model degrades or fails, traffic shifts automatically to healthy alternatives.
No single point of failure
Deep Observability

Get centralized logs, traces, and metrics for every provider and model: latencies, errors, token usage, and cost, with queryable insights for debugging and optimization.
See every token, everywhere
Task-Level Abstractions

Define tasks like chat, summarize, extract, or classify once. LLM.API maps them to the right models and prompts so you ship features instead of glue code.
Think tasks, not models
High-Throughput Batch

Batch thousands of inferences across providers through a single API call, with automatic chunking, retries, and result aggregation tuned for large-scale workloads.
Max throughput, minimal code

Decision guide

When to Use — When NOT to Use

Use it if...

You need an agent to rapidly locate relevant files across very large codebases.
You need multi-step, tool-using search that orchestrates parallel view_file and grep calls.
Your use case involves powering an oracle coding agent with precise code search results.
You need a code-focused search model with a 256K context and large outputs.
Your use case involves building custom agent harnesses that parse structured search responses.
You need OpenAI-compatible API access to specialized code search via platforms like OpenRouter.

Avoid if...

You need a general-purpose chat or reasoning model beyond code search and retrieval.
Your workload requires plug-and-play usage without building an agent harness around tools.
You need the absolute cheapest model for simple keyword search over small repositories.
Your workload requires vision, audio, or multimodal understanding rather than pure text code search.
You need high-quality code generation or refactoring, not just finding relevant source files.
Your workload requires ultra-low-latency single-call responses without parallel tool invocations.

FAQ

Frequently Asked Questions

What is Relace Search?

Relace Search is a search-optimized AI model by Relace designed to retrieve and rank relevant documents over large corpora.
What is Relace Search best suited for?

Relace Search is best for semantic search, retrieval-augmented generation backends, and relevance-ranked document or passage search.
What modalities does Relace Search support?

Relace Search works with text-only inputs and returns structured text-based results, not images, audio, or video.
How is Relace Search priced when used through LLM.API?

Relace Search pricing on LLM.API is usage-based per request or token, and you should check the LLM.API pricing page for current rates.
What is the context window of Relace Search?

Relace Search supports a provider-defined maximum query and document length; consult the LLM.API model card for the latest context window limits.
How fast is Relace Search in terms of latency?

Relace Search typically responds fast enough for interactive applications, but actual latency depends on corpus size, request complexity, and network conditions.
How do I call Relace Search via LLM.API?

You call Relace Search by specifying the Relace Search model name in your LLM.API request and passing your query and optional search parameters.
Can I use Relace Search as the retrieval layer for my RAG system?

Yes, you can use Relace Search to retrieve top relevant documents and feed them into a separate generative model for RAG workflows.
How does Relace Search compare to general-purpose LLMs for search tasks?

Relace Search is specialized for retrieval relevance and ranking, whereas general-purpose LLMs focus on generation and may be less efficient for large-scale search.
What are the main limitations of Relace Search?

Relace Search does not generate long-form answers, relies on the indexed corpus quality, and is limited to text-based search scenarios.

Start in 2 lines of code

Get My API Key

Relace Search

What is Relace Search?

5 Core Capabilities

Agentic Code Search

Tool-Aware Orchestration

Long-Context Retrieval

Structured Function Calling

Text-Only Interface

6 Most Valuable Use Cases

Cost Comparison

Technical Specifications

Why Build on LLM.API?

Unified AI Routing

Cost-Aware Execution

Resilient Fallbacks

Deep Observability

Task-Level Abstractions

High-Throughput Batch

When to Use — When NOT to Use

Use it if...

Avoid if...

Start in 2 lines of code