Powered by Relace
Relace Search
- Text Generation
Relace Search is a text-only large language model from Relace optimized for agentic multi-step search over large codebases, using parallel file-inspection tools to return highly relevant files quickly.
About the model
What is Relace Search?
Relace Search is a Relace large language model designed to explore codebases using 4–12 parallel `view_file` and `grep`-style tools and return the most relevant files to a query. It is mainly used as a subagent in autonomous coding systems to perform high-precision, multi-step search across large repositories, and to feed its findings into an orchestrating “oracle” coding agent. It is also useful for document-heavy workflows that benefit from its 256K-token context window and support for tool use and function calling. Relace Search belongs to Relace’s family of small, fast models built specifically as tools for coding agents and large-codebase retrieval.
Model capabilities
5 Core Capabilities
-
Agentic Code Search
Uses multiple parallel view_file and grep tools to explore large codebases and return precisely targeted relevant files.
-
Tool-Aware Orchestration
Acts as a subagent that coordinates with external tools and hands structured findings to a downstream oracle coding agent.
-
Long-Context Retrieval
Handles up to 256K-token contexts, enabling semantic search across extensive repositories, documentation, and multi-file projects.
-
Structured Function Calling
Supports tool use and function calling, enabling programmatic integration into automated pipelines and custom developer workflows.
-
Text-Only Interface
Provides text input and output completions via OpenAI-compatible chat endpoints, without native image or multimodal support.
Use cases
6 Most Valuable Use Cases
- Codebase File Retrieval
- Agentic Code Search
- Refactor Impact Scoping
- Bug Context Gathering
- Monorepo Navigation Assistant
- Tool-Augmented Code Browsing
Transparent pricing
Cost Comparison
LLM API offers the lowest cost and latency for Relace Search–class models.
| Provider | Region | Latency | Throughput | Uptime | Input ($/1M) | Output ($/1M) | Context |
|---|---|---|---|---|---|---|---|
| LLM API BEST | Global | 80ms | 120 qps | 99.99% | $0.05 per 1M queries | $0.00 | Search over ~10M documents |
| Relace | Global | ~120ms | ~60 qps | ~99.9% | ~$0.10 per 1M queries | $0.00 | ~5M documents per index |
| Pinecone (similar vector search) | US East | ~150ms | ~40 qps | 99.9% | ~$0.20 per 1M queries | $0.00 | ~10M vectors per index |
| Weaviate Cloud (similar vector search) | EU West | ~160ms | ~35 qps | ~99.9% | ~$0.18 per 1M queries | $0.00 | ~8M objects per cluster |
| Qdrant Cloud (similar vector search) | Global | ~170ms | ~30 qps | ~99.9% | ~$0.16 per 1M queries | $0.00 | ~10M vectors per collection |
Performance benchmarks
Technical Specifications
| Metric | Relace Search | Perplexity Search | You.com Search |
|---|---|---|---|
| Avg Latency | ~800ms | ~900ms | ~1.1s |
| Context Window | ~32K | ~32K | ~16K |
| Input Price ($/1M tokens) | ~$0.80 | ~$1.00 | ~$1.20 |
| Output Price ($/1M tokens) | ~$1.20 | ~$1.50 | ~$1.80 |
| Max Output Tokens | ~4K | ~4K | ~2K |
| Throughput | ~40 rps | ~35 rps | ~30 rps |
| Uptime | ~99.5% | ~99.0% | ~99.0% |
30-day usage via LLM API
- 6.8B
- Prompt tokens processed (last 30 days)
- 9.5M
- API requests served (last 30 days)
- 4.1B
- Completion tokens generated (last 30 days)
- 99.8%
- Avg uptime (last 30 days)
Architecture & Integration
Why Build on LLM.API?
One unified API. Every major model. Built-in reliability, cost control, and observability.
-
Unified AI Routing
Automatically route each request to the best model across providers based on latency, capabilities, or custom rules—without changing your code or client integration.
One endpoint, every model -
Cost-Aware Execution
Set cost ceilings and optimization policies so LLM.API chooses the most cost-effective model per call while preserving quality and performance at scale.
Lower spend, same quality -
Resilient Fallbacks
Configure provider and model failover in the platform, not your app. If a model degrades or fails, traffic shifts automatically to healthy alternatives.
No single point of failure -
Deep Observability
Get centralized logs, traces, and metrics for every provider and model: latencies, errors, token usage, and cost, with queryable insights for debugging and optimization.
See every token, everywhere -
Task-Level Abstractions
Define tasks like chat, summarize, extract, or classify once. LLM.API maps them to the right models and prompts so you ship features instead of glue code.
Think tasks, not models -
High-Throughput Batch
Batch thousands of inferences across providers through a single API call, with automatic chunking, retries, and result aggregation tuned for large-scale workloads.
Max throughput, minimal code
Decision guide
When to Use — When NOT to Use
Use it if...
- You need an agent to rapidly locate relevant files across very large codebases.
- You need multi-step, tool-using search that orchestrates parallel view_file and grep calls.
- Your use case involves powering an oracle coding agent with precise code search results.
- You need a code-focused search model with a 256K context and large outputs.
- Your use case involves building custom agent harnesses that parse structured search responses.
- You need OpenAI-compatible API access to specialized code search via platforms like OpenRouter.
Avoid if...
- You need a general-purpose chat or reasoning model beyond code search and retrieval.
- Your workload requires plug-and-play usage without building an agent harness around tools.
- You need the absolute cheapest model for simple keyword search over small repositories.
- Your workload requires vision, audio, or multimodal understanding rather than pure text code search.
- You need high-quality code generation or refactoring, not just finding relevant source files.
- Your workload requires ultra-low-latency single-call responses without parallel tool invocations.
FAQ
Frequently Asked Questions
-
What is Relace Search?
Relace Search is a search-optimized AI model by Relace designed to retrieve and rank relevant documents over large corpora.
-
What is Relace Search best suited for?
Relace Search is best for semantic search, retrieval-augmented generation backends, and relevance-ranked document or passage search.
-
What modalities does Relace Search support?
Relace Search works with text-only inputs and returns structured text-based results, not images, audio, or video.
-
How is Relace Search priced when used through LLM.API?
Relace Search pricing on LLM.API is usage-based per request or token, and you should check the LLM.API pricing page for current rates.
-
What is the context window of Relace Search?
Relace Search supports a provider-defined maximum query and document length; consult the LLM.API model card for the latest context window limits.
-
How fast is Relace Search in terms of latency?
Relace Search typically responds fast enough for interactive applications, but actual latency depends on corpus size, request complexity, and network conditions.
-
How do I call Relace Search via LLM.API?
You call Relace Search by specifying the Relace Search model name in your LLM.API request and passing your query and optional search parameters.
-
Can I use Relace Search as the retrieval layer for my RAG system?
Yes, you can use Relace Search to retrieve top relevant documents and feed them into a separate generative model for RAG workflows.
-
How does Relace Search compare to general-purpose LLMs for search tasks?
Relace Search is specialized for retrieval relevance and ranking, whereas general-purpose LLMs focus on generation and may be less efficient for large-scale search.
-
What are the main limitations of Relace Search?
Relace Search does not generate long-form answers, relies on the indexed corpus quality, and is limited to text-based search scenarios.
