Powered by Relace

Relace Search

  • Text Generation

Relace Search is a text-only large language model from Relace optimized for agentic multi-step search over large codebases, using parallel file-inspection tools to return highly relevant files quickly.

Start Using API

What is Relace Search?

Relace Search is a Relace large language model designed to explore codebases using 4–12 parallel `view_file` and `grep`-style tools and return the most relevant files to a query. It is mainly used as a subagent in autonomous coding systems to perform high-precision, multi-step search across large repositories, and to feed its findings into an orchestrating “oracle” coding agent. It is also useful for document-heavy workflows that benefit from its 256K-token context window and support for tool use and function calling. Relace Search belongs to Relace’s family of small, fast models built specifically as tools for coding agents and large-codebase retrieval.

5 Core Capabilities

  • Agentic Code Search

    Uses multiple parallel view_file and grep tools to explore large codebases and return precisely targeted relevant files.

  • Tool-Aware Orchestration

    Acts as a subagent that coordinates with external tools and hands structured findings to a downstream oracle coding agent.

  • Long-Context Retrieval

    Handles up to 256K-token contexts, enabling semantic search across extensive repositories, documentation, and multi-file projects.

  • Structured Function Calling

    Supports tool use and function calling, enabling programmatic integration into automated pipelines and custom developer workflows.

  • Text-Only Interface

    Provides text input and output completions via OpenAI-compatible chat endpoints, without native image or multimodal support.

6 Most Valuable Use Cases

  • Codebase File Retrieval
  • Agentic Code Search
  • Refactor Impact Scoping
  • Bug Context Gathering
  • Monorepo Navigation Assistant
  • Tool-Augmented Code Browsing

Cost Comparison

LLM API offers the lowest cost and latency for Relace Search–class models.

Provider Region Latency Throughput Uptime Input ($/1M) Output ($/1M) Context
LLM API BEST Global 80ms 120 qps 99.99% $0.05 per 1M queries $0.00 Search over ~10M documents
Relace Global ~120ms ~60 qps ~99.9% ~$0.10 per 1M queries $0.00 ~5M documents per index
Pinecone (similar vector search) US East ~150ms ~40 qps 99.9% ~$0.20 per 1M queries $0.00 ~10M vectors per index
Weaviate Cloud (similar vector search) EU West ~160ms ~35 qps ~99.9% ~$0.18 per 1M queries $0.00 ~8M objects per cluster
Qdrant Cloud (similar vector search) Global ~170ms ~30 qps ~99.9% ~$0.16 per 1M queries $0.00 ~10M vectors per collection

Technical Specifications

Metric Relace Search Perplexity Search You.com Search
Avg Latency ~800ms ~900ms ~1.1s
Context Window ~32K ~32K ~16K
Input Price ($/1M tokens) ~$0.80 ~$1.00 ~$1.20
Output Price ($/1M tokens) ~$1.20 ~$1.50 ~$1.80
Max Output Tokens ~4K ~4K ~2K
Throughput ~40 rps ~35 rps ~30 rps
Uptime ~99.5% ~99.0% ~99.0%

30-day usage via LLM API

6.8B
Prompt tokens processed (last 30 days)
9.5M
API requests served (last 30 days)
4.1B
Completion tokens generated (last 30 days)
99.8%
Avg uptime (last 30 days)
Start Using API

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

  • Unified AI Routing

    Automatically route each request to the best model across providers based on latency, capabilities, or custom rules—without changing your code or client integration.

    One endpoint, every model
  • Cost-Aware Execution

    Set cost ceilings and optimization policies so LLM.API chooses the most cost-effective model per call while preserving quality and performance at scale.

    Lower spend, same quality
  • Resilient Fallbacks

    Configure provider and model failover in the platform, not your app. If a model degrades or fails, traffic shifts automatically to healthy alternatives.

    No single point of failure
  • Deep Observability

    Get centralized logs, traces, and metrics for every provider and model: latencies, errors, token usage, and cost, with queryable insights for debugging and optimization.

    See every token, everywhere
  • Task-Level Abstractions

    Define tasks like chat, summarize, extract, or classify once. LLM.API maps them to the right models and prompts so you ship features instead of glue code.

    Think tasks, not models
  • High-Throughput Batch

    Batch thousands of inferences across providers through a single API call, with automatic chunking, retries, and result aggregation tuned for large-scale workloads.

    Max throughput, minimal code

When to Use — When NOT to Use

Use it if...

  • You need an agent to rapidly locate relevant files across very large codebases.
  • You need multi-step, tool-using search that orchestrates parallel view_file and grep calls.
  • Your use case involves powering an oracle coding agent with precise code search results.
  • You need a code-focused search model with a 256K context and large outputs.
  • Your use case involves building custom agent harnesses that parse structured search responses.
  • You need OpenAI-compatible API access to specialized code search via platforms like OpenRouter.

Avoid if...

  • You need a general-purpose chat or reasoning model beyond code search and retrieval.
  • Your workload requires plug-and-play usage without building an agent harness around tools.
  • You need the absolute cheapest model for simple keyword search over small repositories.
  • Your workload requires vision, audio, or multimodal understanding rather than pure text code search.
  • You need high-quality code generation or refactoring, not just finding relevant source files.
  • Your workload requires ultra-low-latency single-call responses without parallel tool invocations.

Frequently Asked Questions

  • What is Relace Search?

    Relace Search is a search-optimized AI model by Relace designed to retrieve and rank relevant documents over large corpora.

  • What is Relace Search best suited for?

    Relace Search is best for semantic search, retrieval-augmented generation backends, and relevance-ranked document or passage search.

  • What modalities does Relace Search support?

    Relace Search works with text-only inputs and returns structured text-based results, not images, audio, or video.

  • How is Relace Search priced when used through LLM.API?

    Relace Search pricing on LLM.API is usage-based per request or token, and you should check the LLM.API pricing page for current rates.

  • What is the context window of Relace Search?

    Relace Search supports a provider-defined maximum query and document length; consult the LLM.API model card for the latest context window limits.

  • How fast is Relace Search in terms of latency?

    Relace Search typically responds fast enough for interactive applications, but actual latency depends on corpus size, request complexity, and network conditions.

  • How do I call Relace Search via LLM.API?

    You call Relace Search by specifying the Relace Search model name in your LLM.API request and passing your query and optional search parameters.

  • Can I use Relace Search as the retrieval layer for my RAG system?

    Yes, you can use Relace Search to retrieve top relevant documents and feed them into a separate generative model for RAG workflows.

  • How does Relace Search compare to general-purpose LLMs for search tasks?

    Relace Search is specialized for retrieval relevance and ranking, whereas general-purpose LLMs focus on generation and may be less efficient for large-scale search.

  • What are the main limitations of Relace Search?

    Relace Search does not generate long-form answers, relies on the indexed corpus quality, and is limited to text-based search scenarios.

Start in 2 lines of code

Get My API Key