Rerank v3.5

Text Generation

Rerank v3.5 by Cohere is a commercial reranking model that scores and reorders candidate documents or passages based on their relevance to a given query. It is optimized for retrieval-augmented applications that need high-quality result ranking over relatively small candidate sets.

API Performance

Latency: ~0.4s avg response
Context: ~8K token context
Input: ~$0.25 per 1M tokens (documents)
Output: ~$0.25 per 1M tokens (query + documents)
Uptime: 99%

About the model

What is Rerank v3.5?

Rerank v3.5 is a Cohere model that takes a user query and a list of candidate texts and outputs relevance scores to produce a better-ordered ranking. It is mainly used to rerank search or retrieval results in RAG systems, chatbots, and question-answering pipelines so that the most relevant documents are surfaced first. It is also used in recommendation-like scenarios and domain-specific search where precise ordering of a short candidate list matters more than large-scale embedding retrieval. It follows earlier Cohere rerank models (such as Rerank v2 and v3) as part of Cohere’s family of specialized retrieval and reranking models.

Input / Output

Input

Text query string
List of text documents or semi-structured JSON documents

Output

Ranked list of documents with relevance scores
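
The input and output shapes above can be sketched as a request body and a small response parser. This is a minimal illustration, not the gateway's documented schema: the field names (`model`, `query`, `documents`, `top_n`, `results`, `index`, `relevance_score`) and the model identifier `rerank-v3.5` are assumptions modeled on common rerank API shapes, and the response below is hand-written rather than real model output.

```python
import json
from typing import Any, Dict, List, Optional, Tuple


def build_rerank_payload(query: str, documents: List[str], top_n: Optional[int] = None) -> Dict[str, Any]:
    """Assemble a request body: one query plus a list of candidate texts."""
    if not documents:
        raise ValueError("at least one candidate document is required")
    payload: Dict[str, Any] = {
        "model": "rerank-v3.5",  # assumed identifier; check the provider's model list
        "query": query,
        "documents": documents,
    }
    if top_n is not None:
        payload["top_n"] = top_n  # assumed optional cap on returned results
    return payload


def ranked_documents(response: Dict[str, Any], documents: List[str]) -> List[Tuple[str, float]]:
    """Map each result back to its original text, highest score first."""
    results = sorted(response["results"], key=lambda r: r["relevance_score"], reverse=True)
    return [(documents[r["index"]], r["relevance_score"]) for r in results]


docs = ["Paris is the capital of France.", "Bananas are yellow.", "France borders Spain."]
payload = build_rerank_payload("What is the capital of France?", docs, top_n=2)

# Hand-written stand-in for a response; the scores are illustrative only.
mock_response = {
    "results": [
        {"index": 2, "relevance_score": 0.41},
        {"index": 0, "relevance_score": 0.97},
    ]
}
ranking = ranked_documents(mock_response, docs)
print(json.dumps(payload, indent=2))
print(ranking)
```

Keeping the original `documents` list around matters because, in this assumed shape, results refer to candidates by position rather than repeating their text.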

Model capabilities

5 Core Capabilities

Document Reranking

Scores and reorders candidate documents for a query, improving relevance in search, recommendation, and retrieval-augmented generation pipelines.

Multilingual Support

Handles queries and documents across 100+ languages with a single multilingual reranking model, enabling global, cross-language search experiences.

Semi-Structured Data

Reranks semi-structured inputs such as JSON records or metadata-enriched documents, not just plain text passages or pages.

RAG Optimization

Improves retrieval-augmented generation by reranking keyword and vector search results so downstream generators see the most relevant context.

Enterprise Search

Enhances internal enterprise search over large specialized corpora, boosting result precision for domains like finance, healthcare, and government.
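
For the semi-structured case, one common client-side approach is to render each record as a short text block with its fields in a fixed order before sending it as a candidate. The sketch below assumes that approach; `record_to_text` and the ticket fields are hypothetical, and the page does not say whether the gateway also accepts raw JSON objects.

```python
from typing import Any, List, Mapping, Sequence


def record_to_text(record: Mapping[str, Any], fields: Sequence[str]) -> str:
    """Render selected fields as 'key: value' lines, in the order given."""
    lines: List[str] = []
    for name in fields:
        value = record.get(name)
        if isinstance(value, (list, tuple)):
            value = ", ".join(str(v) for v in value)
        if value is None or value == "":
            continue  # skip missing or empty fields so they add no noise
        lines.append(f"{name}: {value}")
    return "\n".join(lines)


tickets = [
    {"id": 17, "title": "Login fails on mobile", "tags": ["auth", "ios"], "body": "Users see error 403."},
    {"id": 18, "title": "Invoice PDF is blank", "tags": [], "body": "Export produces empty file."},
]

# Only the fields relevant to matching are serialized; 'id' stays client-side.
candidates = [record_to_text(t, fields=("title", "tags", "body")) for t in tickets]
print(candidates[0])
```

Putting the most informative fields first is a deliberate choice here: if a long record gets truncated, the title and tags survive.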

Use cases

6 Most Valuable Use Cases

Search Result Reranking
E-commerce Product Ranking
Legal Document Retrieval
Customer Ticket Prioritization
Technical Answer Selection
Invoice Query Matching

Transparent pricing

Cost Comparison

In the comparison below, LLM API lists the lowest rerank price and the lowest typical latency among the providers shown (Cohere, Azure AI, and Amazon Bedrock).

Provider	Region	Latency	Throughput	Uptime	Input ($/1M)	Output ($/1M)	Context
LLM API (best)	Global	100ms	120 qps	99.99%	$0.03	$0.00	128K
Cohere	Global	~180ms	~60 qps	99.9%	~$0.10	$0.00	128K
Azure AI	US East	~150ms	~80 qps	99.9%	~$0.11	$0.00	~128K
Amazon Bedrock	US West	~170ms	~70 qps	99.9%	~$0.12	$0.00	~128K

Performance benchmarks

Technical Specifications

Metric	Rerank v3.5 (Cohere)	Cohere Rerank v3	OpenAI text-embedding-3-large (as reranker)
Task Type	Reranking	Reranking	Embedding-based Reranking
Avg Latency	~120ms	~150ms	~200ms
Max Input Tokens	8K	4K	8K
Max Items per Request	~256 docs	~128 docs	~200 docs
Price per 1K Items	~$0.10	~$0.08	~$0.12
Throughput	~120 req/s	~100 req/s	~90 req/s
Primary Use Cases	Search & QA rerank	Search & QA rerank	Vector search rerank
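
When a candidate set exceeds the per-request item limit, it has to be split across several requests. The sketch below shows one way to do that while keeping track of each candidate's position in the full list. The limit of 256 is the approximate figure from the table above and is treated as an assumption; `batched` is a hypothetical helper. Whether scores from separate requests for the same query can be compared directly should be verified before merging them.

```python
from typing import Iterator, List, Sequence, Tuple

# Approximate per-request limit quoted in the table above; treat it as an assumption.
MAX_DOCS_PER_REQUEST = 256


def batched(documents: Sequence[str], size: int = MAX_DOCS_PER_REQUEST) -> Iterator[Tuple[int, List[str]]]:
    """Yield (offset, batch) pairs; offset maps batch-local indices back to the full list."""
    if size <= 0:
        raise ValueError("size must be positive")
    for offset in range(0, len(documents), size):
        yield offset, list(documents[offset:offset + size])


candidates = [f"passage {i}" for i in range(600)]
batches = list(batched(candidates))

# A result with local index i in a batch corresponds to candidates[offset + i].
print([(offset, len(batch)) for offset, batch in batches])
```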

30-day usage via LLM API

1.6B: Documents reranked
24M: API requests
11.3K: Active developers
99.9%: Avg uptime

Architecture & Integration

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

Unified AI Routing

Automatically select the best model per request based on latency, cost, and capability. One endpoint abstracts every provider, so you ship faster and swap safely.
One endpoint, any model

Cost-Aware Control

Enforce per-project and per-model budgets, caps, and guardrails. Dynamically steer traffic to cheaper equivalents without touching your application code.
Lower spend, same output

Resilient Fallbacks

Define fallback chains across providers so requests survive outages, rate limits, and model errors. Your app stays online, even when vendors don't.
Failover built in

End-to-End Observability

Get centralized logs, traces, and metrics for every provider and model. Inspect prompts, latencies, costs, and failures from a single, queryable pane.
See every token

Task-Level Orchestration

Define complex workflows (retrieval, tools, multi-step agents) using a consistent task API. Swap underlying models or providers without rewriting business logic.
Abstract the workflow

High-Throughput Batch

Send massive workloads as batches with automatic chunking, retries, and parallelization across providers. Optimize throughput and cost without managing infrastructure.
Scale jobs, not ops
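
To make the idea of a fallback chain concrete, the sketch below tries a list of scoring backends in order and returns the first successful result. It illustrates the concept on the client side only and is not how the gateway implements it; the provider names, the `Scorer` signature, and both toy backends are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# A scorer takes (query, documents) and returns one score per document.
Scorer = Callable[[str, Sequence[str]], List[float]]


class AllProvidersFailed(RuntimeError):
    """Raised when every backend in the chain has failed."""


def rerank_with_fallback(
    query: str,
    documents: Sequence[str],
    providers: Sequence[Tuple[str, Scorer]],
) -> Tuple[str, List[float]]:
    """Try each (name, scorer) in order; return the first success."""
    errors: List[str] = []
    for name, scorer in providers:
        try:
            return name, scorer(query, documents)
        except Exception as exc:  # outage, rate limit, malformed response, ...
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


def flaky_primary(query: str, documents: Sequence[str]) -> List[float]:
    raise TimeoutError("simulated outage")


def overlap_backup(query: str, documents: Sequence[str]) -> List[float]:
    # Toy stand-in scorer: fraction of query words present in each document.
    words = set(query.lower().split())
    return [len(words & set(d.lower().split())) / (len(words) or 1) for d in documents]


used, scores = rerank_with_fallback(
    "reset password",
    ["How to reset your password", "Shipping rates"],
    [("primary", flaky_primary), ("backup", overlap_backup)],
)
print(used, scores)
```

Collecting the per-provider errors before raising keeps the failure diagnosable when the whole chain is exhausted.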

Decision guide

When to Use — When NOT to Use

Use it if...

You need to rerank search results from a vector database for better relevance.
You need to improve semantic search quality over short passages or document chunks.
Your use case involves reordering candidate answers from a QA system by usefulness.
Your use case involves ranking product listings based on textual query intent.
You need to prioritize the most relevant support tickets or knowledge base articles.
Your use case involves boosting click-through by reranking links using query-document similarity.
You need compact relevance scores to combine with traditional keyword or BM25 rankings.
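
Combining relevance scores with a keyword ranking, as in the last item above, can be sketched as a weighted sum. This is one simple fusion scheme among several, not a method the page prescribes. It assumes the rerank scores already lie in [0, 1] and min-max normalizes the BM25 scores to the same range; the weight `alpha` and all sample scores are illustrative.

```python
from typing import List, Sequence


def min_max(values: Sequence[float]) -> List[float]:
    """Scale values to [0, 1]; a constant list maps to all zeros."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def fuse(bm25_scores: Sequence[float], rerank_scores: Sequence[float], alpha: float = 0.7) -> List[float]:
    """Weighted sum: alpha * rerank score + (1 - alpha) * normalized BM25 score."""
    if len(bm25_scores) != len(rerank_scores):
        raise ValueError("score lists must be the same length")
    bm25_norm = min_max(bm25_scores)
    return [alpha * r + (1 - alpha) * b for r, b in zip(rerank_scores, bm25_norm)]


bm25 = [12.0, 4.0, 8.0]       # raw keyword scores (illustrative)
rerank = [0.20, 0.90, 0.60]   # assumed to be in [0, 1] (illustrative)
fused = fuse(bm25, rerank, alpha=0.7)

# Document indices from most to least relevant under the fused score.
order = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
print(order)
```

A higher `alpha` trusts the reranker more; tuning it on a labeled sample is usually better than picking a value by hand.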

Avoid if...

You need a generative model to write or summarize content from scratch.
Your workload requires understanding or generating images, audio, or other non-text modalities.
You need deep multi-step reasoning, planning, or tool-calling beyond relevance scoring.
Your workload requires operating directly on raw, extremely long documents without chunking.
You need a model to create embeddings rather than score existing query-document pairs.
Your workload requires low-level token probabilities or language modeling instead of rankings.
You need end-to-end conversational AI rather than a reranking component in a pipeline.
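
For the long-document case above, a common workaround is to split each document into overlapping chunks, rerank the chunks, and score each document by its best chunk. The sketch below assumes that approach. Word counts are used as a rough proxy for tokens, the chunk sizes are arbitrary, and both helpers are hypothetical.

```python
from typing import Dict, List, Sequence


def chunk_words(text: str, max_words: int = 200, overlap: int = 20) -> List[str]:
    """Split text into windows of up to max_words words, sharing `overlap` words."""
    if max_words <= overlap:
        raise ValueError("max_words must be larger than overlap")
    words = text.split()
    step = max_words - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def best_chunk_score(chunk_doc_ids: Sequence[str], chunk_scores: Sequence[float]) -> Dict[str, float]:
    """Aggregate chunk-level scores to one score per document (max over chunks)."""
    best: Dict[str, float] = {}
    for doc_id, score in zip(chunk_doc_ids, chunk_scores):
        if doc_id not in best or score > best[doc_id]:
            best[doc_id] = score
    return best


text = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(text, max_words=4, overlap=1)
print(chunks)

# Illustrative scores for three chunks: two from document "a", one from "b".
doc_scores = best_chunk_score(["a", "a", "b"], [0.2, 0.8, 0.5])
print(doc_scores)
```

Taking the maximum favors documents with one highly relevant passage; averaging would instead favor documents that are relevant throughout.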

FAQ

Frequently Asked Questions

What is Rerank v3.5?

Rerank v3.5 is a Cohere model that scores and reorders candidate documents or passages based on their relevance to a query.

What is Rerank v3.5 best used for?

It is best for improving search and retrieval quality in RAG pipelines, semantic search, recommendation ranking, and other reranking-heavy workflows.

How much does using Cohere Rerank v3.5 via LLM.API cost?

Pricing is request-based and set by LLM.API for this Cohere-backed reranker; check the LLM.API pricing page for current per-request rates.

What context window does Rerank v3.5 support?

Rerank v3.5 accepts a query plus a list of candidate texts, with each candidate typically limited to a few thousand characters for best results.

How fast is Rerank v3.5 in terms of latency?

Rerank v3.5 is optimized for low-latency scoring of many short candidates, usually returning results in well under a second for typical batch sizes.

What input and output modalities does Rerank v3.5 support?

Rerank v3.5 is a text-only model that takes a text query and text candidates as input and outputs numeric relevance scores and a ranking.

How do I call Rerank v3.5 through the LLM.API gateway?

You select the Cohere Rerank v3.5 model identifier in your LLM.API request and pass a query plus an array of candidate documents to rerank.

How does Rerank v3.5 compare to using an embedding-based search alone?

Compared to pure embedding similarity, Rerank v3.5 usually provides more precise top results by contextually reranking a shortlist of retrieved candidates.

What are the main limitations of Rerank v3.5?

Rerank v3.5 cannot generate text, handle images, or replace retrieval; it only scores and reorders provided candidates and may degrade on very long texts.

Can I use Rerank v3.5 for multilingual queries and documents?

Rerank v3.5 supports multiple languages, but performance may be strongest on English and other well-represented languages in its training data.
