Powered by Cohere
Rerank v3.5
- Reranking
Rerank v3.5 by Cohere is a commercial reranking model that scores and reorders candidate documents or passages based on their relevance to a given query. It is optimized for retrieval-augmented applications that need high-quality result ranking over relatively small candidate sets.
About the model
What is Rerank v3.5?
Rerank v3.5 is a Cohere model that takes a user query and a list of candidate texts and outputs relevance scores to produce a better-ordered ranking. It is mainly used to rerank search or retrieval results in RAG systems, chatbots, and question-answering pipelines so that the most relevant documents are surfaced first. It is also used in recommendation-like scenarios and domain-specific search where precise ordering of a short candidate list matters more than large-scale embedding retrieval. It follows earlier Cohere rerank models (such as Rerank v2 and v3) as part of Cohere’s family of specialized retrieval and reranking models.
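To make the input and output shape concrete, here is a minimal sketch of the reranking step described above: a query and a list of candidate texts go in, and index/score pairs sorted by relevance come out. The scorer `toy_overlap_score` is a hypothetical stand-in based on word overlap and is only there so the example runs offline; in a real pipeline the scores would come from the model.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def toy_overlap_score(query: str, document: str) -> float:
    """Hypothetical stand-in scorer: fraction of query words found in the document."""
    query_terms = set(query.lower().split())
    doc_terms = set(document.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & doc_terms) / len(query_terms)


def rerank(
    query: str,
    documents: Sequence[str],
    score_fn: Callable[[str, str], float] = toy_overlap_score,
    top_n: Optional[int] = None,
) -> List[Tuple[int, float]]:
    """Return (original_index, score) pairs, most relevant first."""
    scored = [(i, score_fn(query, doc)) for i, doc in enumerate(documents)]
    # sort() is stable, so tied candidates keep their original retrieval order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored if top_n is None else scored[:top_n]


docs = [
    "Shipping usually takes three to five business days.",
    "You can reset your password from the account settings page.",
    "Our office is closed on public holidays.",
]
ranking = rerank("how do I reset my password", docs, top_n=2)
for index, score in ranking:
    print(f"{score:.2f}  {docs[index]}")
```

Returning original indices rather than the texts themselves keeps the result easy to join back to whatever metadata the retriever attached to each candidate.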
Model capabilities
5 Core Capabilities
- Document Reranking: Scores and reorders candidate documents for a query, improving relevance in search, recommendation, and retrieval-augmented generation pipelines.
- Multilingual Support: Handles queries and documents across 100+ languages with a single multilingual reranking model, enabling global, cross-language search experiences.
- Semi-Structured Data: Reranks semi-structured inputs such as JSON records or metadata-enriched documents, not just plain text passages or pages.
- RAG Optimization: Improves retrieval-augmented generation by reranking keyword and vector search results so downstream generators see the most relevant context.
- Enterprise Search: Enhances internal enterprise search over large specialized corpora, boosting result precision for domains like finance, healthcare, and government.
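For the semi-structured case, one common approach is to flatten each record into a short "field: value" text block before sending it as a candidate. The sketch below assumes the candidates are passed as plain strings; this page does not say whether raw JSON objects are accepted, so the helper `record_to_text`, the field names, and the sample records are all illustrative.

```python
from typing import Any, Dict, List, Sequence


def record_to_text(record: Dict[str, Any], field_order: Sequence[str]) -> str:
    """Flatten a record into 'field: value' lines, in a fixed field order."""
    lines: List[str] = []
    for field in field_order:
        value = record.get(field)
        if isinstance(value, (list, tuple)):
            value = ", ".join(str(item) for item in value)
        if value is None or value == "":
            continue  # skip missing or empty fields instead of emitting blank lines
        lines.append(f"{field}: {value}")
    return "\n".join(lines)


# Illustrative records; the schema is an assumption made for this example.
records = [
    {
        "title": "Q3 expense policy",
        "department": "Finance",
        "tags": ["travel", "reimbursement"],
        "body": "Employees may claim travel costs within 30 days.",
    },
    {
        "title": "VPN setup guide",
        "department": "IT",
        "tags": [],
        "body": "Install the client and sign in with SSO.",
    },
]

# Most informative fields first, so they survive if a candidate gets truncated.
FIELD_ORDER = ("title", "department", "tags", "body")
candidates = [record_to_text(record, FIELD_ORDER) for record in records]
print(candidates[0])
```

Keeping the field order fixed across records means every candidate presents its metadata to the reranker in the same layout.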
Use cases
6 Most Valuable Use Cases
- Search Result Reranking
- E-commerce Product Ranking
- Legal Document Retrieval
- Customer Ticket Prioritization
- Technical Answer Selection
- Invoice Query Matching
Transparent pricing
Cost Comparison
In the comparison below, LLM.API lists the lowest rerank price and the lowest typical latency among the providers shown.
| Provider | Region | Latency | Throughput | Uptime | Input ($/1M) | Output ($/1M) | Context |
|---|---|---|---|---|---|---|---|
| LLM.API (best) | Global | 100ms | 120 qps | 99.99% | $0.03 | $0.00 | 128K |
| Cohere | Global | ~180ms | ~60 qps | 99.9% | ~$0.10 | $0.00 | 128K |
| Azure AI | US East | ~150ms | ~80 qps | 99.9% | ~$0.11 | $0.00 | ~128K |
| Amazon Bedrock | US West | ~170ms | ~70 qps | 99.9% | ~$0.12 | $0.00 | ~128K |
Performance benchmarks
Technical Specifications
| Metric | Rerank v3.5 (Cohere) | Cohere Rerank v3 | OpenAI text-embedding-3-large (as reranker) |
|---|---|---|---|
| Task Type | Reranking | Reranking | Embedding-based Reranking |
| Avg Latency | ~120ms | ~150ms | ~200ms |
| Max Input Tokens | 8K | 4K | 8K |
| Max Items per Request | ~256 docs | ~128 docs | ~200 docs |
| Price per 1K Items | ~$0.10 | ~$0.08 | ~$0.12 |
| Throughput | ~120 req/s | ~100 req/s | ~90 req/s |
| Primary Use Cases | Search & QA rerank | Search & QA rerank | Vector search rerank |
30-day usage via LLM.API
- 1.6B documents reranked
- 24M API requests
- 11.3K active developers
- 99.9% average uptime
Architecture & Integration
Why Build on LLM.API?
One unified API. Every major model. Built-in reliability, cost control, and observability.
- Unified AI Routing: Automatically select the best model per request based on latency, cost, and capability. One endpoint abstracts every provider, so you ship faster and swap safely. One endpoint, any model.
- Cost-Aware Control: Enforce per-project and per-model budgets, caps, and guardrails. Dynamically steer traffic to cheaper equivalents without touching your application code. Lower spend, same output.
- Resilient Fallbacks: Define fallback chains across providers so requests survive outages, rate limits, and model errors. Your app stays online, even when vendors don't. Failover built in.
- End-to-End Observability: Get centralized logs, traces, and metrics for every provider and model. Inspect prompts, latencies, costs, and failures from a single, queryable pane. See every token.
- Task-Level Orchestration: Define complex workflows (retrieval, tools, multi-step agents) using a consistent task API. Swap underlying models or providers without rewriting business logic. Abstract the workflow.
- High-Throughput Batch: Send massive workloads as batches with automatic chunking, retries, and parallelization across providers. Optimize throughput and cost without managing infrastructure. Scale jobs, not ops.
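As a conceptual illustration of a fallback chain, the sketch below tries a list of rerank providers in order and returns the first successful result. It is not how LLM.API implements failover, which this page does not describe; the provider functions `flaky_primary` and `length_backup` are hypothetical stand-ins that simulate an outage and a working backup.

```python
from typing import Callable, List, Sequence, Tuple

Ranking = List[Tuple[int, float]]
Provider = Callable[[str, Sequence[str]], Ranking]


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain raised an error."""


def rerank_with_fallback(
    query: str,
    documents: Sequence[str],
    providers: Sequence[Tuple[str, Provider]],
) -> Tuple[str, Ranking]:
    """Try providers in order; return (provider_name, ranking) from the first success."""
    errors: List[str] = []
    for name, provider in providers:
        try:
            return name, provider(query, documents)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


def flaky_primary(query: str, documents: Sequence[str]) -> Ranking:
    """Hypothetical provider that always times out."""
    raise TimeoutError("simulated outage")


def length_backup(query: str, documents: Sequence[str]) -> Ranking:
    """Hypothetical backup that simply ranks shorter documents higher."""
    order = sorted(range(len(documents)), key=lambda i: len(documents[i]))
    return [(i, 1.0 / (1 + len(documents[i]))) for i in order]


used, ranking = rerank_with_fallback(
    "example query",
    ["a rather long document text", "short"],
    [("primary", flaky_primary), ("backup", length_backup)],
)
print(used, ranking)
```

Collecting the error from each failed provider keeps the final exception useful for debugging when the whole chain is down.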
Decision guide
When to Use — When NOT to Use
Use it if...
- You need to rerank search results from a vector database for better relevance.
- You need to improve semantic search quality over short passages or document chunks.
- Your use case involves reordering candidate answers from a QA system by usefulness.
- Your use case involves ranking product listings based on textual query intent.
- You need to prioritize the most relevant support tickets or knowledge base articles.
- Your use case involves boosting click-through by reranking links using query-document similarity.
- You need compact relevance scores to combine with traditional keyword or BM25 rankings.
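Combining reranker scores with keyword or BM25 scores can be done in several ways. The sketch below uses one simple option, min-max normalization followed by a weighted sum; the weight of 0.7 and the sample scores are illustrative assumptions, not tuned values or real model output.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; if all values are equal, map them all to 0.0."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc_id: 0.0 for doc_id in scores}
    return {doc_id: (value - low) / (high - low) for doc_id, value in scores.items()}


def fuse_scores(
    bm25_scores: Dict[str, float],
    rerank_scores: Dict[str, float],
    rerank_weight: float = 0.7,
) -> List[Tuple[str, float]]:
    """Weighted sum of normalized scores for documents present in both inputs."""
    bm25_norm = min_max(bm25_scores)
    rerank_norm = min_max(rerank_scores)
    shared_ids = sorted(bm25_scores.keys() & rerank_scores.keys())
    fused = {
        doc_id: rerank_weight * rerank_norm[doc_id]
        + (1.0 - rerank_weight) * bm25_norm[doc_id]
        for doc_id in shared_ids
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Illustrative scores keyed by document id.
bm25 = {"a": 12.0, "b": 7.0, "c": 2.0}
rerank = {"a": 0.20, "b": 0.90, "c": 0.10}
fused = fuse_scores(bm25, rerank)
print(fused)
```

Normalizing first matters because BM25 scores are unbounded while relevance scores are typically on a fixed scale, so a raw sum would let one signal dominate.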
Avoid if...
- You need a generative model to write or summarize content from scratch.
- Your workload requires understanding or generating images, audio, or other non-text modalities.
- You need deep multi-step reasoning, planning, or tool-calling beyond relevance scoring.
- Your workload requires operating directly on raw, extremely long documents without chunking.
- You need a model to create embeddings rather than score existing query-document pairs.
- Your workload requires low-level token probabilities or language modeling instead of rankings.
- You need end-to-end conversational AI rather than a reranking component in a pipeline.
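For the long-document case, the usual workaround is to chunk each document before reranking and then score the document by its best chunk. The sketch below splits on whitespace into overlapping word windows; the window sizes are arbitrary assumptions, and `score_fn` is a placeholder for whatever produces a relevance score for one query/chunk pair.

```python
from typing import Callable, List


def chunk_words(text: str, size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def best_chunk_score(
    query: str,
    text: str,
    score_fn: Callable[[str, str], float],
    size: int = 200,
    overlap: int = 50,
) -> float:
    """Score a long document as the maximum score over its chunks (0.0 if empty)."""
    chunks = chunk_words(text, size, overlap)
    return max((score_fn(query, chunk) for chunk in chunks), default=0.0)


sample = " ".join(f"w{i}" for i in range(10))
print(chunk_words(sample, size=4, overlap=1))
```

Taking the maximum over chunks favors documents with at least one highly relevant passage; averaging instead would penalize long documents that are only partly on topic.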
FAQ
Frequently Asked Questions
- What is Rerank v3.5?
  Rerank v3.5 is a Cohere model that scores and reorders candidate documents or passages based on their relevance to a query.
- What is Rerank v3.5 best used for?
  It is best for improving search and retrieval quality in RAG pipelines, semantic search, recommendation ranking, and other reranking-heavy workflows.
- How much does using Cohere Rerank v3.5 via LLM.API cost?
  Pricing is request-based and set by LLM.API for this Cohere-backed reranker; check the LLM.API pricing page for current per-request rates.
- What context window does Rerank v3.5 support?
  Rerank v3.5 accepts a query plus a list of candidate texts, with each candidate typically limited to a few thousand characters for best results.
- How fast is Rerank v3.5 in terms of latency?
  Rerank v3.5 is optimized for low-latency scoring of many short candidates, usually returning results in well under a second for typical batch sizes.
- What input and output modalities does Rerank v3.5 support?
  Rerank v3.5 is a text-only model that takes a text query and text candidates as input and outputs numeric relevance scores and ranking.
- How do I call Rerank v3.5 through the LLM.API gateway?
  You select the Cohere Rerank v3.5 model identifier in your LLM.API request and pass a query plus an array of candidate documents to rerank.
- How does Rerank v3.5 compare to using an embedding-based search alone?
  Compared to pure embedding similarity, Rerank v3.5 usually provides more precise top results by contextually reranking a shortlist of retrieved candidates.
- What are the main limitations of Rerank v3.5?
  Rerank v3.5 cannot generate text, handle images, or replace retrieval; it only scores and reorders provided candidates and may degrade on very long texts.
- Can I use Rerank v3.5 for multilingual queries and documents?
  Rerank v3.5 supports multiple languages, but performance may be strongest on English and other well-represented languages in its training data.
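The gateway call described in the FAQ (a model identifier plus a query and an array of candidate documents) can be sketched as a request payload and a response handler. The field names below (`model`, `query`, `documents`, `top_n`, and `results` entries with `index` and `relevance_score`) follow the shape of Cohere's own rerank API; whether LLM.API uses the same schema, endpoint, or model identifier is an assumption, so check the gateway documentation before relying on it. No network call is made, and the sample response is hand-written for illustration.

```python
import json
from typing import Any, Dict, List, Optional, Sequence, Tuple

# Assumption: Cohere's model name; the gateway's identifier may differ.
MODEL_ID = "rerank-v3.5"


def build_rerank_request(
    query: str, documents: Sequence[str], top_n: Optional[int] = None
) -> Dict[str, Any]:
    """Build a rerank request body (assumed schema, modeled on Cohere's API)."""
    if not documents:
        raise ValueError("at least one candidate document is required")
    payload: Dict[str, Any] = {
        "model": MODEL_ID,
        "query": query,
        "documents": list(documents),
    }
    if top_n is not None:
        payload["top_n"] = min(top_n, len(documents))
    return payload


def apply_rerank_response(
    documents: Sequence[str], response: Dict[str, Any]
) -> List[Tuple[str, float]]:
    """Map result indices back to the candidate texts, highest score first."""
    results = sorted(
        response["results"], key=lambda r: r["relevance_score"], reverse=True
    )
    return [(documents[r["index"]], r["relevance_score"]) for r in results]


docs = [
    "Refunds are processed within five business days.",
    "To change your billing address, open Account > Billing.",
]
payload = build_rerank_request("how do I update my billing address", docs, top_n=2)
body = json.dumps(payload)  # this string would be the HTTP request body

# Hand-written sample in the assumed response shape, not real model output.
sample_response = {
    "results": [
        {"index": 1, "relevance_score": 0.92},
        {"index": 0, "relevance_score": 0.08},
    ]
}
ranked = apply_rerank_response(docs, sample_response)
print(ranked[0][0])
```

Because results refer to candidates by index, the original `documents` list has to be kept unchanged between building the request and reading the response.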
