Sonar Pro Search

Semantic Search

Sonar Pro Search is Perplexity’s most advanced agentic search model, adding multi-step reasoning and tool use on top of the Sonar Pro family. It is optimized for deep analysis, long-context retrieval, and comprehensive web-grounded answers.

Start Using API

API Performance

Latency: ~1.0s avg response
Context: 200K token context
Input: $3.00 per 1M tokens
Output: $15.00 per 1M tokens
Uptime: 99% 99%

About the model

What is Sonar Pro Search?

Sonar Pro Search is a Perplexity language model that extends Sonar Pro with autonomous, multi-step search and reasoning for complex information retrieval. It is mainly used for deep research workflows, where it plans and executes multiple web searches and tool calls to synthesize detailed, grounded responses. It is also used to power Pro Search modes in applications and APIs that need large-context (around 200K tokens) retrieval with structured, high-accuracy outputs. It belongs to Perplexity’s proprietary Sonar model family, alongside Sonar, Sonar Pro, Sonar Reasoning Pro, and Sonar Deep Research.

Input / Output

Input

Text prompts (natural language, up to 200K-token context)
Images (vision input for analysis with search grounding)

Output

Search-grounded natural language responses

Model capabilities

5 Core Capabilities

Agentic Web Search

Executes multi-step, tool-using web searches, planning and refining queries to answer complex questions grounded in live online data.
Cited Research Answers

Generates synthesized research-style responses that include multiple supporting citations and sources for verification and further reading.
Long-Context Analysis

Handles very large inputs with an extended context window, enabling analysis of lengthy documents, conversations, and multi-part queries together.
Multilingual Question Support

Understands and responds to queries in multiple languages while still grounding answers in web search and external information sources.
Document-Like Content Extraction

Extracts and consolidates key facts, comparisons, and structured information from web pages, articles, and other text-heavy online content.

Use cases

6 Most Valuable Use Cases

Complex Web Research
Enterprise Knowledge Search
Legal Case Fact-Finding
Competitive Market Monitoring
E-commerce Product Insights
Developer Tool Documentation

Transparent pricing

Cost Comparison

LLM API delivers the lowest cost and latency for Sonar-class search models.

Provider	Region	Latency	Throughput	Uptime	Input ($/1M)	Output ($/1M)	Context
LLM API BEST	Global	120ms	80 req/s	99.99%	$0.20	$0.20	128K
Perplexity	Global	~220ms	~35 req/s	~99.9%	~$0.60	~$0.60	~128K
OpenAI	US East	~250ms	~40 req/s	~99.9%	~$0.80	~$0.80	~128K
Anthropic	US West	~180ms	~60 qps	99.9%	~$0.50	~$1.50	200K
Google Cloud	EU West	~190ms	~70 qps	99.9%	~$0.40	~$1.20	128K

Performance benchmarks

Technical Specifications

Metric	Sonar Pro Search (Perplexity)	GPT-4.1 (OpenAI)	Claude 3.5 Sonnet (Anthropic)
Avg Latency	~700ms	~900ms	~850ms
Context Window	~200K	128K	200K
Input Price ($/1M tokens)	~$3.00	$5.00	$3.00
Output Price ($/1M tokens)	~$8.00	$15.00	$15.00
Max Output Tokens	~4K	4K	4K
Throughput	~60 tps	~40 tps	~45 tps
Uptime	~99.9%	~99.9%	~99.9%

30-day usage via LLM API

11.8B: Prompt tokens processed (last 30 days)
3.6M: API requests served (last 30 days)
9.4B: Completion tokens generated (last 30 days)
99.8%: Average API uptime

Start Using API

Architecture & Integration

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

Unified AI Routing

Route each request to the optimal model across providers based on cost, latency, and quality—without changing your integration or redeploying code.
One endpoint, every model
Smart Cost Controls

Define guardrails for spend and automatically select the most cost-effective models by task, so you can scale usage without surprise bills.
Optimize spend by design
Automatic Fallbacks

Configure provider- and model-level failover once and let LLM.API retry or downgrade gracefully, keeping your workloads healthy when vendors break.
Resilience baked in
Deep Observability

Get unified logs, traces, and metrics across all providers—latency, errors, tokens, and costs—so you can debug faster and tune prompts with real data.
See every token flow
Task-Aware Workflows

Define reusable task abstractions—chat, tools, RAG, evals—then plug in any model behind them, standardizing behavior across teams and providers.
Tasks, not raw calls
High-Throughput Batch

Ship massive workloads via one batch API that parallelizes requests, enforces rate limits, and tracks per-item results for analytics and retries.
Scale jobs, not ops

Decision guide

When to Use — When NOT to Use

Use it if...

You need web-grounded answers with source citations for research, journalism, or fact-checking.
You need multi-step web search workflows that autonomously plan, browse, and synthesize findings.
You need high-factuality QA over current events, changing regulations, or fresh technical documentation.
You need to answer complex questions that require pulling and reconciling many web sources.
Your use case involves long-context search, aggregating and summarizing many pages or documents.
Your use case involves building an AI research assistant that explains reasoning and cites sources.
You need an API-accessible search-augmented model to embed into your own applications.

Avoid if...

You need an offline model that runs fully air-gapped without any external web access.
You need ultra-low-cost, high-volume inference where web search overhead is unnecessary or wasteful.
Your workload requires strict data locality with no external HTTP calls for compliance reasons.
You need frontier-level creative writing, coding, or reasoning independent of real-time search augmentation.
Your workload requires millisecond-level latency responses where multiple web retrieval hops are unacceptable.
You need fine-tuning or custom training of the base model weights for domain specialization.
Your use case involves processing sensitive PII or trade secrets that cannot leave your environment.

FAQ

Frequently Asked Questions

What is Sonar Pro Search?

Sonar Pro Search is a Perplexity model optimized for search-augmented question answering and retrieval-heavy tasks via the LLM.API gateway.
What types of tasks is Sonar Pro Search best suited for?

Sonar Pro Search is best for complex web-assisted Q&A, research summarization, and retrieval-heavy workflows where up-to-date external information is important.
How is Sonar Pro Search priced when accessed through LLM.API?

Sonar Pro Search pricing is usage-based on tokens through LLM.API; check your LLM.API dashboard or pricing docs for current per-token rates.
What is the context window of Sonar Pro Search?

Sonar Pro Search’s exact context window depends on LLM.API’s configured version; refer to the model description in the LLM.API docs for limits.
What is the typical latency of Sonar Pro Search requests?

Latency varies by prompt length and external search time, but Sonar Pro Search generally responds within a few seconds for typical workloads.
Which modalities does Sonar Pro Search support through LLM.API?

Sonar Pro Search primarily supports text input and output via LLM.API, and does not natively handle image or audio content.
How do I call Sonar Pro Search using the LLM.API?

Use the standard LLM.API chat or completion endpoint and set the model field to the Sonar Pro Search identifier listed in the model catalog.
How does Sonar Pro Search compare to general-purpose chat models?

Compared to generic chat models, Sonar Pro Search is more effective for retrieval-augmented, search-based reasoning but less focused on open-ended creative generation.
Are there any notable limitations of Sonar Pro Search?

Sonar Pro Search depends on external search quality, may occasionally surface outdated or irrelevant results, and is not optimized for offline-only reasoning tasks.
Can I fine-tune or customize Sonar Pro Search via LLM.API?

Direct fine-tuning of Sonar Pro Search is not supported; instead, customize behavior through prompting, system messages, and retrieval or tool configurations.

Start in 2 lines of code

Get My API Key

Sonar Pro Search

What is Sonar Pro Search?

5 Core Capabilities

Agentic Web Search

Cited Research Answers

Long-Context Analysis

Multilingual Question Support

Document-Like Content Extraction

6 Most Valuable Use Cases

Cost Comparison

Technical Specifications

Why Build on LLM.API?

Unified AI Routing

Smart Cost Controls

Automatic Fallbacks

Deep Observability

Task-Aware Workflows

High-Throughput Batch

When to Use — When NOT to Use

Use it if...

Avoid if...

Start in 2 lines of code