Powered by Perplexity
Sonar Pro Search
- Semantic Search
Sonar Pro Search is Perplexity’s most advanced agentic search model, adding multi-step reasoning and tool use on top of the Sonar Pro family. It is optimized for deep analysis, long-context retrieval, and comprehensive web-grounded answers.
About the model
What is Sonar Pro Search?
Sonar Pro Search is a Perplexity language model that extends Sonar Pro with autonomous, multi-step search and reasoning for complex information retrieval. It is mainly used for deep research workflows, where it plans and executes multiple web searches and tool calls to synthesize detailed, grounded responses. It is also used to power Pro Search modes in applications and APIs that need large-context (around 200K tokens) retrieval with structured, high-accuracy outputs. It belongs to Perplexity’s proprietary Sonar model family, alongside Sonar, Sonar Pro, Sonar Reasoning Pro, and Sonar Deep Research.
Model capabilities
5 Core Capabilities
-
Agentic Web Search
Executes multi-step, tool-using web searches, planning and refining queries to answer complex questions grounded in live online data.
-
Cited Research Answers
Generates synthesized research-style responses that include multiple supporting citations and sources for verification and further reading.
-
Long-Context Analysis
Handles very large inputs with an extended context window, enabling analysis of lengthy documents, conversations, and multi-part queries together.
-
Multilingual Question Support
Understands and responds to queries in multiple languages while still grounding answers in web search and external information sources.
-
Document-Like Content Extraction
Extracts and consolidates key facts, comparisons, and structured information from web pages, articles, and other text-heavy online content.
Use cases
6 Most Valuable Use Cases
- Complex Web Research
- Enterprise Knowledge Search
- Legal Case Fact-Finding
- Competitive Market Monitoring
- E-commerce Product Insights
- Developer Tool Documentation
Transparent pricing
Cost Comparison
LLM API delivers the lowest cost and latency for Sonar-class search models.
| Provider | Region | Latency | Throughput | Uptime | Input ($/1M) | Output ($/1M) | Context |
|---|---|---|---|---|---|---|---|
| LLM API BEST | Global | 120ms | 80 req/s | 99.99% | $0.20 | $0.20 | 128K |
| Perplexity | Global | ~220ms | ~35 req/s | ~99.9% | ~$0.60 | ~$0.60 | ~128K |
| OpenAI | US East | ~250ms | ~40 req/s | ~99.9% | ~$0.80 | ~$0.80 | ~128K |
| Anthropic | US West | ~180ms | ~60 qps | 99.9% | ~$0.50 | ~$1.50 | 200K |
| Google Cloud | EU West | ~190ms | ~70 qps | 99.9% | ~$0.40 | ~$1.20 | 128K |
Performance benchmarks
Technical Specifications
| Metric | Sonar Pro Search (Perplexity) | GPT-4.1 (OpenAI) | Claude 3.5 Sonnet (Anthropic) |
|---|---|---|---|
| Avg Latency | ~700ms | ~900ms | ~850ms |
| Context Window | ~200K | 128K | 200K |
| Input Price ($/1M tokens) | ~$3.00 | $5.00 | $3.00 |
| Output Price ($/1M tokens) | ~$8.00 | $15.00 | $15.00 |
| Max Output Tokens | ~4K | 4K | 4K |
| Throughput | ~60 tps | ~40 tps | ~45 tps |
| Uptime | ~99.9% | ~99.9% | ~99.9% |
30-day usage via LLM API
- 11.8B
- Prompt tokens processed (last 30 days)
- 3.6M
- API requests served (last 30 days)
- 9.4B
- Completion tokens generated (last 30 days)
- 99.8%
- Average API uptime
Architecture & Integration
Why Build on LLM.API?
One unified API. Every major model. Built-in reliability, cost control, and observability.
-
Unified AI Routing
Route each request to the optimal model across providers based on cost, latency, and quality—without changing your integration or redeploying code.
One endpoint, every model -
Smart Cost Controls
Define guardrails for spend and automatically select the most cost-effective models by task, so you can scale usage without surprise bills.
Optimize spend by design -
Automatic Fallbacks
Configure provider- and model-level failover once and let LLM.API retry or downgrade gracefully, keeping your workloads healthy when vendors break.
Resilience baked in -
Deep Observability
Get unified logs, traces, and metrics across all providers—latency, errors, tokens, and costs—so you can debug faster and tune prompts with real data.
See every token flow -
Task-Aware Workflows
Define reusable task abstractions—chat, tools, RAG, evals—then plug in any model behind them, standardizing behavior across teams and providers.
Tasks, not raw calls -
High-Throughput Batch
Ship massive workloads via one batch API that parallelizes requests, enforces rate limits, and tracks per-item results for analytics and retries.
Scale jobs, not ops
Decision guide
When to Use — When NOT to Use
Use it if...
- You need web-grounded answers with source citations for research, journalism, or fact-checking.
- You need multi-step web search workflows that autonomously plan, browse, and synthesize findings.
- You need high-factuality QA over current events, changing regulations, or fresh technical documentation.
- You need to answer complex questions that require pulling and reconciling many web sources.
- Your use case involves long-context search, aggregating and summarizing many pages or documents.
- Your use case involves building an AI research assistant that explains reasoning and cites sources.
- You need an API-accessible search-augmented model to embed into your own applications.
Avoid if...
- You need an offline model that runs fully air-gapped without any external web access.
- You need ultra-low-cost, high-volume inference where web search overhead is unnecessary or wasteful.
- Your workload requires strict data locality with no external HTTP calls for compliance reasons.
- You need frontier-level creative writing, coding, or reasoning independent of real-time search augmentation.
- Your workload requires millisecond-level latency responses where multiple web retrieval hops are unacceptable.
- You need fine-tuning or custom training of the base model weights for domain specialization.
- Your use case involves processing sensitive PII or trade secrets that cannot leave your environment.
FAQ
Frequently Asked Questions
-
What is Sonar Pro Search?
Sonar Pro Search is a Perplexity model optimized for search-augmented question answering and retrieval-heavy tasks via the LLM.API gateway.
-
What types of tasks is Sonar Pro Search best suited for?
Sonar Pro Search is best for complex web-assisted Q&A, research summarization, and retrieval-heavy workflows where up-to-date external information is important.
-
How is Sonar Pro Search priced when accessed through LLM.API?
Sonar Pro Search pricing is usage-based on tokens through LLM.API; check your LLM.API dashboard or pricing docs for current per-token rates.
-
What is the context window of Sonar Pro Search?
Sonar Pro Search’s exact context window depends on LLM.API’s configured version; refer to the model description in the LLM.API docs for limits.
-
What is the typical latency of Sonar Pro Search requests?
Latency varies by prompt length and external search time, but Sonar Pro Search generally responds within a few seconds for typical workloads.
-
Which modalities does Sonar Pro Search support through LLM.API?
Sonar Pro Search primarily supports text input and output via LLM.API, and does not natively handle image or audio content.
-
How do I call Sonar Pro Search using the LLM.API?
Use the standard LLM.API chat or completion endpoint and set the model field to the Sonar Pro Search identifier listed in the model catalog.
-
How does Sonar Pro Search compare to general-purpose chat models?
Compared to generic chat models, Sonar Pro Search is more effective for retrieval-augmented, search-based reasoning but less focused on open-ended creative generation.
-
Are there any notable limitations of Sonar Pro Search?
Sonar Pro Search depends on external search quality, may occasionally surface outdated or irrelevant results, and is not optimized for offline-only reasoning tasks.
-
Can I fine-tune or customize Sonar Pro Search via LLM.API?
Direct fine-tuning of Sonar Pro Search is not supported; instead, customize behavior through prompting, system messages, and retrieval or tool configurations.
