Powered by Perplexity

Sonar Pro Search

Sonar Pro Search is Perplexity’s most advanced agentic search model, adding multi-step reasoning and tool use on top of the Sonar Pro family. It is optimized for deep analysis, long-context retrieval, and comprehensive web-grounded answers.

Start Using API

What is Sonar Pro Search?

Sonar Pro Search is a Perplexity language model that extends Sonar Pro with autonomous, multi-step search and reasoning for complex information retrieval. It is mainly used for deep research workflows, where it plans and executes multiple web searches and tool calls to synthesize detailed, grounded responses. It is also used to power Pro Search modes in applications and APIs that need large-context (around 200K tokens) retrieval with structured, high-accuracy outputs. It belongs to Perplexity’s proprietary Sonar model family, alongside Sonar, Sonar Pro, Sonar Reasoning Pro, and Sonar Deep Research.

5 Core Capabilities

  • Agentic Web Search

    Executes multi-step, tool-using web searches, planning and refining queries to answer complex questions grounded in live online data.

  • Cited Research Answers

    Generates synthesized research-style responses that include multiple supporting citations and sources for verification and further reading.

  • Long-Context Analysis

    Handles very large inputs with an extended context window, enabling analysis of lengthy documents, conversations, and multi-part queries together.

  • Multilingual Question Support

    Understands and responds to queries in multiple languages while still grounding answers in web search and external information sources.

  • Document-Like Content Extraction

    Extracts and consolidates key facts, comparisons, and structured information from web pages, articles, and other text-heavy online content.

6 Most Valuable Use Cases

  • Complex Web Research
  • Enterprise Knowledge Search
  • Legal Case Fact-Finding
  • Competitive Market Monitoring
  • E-commerce Product Insights
  • Developer Tool Documentation

Cost Comparison

LLM API delivers the lowest cost and latency for Sonar-class search models.

Provider Region Latency Throughput Uptime Input ($/1M) Output ($/1M) Context
LLM API BEST Global 120ms 80 req/s 99.99% $0.20 $0.20 128K
Perplexity Global ~220ms ~35 req/s ~99.9% ~$0.60 ~$0.60 ~128K
OpenAI US East ~250ms ~40 req/s ~99.9% ~$0.80 ~$0.80 ~128K
Anthropic US West ~180ms ~60 qps 99.9% ~$0.50 ~$1.50 200K
Google Cloud EU West ~190ms ~70 qps 99.9% ~$0.40 ~$1.20 128K

Technical Specifications

Metric Sonar Pro Search (Perplexity) GPT-4.1 (OpenAI) Claude 3.5 Sonnet (Anthropic)
Avg Latency ~700ms ~900ms ~850ms
Context Window ~200K 128K 200K
Input Price ($/1M tokens) ~$3.00 $5.00 $3.00
Output Price ($/1M tokens) ~$8.00 $15.00 $15.00
Max Output Tokens ~4K 4K 4K
Throughput ~60 tps ~40 tps ~45 tps
Uptime ~99.9% ~99.9% ~99.9%

30-day usage via LLM API

11.8B
Prompt tokens processed (last 30 days)
3.6M
API requests served (last 30 days)
9.4B
Completion tokens generated (last 30 days)
99.8%
Average API uptime
Start Using API

Why Build on LLM.API?

One unified API. Every major model. Built-in reliability, cost control, and observability.

  • Unified AI Routing

    Route each request to the optimal model across providers based on cost, latency, and quality—without changing your integration or redeploying code.

    One endpoint, every model
  • Smart Cost Controls

    Define guardrails for spend and automatically select the most cost-effective models by task, so you can scale usage without surprise bills.

    Optimize spend by design
  • Automatic Fallbacks

    Configure provider- and model-level failover once and let LLM.API retry or downgrade gracefully, keeping your workloads healthy when vendors break.

    Resilience baked in
  • Deep Observability

    Get unified logs, traces, and metrics across all providers—latency, errors, tokens, and costs—so you can debug faster and tune prompts with real data.

    See every token flow
  • Task-Aware Workflows

    Define reusable task abstractions—chat, tools, RAG, evals—then plug in any model behind them, standardizing behavior across teams and providers.

    Tasks, not raw calls
  • High-Throughput Batch

    Ship massive workloads via one batch API that parallelizes requests, enforces rate limits, and tracks per-item results for analytics and retries.

    Scale jobs, not ops

When to Use — When NOT to Use

Use it if...

  • You need web-grounded answers with source citations for research, journalism, or fact-checking.
  • You need multi-step web search workflows that autonomously plan, browse, and synthesize findings.
  • You need high-factuality QA over current events, changing regulations, or fresh technical documentation.
  • You need to answer complex questions that require pulling and reconciling many web sources.
  • Your use case involves long-context search, aggregating and summarizing many pages or documents.
  • Your use case involves building an AI research assistant that explains reasoning and cites sources.
  • You need an API-accessible search-augmented model to embed into your own applications.

Avoid if...

  • You need an offline model that runs fully air-gapped without any external web access.
  • You need ultra-low-cost, high-volume inference where web search overhead is unnecessary or wasteful.
  • Your workload requires strict data locality with no external HTTP calls for compliance reasons.
  • You need frontier-level creative writing, coding, or reasoning independent of real-time search augmentation.
  • Your workload requires millisecond-level latency responses where multiple web retrieval hops are unacceptable.
  • You need fine-tuning or custom training of the base model weights for domain specialization.
  • Your use case involves processing sensitive PII or trade secrets that cannot leave your environment.

Frequently Asked Questions

  • What is Sonar Pro Search?

    Sonar Pro Search is a Perplexity model optimized for search-augmented question answering and retrieval-heavy tasks via the LLM.API gateway.

  • What types of tasks is Sonar Pro Search best suited for?

    Sonar Pro Search is best for complex web-assisted Q&A, research summarization, and retrieval-heavy workflows where up-to-date external information is important.

  • How is Sonar Pro Search priced when accessed through LLM.API?

    Sonar Pro Search pricing is usage-based on tokens through LLM.API; check your LLM.API dashboard or pricing docs for current per-token rates.

  • What is the context window of Sonar Pro Search?

    Sonar Pro Search’s exact context window depends on LLM.API’s configured version; refer to the model description in the LLM.API docs for limits.

  • What is the typical latency of Sonar Pro Search requests?

    Latency varies by prompt length and external search time, but Sonar Pro Search generally responds within a few seconds for typical workloads.

  • Which modalities does Sonar Pro Search support through LLM.API?

    Sonar Pro Search primarily supports text input and output via LLM.API, and does not natively handle image or audio content.

  • How do I call Sonar Pro Search using the LLM.API?

    Use the standard LLM.API chat or completion endpoint and set the model field to the Sonar Pro Search identifier listed in the model catalog.

  • How does Sonar Pro Search compare to general-purpose chat models?

    Compared to generic chat models, Sonar Pro Search is more effective for retrieval-augmented, search-based reasoning but less focused on open-ended creative generation.

  • Are there any notable limitations of Sonar Pro Search?

    Sonar Pro Search depends on external search quality, may occasionally surface outdated or irrelevant results, and is not optimized for offline-only reasoning tasks.

  • Can I fine-tune or customize Sonar Pro Search via LLM.API?

    Direct fine-tuning of Sonar Pro Search is not supported; instead, customize behavior through prompting, system messages, and retrieval or tool configurations.

Start in 2 lines of code

Get My API Key