
Perplexity Sonar/Sonar Pro APIs: The Search API for Builders

Table of Contents
Perplexity Sonar/Sonar Pro APIs: The Search API for Builders #
Yesterday, Perplexity officially opened the floodgates. After months of anticipation and a gradual rollout that had developers clamoring for access, the Sonar Pro API is now generally available to all developers. I've been tracking this launch closely because it represents something we haven't really had before: a purpose-built search API that combines real-time web indexing with citation-backed answers at prices that don't require enterprise procurement teams.
This isn't just another LLM wrapper. The Sonar API family—comprising the base Sonar model and the more capable Sonar Pro—is fundamentally different from the standard OpenAI or Anthropic APIs you're used to. Every query automatically triggers live web searches. Every response includes inline citations to sources. And with the January 2025 GA release, you now get structured outputs, search domain filtering, and a context window that rivals the best foundation models.
Let me break down what this means for builders, why I think it's a genuine milestone for AI-powered applications, and exactly how to integrate it into your stack.
What Is the Perplexity Sonar API? #
The Sonar API is Perplexity's developer platform for integrating real-time, citation-backed search into applications. Unlike standard LLM APIs that rely on static training data, every Sonar query performs live web searches and synthesizes results with source attribution.
Launched initially in beta through 2024 and reaching general availability on January 21, 2025, the Sonar API represents Perplexity's bet that the future of AI isn't just bigger models—it's better access to fresh information. The API exposes two main models:
| Model | Context Window | Best For | Citations |
|---|---|---|---|
| Sonar | 127K tokens | Speed-first Q&A, cost-sensitive applications | Standard |
| Sonar Pro | 200K tokens | Complex multi-step queries, research depth | 2x more than Sonar |
What makes this compelling isn't just the search capability—it's the economics. At a time when GPT-4 Turbo runs $10 per million output tokens, Sonar Pro delivers web-grounded answers at a fraction of that cost with source citations baked in.
Sonar vs. Sonar Pro: Which Should You Choose? #
Choose Sonar for speed-optimized, straightforward queries where cost efficiency matters. Choose Sonar Pro when you need deeper research, longer context, or 2x the citations. The decision maps directly to your use case complexity.
Here's how I think about the tradeoffs:
Sonar (Base)
- 127,000 token context window—sufficient for most single-turn queries
- Optimized for latency—returns answers faster than Pro
- $5 per 1,000 requests (low context) with $1/$1 per million input/output tokens
- Best for: FAQ bots, quick lookups, customer support triage, real-time notifications
Sonar Pro
- 200,000 token context window—handles lengthy documents and multi-turn conversations
- Double the citations on average compared to standard Sonar
- Advanced information retrieval architecture for complex, multi-step Q&A
- $6 per 1,000 requests (low context) with $3/$15 per million input/output tokens
- Best for: Research assistants, due diligence tools, medical/legal research, competitive analysis
The pricing gap is narrow enough that I default to Sonar Pro for most production use cases. The extra citations alone justify the upgrade when accuracy matters—and in high-stakes domains like healthcare (Doximity's use case) or sales intelligence (Copy.ai's implementation), those citations aren't optional, they're essential.
Pricing Breakdown: What Each Query Actually Costs #
Sonar API pricing combines per-request fees (based on search context size) with token-based charges. A typical Sonar Pro query costs between $0.0057 and $0.0147 depending on your context settings and token usage.
Understanding the pricing model is critical because it's different from pure token-based APIs. You pay for two components:
1. Token Costs (Per Million Tokens) #
| Model | Input Tokens | Output Tokens |
|---|---|---|
| Sonar | $1 | $1 |
| Sonar Pro | $3 | $15 |
| Sonar Reasoning Pro | $2 | $8 |
2. Request Fees (Per 1,000 Requests by Context Size) #
| Model | Low Context | Medium Context | High Context |
|---|---|---|---|
| Sonar | $5 | $8 | $12 |
| Sonar Pro | $6 | $10 | $14 |
Context size determines how much web information gets retrieved—not the model's context window, but the breadth of search results fed into the synthesis process.
Real Cost Examples #
Simple Sonar Query (500 input + 200 output tokens, Low Context):
- Input tokens: $0.0005
- Output tokens: $0.0002
- Request fee: $0.005
- Total: $0.0057 per query
Complex Sonar Pro Query with Medium Context:
- Input tokens (1,000): $0.003
- Output tokens (2,000): $0.03
- Request fee: $0.010
- Total: ~$0.043 per query
At scale—say 100,000 queries per month—you're looking at roughly $570 to $4,300 depending on model choice and complexity. That's competitive with enterprise search APIs and substantially cheaper than running your own RAG pipeline with OpenAI embeddings + GPT-4 + web scrapers.
Code Example: Your First Sonar Pro API Call #
Getting started requires a single API key and any HTTP client. The endpoint is OpenAI-compatible, so existing codebases need minimal changes.
Python with Native SDK #
from perplexity import Perplexity
client = Perplexity()
completion = client.chat.completions.create(
model="sonar-pro",
messages=[
{
"role": "user",
"content": "What are the latest developments in quantum computing?"
}
]
)
print(completion.choices[0].message.content)Python with OpenAI SDK (Drop-in Replacement) #
import os
from openai import OpenAI
client = OpenAI(
api_key=os.environ.get("PERPLEXITY_API_KEY"),
base_url="https://api.perplexity.ai"
)
response = client.chat.completions.create(
model="sonar-pro",
messages=[
{
"role": "user",
"content": "What are the latest developments in quantum computing?"
}
]
)
print(response.choices[0].message.content)cURL #
curl https://api.perplexity.ai/v1/sonar \
-H "Authorization: Bearer $PERPLEXITY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "sonar-pro",
"messages": [
{
"role": "user",
"content": "What are the latest developments in quantum computing?"
}
]
}'TypeScript/JavaScript #
import OpenAI from 'openai';
const client = new OpenAI({
apiKey: process.env.PERPLEXITY_API_KEY,
baseURL: "https://api.perplexity.ai"
});
const response = await client.chat.completions.create({
model: "sonar-pro",
messages: [
{
role: "user",
content: "What are the latest developments in quantum computing?"
}
]
});
console.log(response.choices[0].message.content);The OpenAI compatibility is a strategic choice that lowers switching costs. If you're already using openai SDK in production, you can trial Perplexity by changing two lines: the baseURL and the model name.
Handling Streaming Responses for Real-Time UX #
Enable streaming by adding stream: true to receive tokens as they're generated. This is essential for production applications where perceived latency directly impacts user satisfaction.
One of Perplexity's strengths is fast time-to-first-token. For applications where users watch answers materialize in real time—think ChatGPT-style interfaces—streaming isn't optional.
Python Streaming Example #
from perplexity import Perplexity
client = Perplexity()
stream = client.chat.completions.create(
model="sonar-pro",
messages=[
{
"role": "user",
"content": "What are the most popular open-source alternatives to OpenAI's GPT models?"
}
],
stream=True
)
for chunk in stream:
if chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end="")JavaScript/TypeScript Streaming #
import Perplexity from '@perplexity-ai/perplexity_ai';
const client = new Perplexity();
const stream = await client.chat.completions.create({
model: "sonar-pro",
messages: [
{
role: "user",
content: "What are the most popular open-source alternatives to OpenAI's GPT models?"
}
],
stream: true,
});
for await (const chunk of stream) {
if (chunk.choices[0].delta.content) {
process.stdout.write(chunk.choices[0].delta.content);
}
}cURL with Streaming #
curl https://api.perplexity.ai/v1/sonar \
-H "Authorization: Bearer $PERPLEXITY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "sonar-pro",
"messages": [
{
"role": "user",
"content": "What are the most popular open-source alternatives to OpenAI'\''s GPT models?"
}
],
"stream": true
}'Streaming isn't just about user experience—it also gives you early visibility into citations as they arrive, letting you prefetch source pages or render attribution UI before the answer completes.
Extracting and Using Citations #
Every Sonar API response includes a citations array mapping claims to source URLs. These aren't decorative—they're the foundation of trust for AI-generated answers.
The citation structure in the response looks like this:
{
"id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"model": "sonar-pro",
"created": 1234567890,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Recent developments in quantum computing include advances in error correction [1], new qubit architectures [2], and progress toward fault-tolerant systems [3]..."
},
"finish_reason": "stop"
}
],
"citations": [
"https://nature.com/articles/quantum-error-correction-2024",
"https://ibm.com/quantum-computing/qubit-breakthrough",
"https://arxiv.org/abs/2024.fault-tolerant-qc"
],
"usage": {
"prompt_tokens": 14,
"completion_tokens": 287,
"total_tokens": 301
}
}Rendering Citations in Your UI #
def format_response_with_citations(response):
content = response.choices[0].message.content
citations = response.citations
# Replace bracketed citations with linked footnotes
for i, url in enumerate(citations, 1):
content = content.replace(f"[{i}]", f"<a href='{url}'>[{i}]</a>")
return contentThe citation density varies by model. Sonar Pro delivers roughly twice as many citations as the base Sonar model. For use cases where verifiability matters—medical information, legal research, financial due diligence—this difference isn't marginal. It's the difference between a tool you can ship and a liability you can't.
Pro Search: Multi-Step Reasoning for Complex Queries #
Pro Search augments Sonar Pro with automated tool orchestration—performing multiple web searches and URL fetches to answer complex, multi-faceted questions.
Think of Pro Search as the agentic layer on top of Sonar Pro. Instead of a single search-then-synthesize pass, Pro Search can:
- Decompose complex queries into sub-questions
- Execute multiple web searches sequentially
- Fetch specific URLs to extract detailed content
- Synthesize across all sources with full citation chains
Enabling Pro Search #
response = client.chat.completions.create(
model="sonar-pro",
messages=[
{
"role": "user",
"content": "Compare the financial performance of Tesla and BYD in 2024, including stock price, revenue growth, and EV delivery numbers"
}
],
stream=True, # Required for Pro Search
web_search_options={
"search_type": "pro" # Options: fast, pro, auto
}
)Pro Search Pricing #
| Search Type | Description | Request Fee (per 1K) |
|---|---|---|
fast |
Standard Sonar Pro | $6–$14 |
pro |
Multi-step tool usage | $14–$22 |
auto |
Automatic classification | Varies by query |
Pro Search is where you start to see the real differentiation from simple RAG implementations. A query like "What are the implications of the latest FDA guidance on AI-enabled medical devices for early-stage startups?" requires:
- Finding the actual FDA guidance document
- Searching for legal analysis of that guidance
- Looking up how similar guidances affected comparable companies
- Synthesizing across regulatory, legal, and business domains
That's not a single search. That's a research workflow. Pro Search handles the orchestration automatically.
Structured Outputs with JSON Mode #
Sonar Pro supports structured outputs via JSON Schema, enabling reliable extraction of structured data from web sources.
The January 2025 GA release expanded access to JSON mode, which is critical for building agentic systems that need to act on search results programmatically.
JSON Mode Example #
import json
response = client.chat.completions.create(
model="sonar-pro",
messages=[
{
"role": "system",
"content": "You are a research assistant. Extract company information in the exact JSON format requested."
},
{
"role": "user",
"content": "Find information about Stripe's latest funding round, including amount, valuation, lead investors, and date"
}
],
response_format={
"type": "json_schema",
"json_schema": {
"schema": {
"type": "object",
"properties": {
"company": {"type": "string"},
"round_type": {"type": "string"},
"amount_raised": {"type": "string"},
"valuation": {"type": "string"},
"lead_investors": {
"type": "array",
"items": {"type": "string"}
},
"date_announced": {"type": "string"},
"sources": {
"type": "array",
"items": {"type": "string"}
}
},
"required": ["company", "amount_raised", "valuation", "sources"]
}
}
}
)
result = json.loads(response.choices[0].message.content)
print(json.dumps(result, indent=2))Example Output #
{
"company": "Stripe",
"round_type": "Series I",
"amount_raised": "$6.5 billion",
"valuation": "$50 billion",
"lead_investors": ["Goldman Sachs", "JPMorgan"],
"date_announced": "March 2023",
"sources": [
"https://stripe.com/newsroom/stripe-series-i",
"https://techcrunch.com/2023/03/15/stripe-6-5b-funding"
]
}JSON mode transforms the Sonar API from a chat interface into a data extraction and research automation tool. Combined with Pro Search, you can build agents that research, extract structured data, and trigger downstream workflows—all with source attribution.
Real-World Use Cases: Who's Building With Sonar? #
Early adopters include Copy.ai (sales research), Doximity (medical Q&A), and Zoom (real-time meeting assistance). These implementations demonstrate 20–30% efficiency gains in knowledge work.
Perplexity's launch announcement highlighted three production deployments that illustrate the API's versatility:
Copy.ai: Sales Intelligence at Scale #
Copy.ai integrated Sonar to power prospect and company research for go-to-market teams. The results:
- 8 hours saved per rep per week on manual research
- 20% increase in outreach throughput
- Citation-backed answers enable reps to verify claims before pitching
For sales teams, the citations aren't a nice-to-have—they're armor against compliance issues and customer fact-checking.
Doximity: High-Stakes Medical Research #
Doximity, the professional network for physicians, uses Sonar to provide doctors with research-backed answers on:
- New medical guideline changes
- Insurance reimbursement policies
- Drug interaction information
Given the liability implications of medical information, inline citations are essential for accuracy and trust. A hallucinated drug interaction could be fatal. A missing citation on a guideline change could lead to malpractice exposure.
Zoom: Real-Time Meeting Intelligence #
With AI Companion 2.0, Zoom integrated Sonar Pro to enable live, private searches during video calls:
- Users ask questions without leaving the meeting
- Real-time web searches provide instant context
- Citations allow verification without disrupting flow
As Will Siegelin, Senior Product Manager at Zoom, put it: "Perplexity opens Zoom to knowledge beyond its four walls."
Performance Benchmarks: Why Sonar Pro Leads on Factuality #
Sonar Pro achieves an F-score of 0.858 on SimpleQA, the prevailing benchmark for factual accuracy in short-form question answering. This outperforms GPT-4, Claude, and Gemini variants.
SimpleQA is designed to evaluate whether models can answer short, fact-seeking questions accurately. It tests:
- Correctness: Did the model get the answer right?
- Calibration: Does the model know when it doesn't know?
- Citation quality: Are sources provided and accurate?
| Model | SimpleQA F-Score |
|---|---|
| Sonar Pro | 0.858 |
| Sonar | 0.773 |
| GPT-4 | ~0.75 |
| Claude 3.5 | ~0.72 |
Why does Sonar Pro win? Because it combines LLM summarization with real-time information retrieval rather than relying on frozen training data. A model trained in 2024 can't know about a product launch yesterday. Sonar Pro can.
The 0.858 F-score isn't just a vanity metric. It translates directly to:
- Fewer hallucinations in production
- Less human verification overhead
- Higher user trust and engagement
- Reduced liability for factual claims
FAQ: Perplexity Sonar API #
How do I get a Perplexity API key?
Navigate to the API Portal, generate a new key, and set it as the PERPLEXITY_API_KEY environment variable. Keys are available immediately for all developers as of January 2025.
Is there a free tier for the Sonar API?
No, Perplexity does not offer a free tier for the Sonar API. However, costs start at roughly $0.0057 per query, making experimentation affordable. The AWS Marketplace also offers consolidated billing options for enterprise procurement.
Can I use the Sonar API with my existing OpenAI SDK code?
Yes. The Sonar API is OpenAI-compatible—just change the baseURL to https://api.perplexity.ai and use model names like sonar or sonar-pro. All standard Chat Completions parameters work.
What's the difference between Search Context Size and Context Window?
Search Context Size controls how much web information is retrieved (affects request pricing: Low/Medium/High). Context Window is the maximum tokens the model can process in a single request (127K for Sonar, 200K for Sonar Pro). They're independent parameters.
Does Sonar Pro support function calling or tool use?
Not directly—function calling is part of the Agent API. However, Pro Search enables automated tool orchestration (web search + URL fetching) within the Sonar Pro model. For custom tool integrations, use the Agent API with Perplexity-hosted models.
How reliable are the citations?
Citations link directly to the sources used in generating the answer. Sonar Pro provides approximately 2x more citations than base Sonar. In SimpleQA benchmarks, these citations have high accuracy, but you should always validate critical information from primary sources.
What's the rate limit for the Sonar API?
Rate limits vary by usage tier. New accounts start at 100 requests per minute for Sonar and 50 requests per minute for Sonar Pro. Higher tiers are available through the Admin portal. Check the Rate Limits documentation for current tiers.
Can I filter search results to specific domains?
Yes. The Sonar API supports search filters including domain whitelisting/blacklisting, recency filters, and source type controls. These are configured via the web_search_options parameter.
Is my data used to train Perplexity's models?
No. Perplexity explicitly states that they do not train on customer data submitted through the API. This applies to all Sonar and Agent API usage.
How does Sonar Pro compare to building my own RAG pipeline?
Sonar Pro is typically cheaper and faster than DIY RAG. A custom pipeline requires: embeddings API ($0.10–$0.50 per million tokens), vector database hosting ($50–$500/month), web scraping infrastructure, LLM calls ($10–$30 per million output tokens), and maintenance overhead. Sonar Pro bundles all of this with citations for ~$0.01–$0.04 per query.
What's the latency for a typical Sonar Pro request?
Time-to-first-token is typically under 1 second for standard queries. Streaming responses arrive within milliseconds of search completion. Complex Pro Search queries with multi-step reasoning take 3–8 seconds depending on the number of searches required.
The Strategic Implication: Search as Infrastructure #
The Sonar API GA release isn't just a product launch—it's a signal about the infrastructure stack of the next decade. We're moving from an era where AI applications relied on static knowledge (training data) to one where real-time retrieval is table stakes.
For builders, this means:
You no longer need to choose between recency and quality. Before Sonar, getting fresh data meant sacrificing the reasoning capabilities of frontier models. Now you get both.
Citation-backed answers become the default expectation. Users are increasingly skeptical of unsourced AI claims. The Sonar API makes citations automatic, not an afterthought.
Search becomes a composable primitive. Just as Stripe made payments a function call, Perplexity is making web-scale search a function call. This unblocks use cases that were previously too complex to build.
I've been integrating the Sonar Pro API into client automation workflows over the past month, and the results are consistent: tasks that previously required 3–4 different services (search API + scraper + LLM + citation parser) now collapse into a single, reliable call.
Build Your First Sonar-Powered Application #
The Sonar and Sonar Pro APIs are available now to all developers. Whether you're building a research assistant, a sales intelligence tool, or an AI companion that actually knows what's happening in the world, this is the infrastructure you've been waiting for.
Ready to integrate AI-powered search into your product?
Book an AI automation strategy call and I'll walk you through implementation, pricing optimization, and use case-specific integration patterns.
Related Reading #
- ChatGPT Search Rolls Out to All: What It Means for AI-Powered Discovery
- Perplexity Desktop App: A New Chapter for AI-Native Workflows
- Perplexity Enterprise Pro: AI Search for the Workplace
Written by William Spurlock on January 22, 2025. Pricing and features reflect the January 2025 general availability release.
Related Posts

Zero-Click Search: How to Measure Value When Nobody Clicks
As Google AI Overviews and answer engines scale, traditional CTR models are collapsing. Here is how to measure AI visibility and value when nobody clicks.

The Overlap Between SEO and AI Visibility, and Where They Split
SEO and AI visibility share more DNA than most assume — but link building, traffic behavior, and content strategy each diverge in specific, fixable ways.

FAQ Schema and AEO: The Highest-Leverage Move for AI Citation
FAQ schema and AEO work together to make your content the cited answer in ChatGPT, Perplexity, and Google AI Overviews. Here's the full playbook.

