Deep Research: Exa.ai vs Other Providers

On a recent project — a note-taking app — one of the highest-value features I could add was deep research. Not as a separate tab or external tool, but embedded in the flow. Run research inside the app, drop results into a note, keep writing. No switching to a browser, copying links, pasting into docs. Less context switching, more flow.
So I explored it. I looked at Exa.ai, Perplexity, Tavily, and OpenAI's Deep Research. I built a multi-step enrichment pipeline on top of Exa. And I eventually simplified — Exa alone, with a light review step, was good enough for the MVP.
But the pipeline taught me a lot about what you can layer on top of raw research output. Here's what it looked like.
Why embed deep research in a note app?
Notes are where thinking happens. If research lives elsewhere, you have to: leave the note, run the query, wait, copy results, come back, paste. That breaks flow. Embedding research means: ask from inside the doc, get results that land right there, keep writing.
That's what I wanted. The question was how to get quality reports without reinventing search. Exa.ai's Research API gave me a starting point: agentic web search with structured output, clear pricing, and published benchmarks. A typical task (6 searches, 20 pages, 1k reasoning tokens) runs about $0.14 on standard, $0.24 on pro. On SimpleQA, exa-research-pro hits 94.9% — ahead of Perplexity, Tavily Advanced, and YOU.com.
The catch: raw Exa output can feel thin for complex reports. It's fast and economical, but it doesn't always deliver the narrative depth or section-by-section polish you want. So I built a pipeline on top of it.
The multi-step enrichment pipeline
I spun up a separate deep-research server that:
- Submits the user's query to Exa
- Receives structured results (sections, sources, citations)
- Runs a multi-step enrichment pipeline on that output
- Returns a polished report to the note app
The pipeline had five phases:
Phase 1: Structure analysis
First I analyzed the Exa response to understand its structure — section order, nesting, and how to preserve it during enhancement. A structure analysis agent (via OpenRouter) inspected the raw data and produced guidance for the rest of the pipeline.
```python
# Generalized: analyze Exa output structure before enhancement
structure_guidance = await structure_analysis_service.analyze_structure(exa_response)
# Returns: section_processing_order, enhancement_guidance, structure_preservation_rules
```

Phase 2: Section extraction
I extracted sections from the Exa response. Exa returns structured data (often nested objects or arrays). I walked that structure and turned it into a flat list of sections with id, title, and content.
```python
# Generalized: extract sections from Exa response
def extract_sections(exa_data: dict) -> list[dict]:
    sections = []
    for section_name, section_content in exa_data.get("data", {}).items():
        if isinstance(section_content, dict):
            for subsection_name, subsection_content in section_content.items():
                if isinstance(subsection_content, list):
                    # format_item: project helper that renders one result item as text
                    content_text = "\n".join(
                        format_item(item) for item in subsection_content
                    )
                    sections.append({
                        "id": f"{section_name}_{subsection_name}",
                        "title": f"{section_name} - {subsection_name}",
                        "content": content_text,
                    })
    return sections
```

Phase 3: Section-by-section enhancement
For each section, I ran two sub-steps in order: enhance then summarize. The key detail: a sliding context window — previous enhanced sections and summaries — so each step could maintain continuity.
```python
# Generalized: sequential enhancement with sliding context
context = {
    "research_topic": instructions,
    "structure_guidance": structure_guidance,
    "previous_enhanced_sections": [],
    "previous_summaries": [],
}
for section in sections:
    # Add prior work to context for continuity
    if context["previous_enhanced_sections"]:
        context["continuity_notes"] = [
            f"Previous section '{es['title']}' covered: {es['content'][:200]}..."
            for es in context["previous_enhanced_sections"][-3:]
        ]
    # Step 3a: Enhance section (heavy model — e.g. o4-mini:online or gemini-2.5-flash:online)
    enhanced = await enhancement_service.enhance_section(
        section_id=section["id"],
        content=section["content"],
        context=context,
        ai_model="google/gemini-2.5-flash:online",
    )
    context["previous_enhanced_sections"].append(enhanced)
    # Step 3b: Summarize enhanced section (lighter model — e.g. gpt-4.1-mini)
    summary = await summarization_service.summarize_section(
        section_id=section["id"],
        content=enhanced["enhanced_content"],
        context=context,
        ai_model="google/gemini-2.5-flash-lite",
        maintain_key_insights=True,
    )
    context["previous_summaries"].append(summary)
    context["current_section"] = {**enhanced, **summary}
```

Enhancement added depth: more context, examples, and citations. Summarization pulled out key insights and kept them consistent across sections. Both used the structure guidance so I didn't break nesting or order.
Phase 4: Report finalization
With all enhanced sections and summaries, I ran a finalization step that combined them into a single report — markdown, consistent headings, citations preserved.
```python
# Generalized: finalize report from enhanced sections
result = await finalization_service.finalize_report(
    job_id=job_id,
    enhanced_sections=formatted_enhanced_sections,
    original_data=exa_response,
    research_context={"topic": instructions, "total_sections": len(sections)},
    structure_guidance=structure_guidance,
)
# Returns: final_report (structured), markdown_content (ready for the note)
```

Phase 5: Persist and stream progress
The server emitted progress events (e.g. "Enhanced Section 1: Introduction", "Summarized Section 2: ...") over WebSocket so the note app could show a live timeline. When done, the final report persisted to the main app via HTTP PUT.
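Those progress events were simple JSON payloads. A minimal sketch of one way to build them — every name here is illustrative, not the actual server code:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class ProgressEvent:
    """One progress update streamed to the note app over the WebSocket."""
    job_id: str
    phase: str       # "enhance", "summarize", or "finalize"
    message: str     # human-readable timeline entry
    completed: int   # sections finished so far
    total: int       # total sections in the report

def make_event(job_id: str, phase: str, title: str, completed: int, total: int) -> str:
    """Serialize a timeline entry like 'Enhanced Section 1: Introduction' as JSON."""
    message = f"{phase.capitalize()}d Section {completed}: {title}"
    return json.dumps(asdict(ProgressEvent(job_id, phase, message, completed, total)))
```

The note app only has to render `message` in order; `completed` and `total` can drive a progress bar.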
What I used: services and models
- Section enhancement — A service calling OpenRouter with a heavy model (`o4-mini:online` or `gemini-2.5-flash:online`) to expand and enrich each section.
- Section summarization — A lighter model (`gpt-4.1-mini` or `gemini-2.5-flash-lite`) to summarize and extract key insights.
- Report finalization — Combines enhanced sections into a coherent markdown report, mirroring the original structure.
- Structure analysis — Analyzes the Exa response and produces processing order and preservation rules.
I had a test file that loads mock Exa data and runs the full sequence: structure analysis, section extraction, enhance, summarize, finalize.
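A trimmed version of that test sequence, with the OpenRouter-backed services replaced by stubs (all names below are illustrative), looks roughly like:

```python
import asyncio

# Stub services standing in for the real OpenRouter-backed ones
class StubEnhancer:
    async def enhance_section(self, section: dict, context: dict) -> dict:
        return {**section, "enhanced_content": section["content"] + " [enhanced]"}

class StubSummarizer:
    async def summarize_section(self, section: dict, context: dict) -> dict:
        return {"id": section["id"], "summary": section["enhanced_content"]}

async def run_pipeline(sections: list[dict], enhancer, summarizer) -> dict:
    """Phase 3 in miniature: enhance then summarize, threading context through."""
    context = {"previous_enhanced_sections": [], "previous_summaries": []}
    for section in sections:
        enhanced = await enhancer.enhance_section(section, context)
        context["previous_enhanced_sections"].append(enhanced)
        summary = await summarizer.summarize_section(enhanced, context)
        context["previous_summaries"].append(summary)
    return context

mock_sections = [{"id": "intro", "content": "Exa overview"}]
result = asyncio.run(run_pipeline(mock_sections, StubEnhancer(), StubSummarizer()))
```

In the real harness the stubs are the actual services and the input is mock Exa data, but the control flow is the same.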
Why I simplified the research pipeline
The pipeline worked. It produced richer reports than raw Exa. But it added latency (several LLM calls per section) and cost (enhancement + summarization models). Even though the full pipeline still cost less than a single call to a dedicated Deep Research model, I found that Exa's output — with a light review step to repair tables and fill gaps from task outputs — was good enough for the MVP. So I added a simpler path that bypasses the enrichment pipeline and streams Exa results directly.
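The simpler path is just a branch in front of the pipeline. A sketch of the routing, with the clients passed in as callables (all names here are hypothetical):

```python
# Hypothetical router between the enriched pipeline and the simple MVP path
async def run_research(query: str, deep: bool, *, exa_client, pipeline, reviewer) -> str:
    report = await exa_client.research(query)  # raw Exa research output
    if deep:
        return await pipeline(report)          # full enrichment pipeline
    return await reviewer(report)              # light review pass, then stream to the note
```

Keeping both paths behind one entry point means the note app doesn't care which mode ran; it just receives a report.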
The enrichment pipeline is still in the codebase. It's useful when you want maximum polish or when Exa alone feels too thin. The architecture is the same: Exa provides the body of knowledge; the pipeline adds synthesis on top.
Takeaways
- Embedding research in a note app reduces context switching — run from inside the doc, results land in place.
- Exa is a strong body of knowledge — cost, benchmarks, structured output. Good for MVP.
- Raw Exa can feel thin for complex reports. A multi-step pipeline (structure analysis, extract, enhance, summarize, finalize) adds that depth back.
- Sliding context — passing previous sections and summaries into each step — keeps the report coherent.
- You can simplify later. I did. Exa + light review was enough. The pipeline stays available when I need it.