Deep Agents Trilogy #2: Building DeepSearch — A Multi-Agent Research System from Scratch
The most powerful AI systems of tomorrow won't be monolithic models working in isolation, but specialized agents collaborating in orchestrated harmony—each bringing unique capabilities to solve what no single agent could accomplish alone.
— Inspired by Anthropic's research on building multi-agent systems
📚 The Deep Agents Trilogy
This is Part 2 of a three-part series on building production-ready AI agent systems.
#1 Foundations — Understanding deep vs shallow agents and multi-agent orchestration
#2 Building DeepSearch — You are here: Implement a multi-agent research system from scratch
#3 Production Deployment — Deploy to Amazon Bedrock AgentCore with LangFuse observability
Implementing DeepSearch: From Simple Search Agents to Multi-Agent Research Systems
One of the most popular use cases for Deep Agents is DeepSearch—a comprehensive research capability that goes far beyond simple information retrieval. This article shows how to implement a DeepSearch system using a multi-agent architecture built with Strands Agents, demonstrating a concrete approach to a problem many organizations face daily.
What is DeepSearch?
DeepSearch, as the name suggests, is about conducting thorough, in-depth research. It addresses use cases where you need complete coverage of a topic—maximizing both the breadth and depth of information gathered around a specific query.
Consider this scenario: You need a comprehensive report on AI Safety in 2025—its evolution, emerging trends, key developments, and future trajectory. You might frame your query as: "Current state of AI Safety in 2025".
The Limitation of Simple Search Agents
Let's examine what happens when you pose this question to a basic agent equipped with just a single search tool—an internet search capability connected to [LinkUp](https://app.linkup.so/), for example.
```python
# Simple agent with just the LinkUp search and file_write tools
import argparse

from strands import Agent
from strands_tools import file_write

from tools.internet_search import linkup_search

agent = Agent(
    name="simple_search_agent",
    system_prompt="""Do a comprehensive search on the web for the given prompt and write the results to a file.
Do multiple searches if needed to get all the information and cover different angles and perspectives.""",
    tools=[file_write, linkup_search],
)

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-p",
        "--prompt",
        type=str,
        default="Current state of AI safety in 2025.",
    )
    args = parser.parse_args()
    prompt = args.prompt

    result = agent(prompt)
    print(result)
```
When you execute this agent, you'll notice it makes several internet calls to retrieve relevant information. However, the resulting report lacks depth. While the agent performs multiple searches, it doesn't dig deep enough. The output is somewhat satisfactory but fails to cover the full scope of the problem because we haven't truly guided it to do so. We simply gave it a tool and asked a question.
Of course, this is a simple example, and you can improve the agent by adding more guidelines to the system prompt, but you will still be disappointed if what you want is comprehensive research (not just search).
This is where DeepSearch comes in.
From Single Agent to Multi-Agent Architecture
The fundamental difference between a simple search agent and DeepSearch lies in two key aspects: a multi-agent architecture with specialized roles, and explicit prompting guidance on how much work each agent should do.
1. Multi-Agent Architecture
DeepSearch transitions from a single-agent system to a multi-agent system with specialized roles:

- Multiple Searcher Agents: Responsible for conducting research on specific angles
- Lead Searcher (Orchestrator): Coordinates and manages the different search operations
- Citation Agent: Adds source citations to the report
The Lead Searcher: Your Research Orchestrator
The Lead Searcher has a critical role:
- Analyze the user's query
- Assess the complexity of the request
- Spawn multiple searcher agents, each with a specific research angle
For example, when researching AI Safety, the Lead Searcher might deploy agents to investigate:
- Current trends in 2025
- Established methodologies and frameworks
- Regulatory developments
- Industry adoption and challenges
- Emerging technologies and innovations
Prompting Strategy: Defining Tool Call Limits
Here's a crucial insight from Anthropic's research on building research agents: LLMs struggle to determine on their own how many sub-agents or tool calls to make. The solution? Explicitly guide this in your prompts.
In the lead searcher prompt, subagent count guidelines tell the orchestrator how many subagents to create based on query complexity:
<subagent_count_guidelines>
When determining how many subagents to create:
- **Simple/Straightforward queries**: 1 subagent
- **Standard complexity**: 2-3 subagents
- **Medium complexity**: 3-5 subagents
- **High complexity**: 5-10 subagents (maximum 20)
- **IMPORTANT**: Never create more than 20 subagents. If a task requires more, restructure to consolidate. Prefer fewer, more capable subagents.
</subagent_count_guidelines>
Complete prompt here: prompts/research_lead.py.
This is a best practice applicable beyond DeepSearch: Whether you're implementing deep agents, multi-agents, or even simple single-agent systems with tool access, explicitly specifying how many tool calls should be made based on task complexity significantly improves agent performance and reasoning.
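This is easy to retrofit onto the simple agent from the beginning of the article. Here is a minimal sketch, assuming the same linkup_search and file_write setup as before; the budget numbers are illustrative and not taken from the repo's prompts:

```python
# A minimal sketch: the same simple agent as before, but with an explicit
# tool-call budget baked into the system prompt. The numbers are illustrative.
from strands import Agent
from strands_tools import file_write

from tools.internet_search import linkup_search

# Tool-call budget guidance, expressed directly in the system prompt
BUDGET_GUIDELINES = """
Before searching, estimate a 'research budget':
- Simple factual queries: 1-2 search calls
- Standard topics: 3-5 search calls
- Broad or multi-faceted topics: up to 10 search calls
Never exceed 10 search calls; consolidate queries instead of adding more.
"""

agent = Agent(
    name="budgeted_search_agent",
    system_prompt=(
        "Do a comprehensive search on the web for the given prompt and write "
        "the results to a file. Cover different angles and perspectives.\n"
        + BUDGET_GUIDELINES
    ),
    tools=[file_write, linkup_search],
)

result = agent("Current state of AI safety in 2025.")
```

Even this small change makes the agent's search behavior far more predictable, because it no longer has to guess how much effort a query deserves.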
Search Agents: Specialized Research Workers
The Lead Searcher calls multiple search agents in parallel to optimize response time. Each search agent:
- Focuses on a specific angle defined by the Lead Searcher
- Has access to internet search tools like LinkUp, Exa, Firecrawl, Tavily, or others based on your preferences
- Executes multiple tool calls guided by complexity heuristics
Similar to the Lead Searcher, you should specify tool call limits in the search agent prompts:
1. **Planning**: Think through the task thoroughly. Make a research plan:
- Review the requirements of the task
- Develop a research plan to fulfill these requirements
- Determine what tools are most relevant ({internet_tool_name} for web search)
- Determine a 'research budget' - roughly how many tool calls needed:
* Simple tasks (e.g., "when is the tax deadline"): under 5 tool calls
* Medium tasks: 5 tool calls
* Hard tasks: about 10 tool calls
* Very difficult/multi-part tasks: up to 15 tool calls
Complete prompt here: prompts/research_subagent.py.
There's no magic formula for these numbers, but providing clear guidelines helps agents make better decisions about research depth.
Search Agent Workflow
Each search agent follows this process:
- Conduct targeted searches on its assigned angle
- Read and analyze sources
- Extract relevant information
- Summarize findings
- Write results to a findings file using the `file_write` tool
This file-based approach allows the Lead Searcher to later read and synthesize all agent results.
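In the actual system, the subagents produce these files themselves through the `file_write` tool, guided by their prompt. Purely as an illustration of the file convention described here and used later by the Citation Agent (a `source_url:` line at the top of each source file under a `research_documents_[topic]/` directory), the layout looks roughly like this; the helper below is hypothetical, not code from the repo:

```python
# Illustrative sketch of the file convention the subagents follow: each source
# gets its own markdown file with the source URL on the first line, so the
# citation agent can later map claims back to sources.
from pathlib import Path


def save_source(topic_slug: str, index: int, source_url: str, summary: str) -> Path:
    """Hypothetical helper: persist one source's findings to disk."""
    out_dir = Path(f"./research_documents_{topic_slug}")
    out_dir.mkdir(parents=True, exist_ok=True)
    out_file = out_dir / f"source_{index}.md"
    out_file.write_text(f"source_url: {source_url}\n\n{summary}\n")
    return out_file


# Example usage (hypothetical values):
save_source(
    topic_slug="ai_safety_2025",
    index=1,
    source_url="https://example.com/ai-safety-report",
    summary="Key findings extracted and summarized by the search agent...",
)
```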
The Complete DeepSearch Flow

The sequence is straightforward yet powerful (a conceptual code sketch follows the list):
- Lead Searcher receives the query and spawns search agents in parallel
- Search Agents write their results to individual findings files
- Lead Searcher reads all findings files and creates a synthesis report
- Citation Agent is called for the final step
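To make the flow concrete, here is a conceptual sketch of the same fan-out/fan-in pattern written by hand with plain Strands agents and a thread pool. This is not how the deep agents framework implements it internally (there, the Lead Searcher delegates to its subagents itself); the angles, prompts, and the ./findings directory are illustrative assumptions:

```python
# Conceptual sketch of the DeepSearch flow, not the deep agents framework's
# internals: fan-out is done manually with a thread pool instead of letting
# the Lead Searcher spawn subagents.
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

from strands import Agent
from strands_tools import file_write

from tools.internet_search import linkup_search

# Illustrative research angles for the AI safety example
ANGLES = [
    "Current AI safety concerns and challenges in 2025",
    "Regulatory frameworks and policy developments",
    "Technical progress in alignment and interpretability",
]

FINDINGS_DIR = Path("./findings")
FINDINGS_DIR.mkdir(exist_ok=True)


def run_searcher(angle: str) -> str:
    """Run one search agent on a single angle; it writes its findings to a file."""
    searcher = Agent(
        name="search_agent",
        system_prompt=(
            "Research the given angle in depth and write your findings as a "
            "markdown file inside ./findings/ using the file_write tool."
        ),
        tools=[file_write, linkup_search],
    )
    searcher(angle)
    return angle


# 1. Fan out: one search agent per angle, running in parallel
with ThreadPoolExecutor(max_workers=len(ANGLES)) as pool:
    list(pool.map(run_searcher, ANGLES))

# 2. Fan in: collect the findings files written by the search agents
findings = [p.read_text() for p in FINDINGS_DIR.glob("*.md")]

# 3. Synthesis: a lead agent merges the findings into a single report
lead = Agent(
    name="lead_synthesizer",
    system_prompt="Synthesize the provided findings into one coherent research report.",
)
report = lead("Synthesize these findings:\n\n" + "\n\n---\n\n".join(findings))
print(report)
```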
The Citation Agent: Adding Source Credibility
The Citation Agent doesn't contribute new knowledge. Its sole purpose is to add proper citations:
- Read the synthesis report created by the Lead Searcher
- Map claims and facts back to original sources in the findings files
- Add appropriate citations throughout the document
This separation of concerns is elegant: research agents focus on gathering information, while the citation agent ensures academic rigor and traceability.
<workflow>
1. **Read the synthesis file**: Use file_read to read the final synthesized report file (usually ending with a descriptive name like `./ai_safety_2025_comprehensive_report.md`)
2. **Discover source document directories**: Use file_read on the current directory `./` to see subdirectories, then identify directories matching pattern `./research_documents_[topic]/`
3. **Read all source documents**: Use file_read on each source directory to list files, then read each source file (files like `source_1.md`, `source_2.md`, etc.)
- Each source file has a `source_url:` at the top - extract this for citations
- Read the content to understand what information each source provides
4. **Add citations**: Based on the source documents, add citations to the synthesized report
5. **Write updated report**: Use file_write to save the updated report with citations to the same filename
</workflow>
As with the other agents, you can find the full prompt in prompts/citations_agent.py.
Todo-Driven Execution
Since we're working with the deep agents framework, as showcased in Part 1 of this trilogy, the Lead Searcher has access to a Todo tool. This allows it to:
- Maintain a dynamic to-do list of research angles to cover
- Track which sub-agents have been called
- Manage its reasoning process and execution steps
- Ensure comprehensive coverage of the topic
This todo-driven approach provides transparency into the agent's thinking process and helps ensure nothing is missed.
For our example, one of my runs of the lead searcher with the todo tool looks like this:
```json
{
  "state": {
    "todos": [
      {
        "id": "1",
        "content": "Plan comprehensive research strategy for AI safety in 2025",
        "status": "completed"
      },
      {
        "id": "2",
        "content": "Deploy subagent to research current AI safety concerns and challenges (2025)",
        "status": "completed"
      },
      {
        "id": "3",
        "content": "Deploy subagent to research leading organizations and initiatives in AI safety",
        "status": "completed"
      },
      {
        "id": "4",
        "content": "Deploy subagent to research recent technical developments (alignment, interpretability, robustness)",
        "status": "completed"
      },
      {
        "id": "5",
        "content": "Deploy subagent to research regulatory frameworks and policy developments",
        "status": "completed"
      },
      {
        "id": "6",
        "content": "Deploy subagent to research notable AI safety incidents and events in 2024-2025",
        "status": "completed"
      },
      {
        "id": "7",
        "content": "Synthesize all findings into comprehensive report",
        "status": "completed"
      },
      {
        "id": "8",
        "content": "Deploy citations agent to add proper citations to the report",
        "status": "completed"
      }
    ]
  }
}
```
You can do your own runs and see the results for yourself: just run the agent.py file with the prompt you want to research. Seeing this in action really helps you understand the process and the benefits of the multi-agent architecture. You can watch the agent reflect on the task and progressively read and update the todo list, which makes it much easier to spot prompt tweaks you need to make.
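If you want to inspect the todo list between runs without scrolling through the agent transcript, you can also read it back from the persisted session. This is a hedged sketch: it assumes FileSessionManager stores agent state as JSON files under the storage_dir configured below, and since the exact directory layout depends on your Strands Agents SDK version, it simply scans every JSON file for a todos key:

```python
# Hedged sketch for inspecting the persisted todo state between runs.
# Assumption: FileSessionManager writes agent state as JSON somewhere under
# storage_dir; the exact layout depends on the SDK version, so we just scan.
import json
from pathlib import Path

storage_dir = Path("./.agent_sessions")  # same storage_dir as in the configuration below


def find_todos(obj):
    """Recursively look for a 'todos' list inside a parsed JSON document."""
    if isinstance(obj, dict):
        if isinstance(obj.get("todos"), list):
            return obj["todos"]
        for value in obj.values():
            found = find_todos(value)
            if found:
                return found
    elif isinstance(obj, list):
        for item in obj:
            found = find_todos(item)
            if found:
                return found
    return None


for path in storage_dir.rglob("*.json"):
    try:
        todos = find_todos(json.loads(path.read_text()))
    except (ValueError, OSError):
        continue
    if todos:
        for todo in todos:
            print(f"[{todo.get('status')}] {todo.get('content')}")
        break
```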
Model Selection Strategy: Optimizing for Cost and Performance
In our implementation, we've made strategic choices about which Claude models to use for each agent type:
| Agent Type | Model | Rationale |
|---|---|---|
| Lead Searcher | Claude 4.5 Sonnet | Requires strong reasoning and orchestration capabilities |
| Search Agents | Claude 4.5 Sonnet | Needs analytical skills to evaluate sources and extract insights |
| Citation Agent | Claude 4.5 Haiku | Simple extraction and mapping task, no complex reasoning needed |
```python
# Example configuration (excerpt): SubAgent, create_deep_agent, the prompt
# constants, and the model helpers come from the repo's modules; file_read
# and file_write come from strands_tools.

# Format prompts with the internet tool name (you can use whichever internet search tool you want)
lead_prompt = RESEARCH_LEAD_PROMPT.format(internet_tool_name=tool_name)
subagent_prompt = RESEARCH_SUBAGENT_PROMPT.format(internet_tool_name=tool_name)

# Research subagent - performs focused research tasks
research_subagent = SubAgent(
    name="research_subagent",
    description=(
        "Specialized research agent for conducting focused investigations on specific topics. "
        "Use this agent to research specific questions, gather facts, analyze sources, and compile findings. "
        f"This agent has access to {tool_name} for comprehensive web search capabilities. "
        "Results are written to files to keep context lean. "
        "Source documents are saved to research_documents_[topic]/ directories for citation purposes."
    ),
    prompt=subagent_prompt,
    tools=[research_tool, file_write],
    model=get_default_model(),  # Claude Sonnet 4.5 with interleaved thinking
)

# Citations agent - adds source references to reports
citations_agent = SubAgent(
    name="citations_agent",
    description=(
        "Specialized agent for adding citations to research reports. "
        "Use this agent after completing a research report to add proper source citations. "
        "This agent reads the synthesized report and all source documents from research_documents_[topic]/ directories. "
        "It then adds proper inline citations and a references section."
    ),
    model=basic_claude_haiku_4_5(),
    prompt=CITATIONS_AGENT_PROMPT,
    tools=[file_read, file_write],
)

session_id = "example-task-session"
storage_dir = "./.agent_sessions"

session_manager = FileSessionManager(
    session_id=session_id,
    storage_dir=storage_dir,
)

agent = create_deep_agent(
    instructions=lead_prompt,
    subagents=[research_subagent, citations_agent],
    tools=[file_read, file_write],
    session_manager=session_manager,
)
```
The Citation Agent uses Haiku because its task is straightforward: information extraction and mapping. It doesn't require deep reasoning capabilities, so using the more expensive Sonnet model would be wasteful.
Cost optimization matters. Don't use frontier models for every task. Match the model capability to the task complexity.
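For completeness, here is a hedged sketch of what helpers like get_default_model and basic_claude_haiku_4_5 could look like using the Strands BedrockModel. The model IDs are illustrative placeholders (check the Bedrock console for the exact identifiers available in your region), and the real get_default_model in the repo also enables interleaved thinking, which is omitted here:

```python
# Hedged sketch of the model helpers; the model IDs below are illustrative
# placeholders, and interleaved thinking configuration is omitted for brevity.
from strands.models import BedrockModel


def get_default_model() -> BedrockModel:
    """Reasoning-heavy work (Lead Searcher, research subagents)."""
    return BedrockModel(model_id="global.anthropic.claude-sonnet-4-5-20250929-v1:0")


def basic_claude_haiku_4_5() -> BedrockModel:
    """Cheap extraction and mapping work (Citation Agent)."""
    return BedrockModel(model_id="global.anthropic.claude-haiku-4-5-20251001-v1:0")
```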
As you can see, this Deep Agents implementation with Strands Agents is straightforward, and you can easily adapt it to your own use case. You just need to take some time to define the subagents you need, their purpose, and the subagent and tool-calling guidelines, and you're good to go.
Why This Architecture Works
The DeepSearch approach elegantly solves the comprehensive research problem by:
- Parallel execution: Multiple agents working simultaneously reduce total research time
- Specialized focus: Each agent digs deep into a specific angle, ensuring thoroughness
- Orchestrated coordination: The Lead Searcher ensures all angles are covered systematically
- File-based communication: Writing findings to files creates clear checkpoints and enables synthesis
- Quality assurance: The Citation Agent adds credibility and traceability
This demonstrates how a complex research problem—which by definition requires covering multiple angles—can be tackled effectively using the deep agents framework.
Key Takeaways
- Prompting still matters: Define clear heuristics and decision rules that translate into actionable guidelines in your prompts
- Parallel execution: Multiple search agents working simultaneously on different angles dramatically improves both coverage and speed (this comes with costs and drawbacks; check this article to learn more about the tradeoffs). My Deep Agents implementation gives you the possibility to force sequential execution if needed.
- File-based communication creates structure: Having agents write findings to files establishes clear checkpoints and enables effective synthesis
- Optimize costs strategically: Match model capabilities to task requirements—use frontier models (Sonnet) for reasoning-heavy tasks and efficient models (Haiku) for simple extraction work
- Separation of concerns improves quality: Dedicate a specialized Citation Agent to add source attribution rather than burdening research agents with this task
- Todo-driven execution provides transparency: The Lead Searcher's todo list offers visibility into the research strategy and ensures comprehensive coverage
This architecture demonstrates how breaking down complex research tasks into orchestrated, specialized sub-tasks can dramatically improve both the depth and breadth of information gathering—a pattern applicable to many challenging use cases beyond search.
Hope you enjoyed this article and found it useful. Feel free to reach out to me if you have any questions or feedback.
PA,