Deep Agents Trilogy #1: From Shallow to Deep — The Architecture of Intelligent Agent Orchestration
Shallow agents work great until they don't. And when they fail, they fail hard — losing important context, forgetting objectives, or becoming overwhelmed by complexity.
📚 The Deep Agents Trilogy
This is Part 1 of a three-part series on building production-ready AI agent systems.
#1 Foundations — You are here: Understanding deep vs shallow agents and multi-agent orchestration
#2 Building DeepSearch — Implement a multi-agent research system from scratch
#3 Production Deployment — Deploy to Amazon Bedrock AgentCore with LangFuse observability
It's hardly a secret anymore: we are now living in the era of agents.
This evolution began with what we can now call shallow agents: simple agents widely adopted across companies to handle a range of tasks. These agents have proven sufficient for basic problems — but as soon as the complexity increases, their limitations become obvious.
Shallow Agents

A shallow agent consists of the fundamental components found in any agentic system:
- The Brain (LLM): The reasoning engine that performs the "think and act" loop popularized by the ReAct framework. The LLM reflects on the problem, decides which actions to take, and iteratively works toward a solution.
- The Body (Tools): The actions the agent can execute — tools that might connect to external APIs, perform mathematical calculations, run queries against databases such as MongoDB or Athena, or interact with various services.
The system prompt serves as the agent's instruction manual, defining how it should reason and which actions to take to solve user problems. This architecture proves sufficient for straightforward tasks, making shallow agents widespread across many organizations.
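To make this concrete, here is a minimal, framework-free sketch of a shallow agent: one LLM reasoning loop over a flat tool set. The `call_llm` callable and the two tools are illustrative placeholders, not a specific SDK.

```python
import json

# Illustrative stand-in tools; real agents would call external APIs or databases.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

def run_query(sql: str) -> str:
    return "3 rows returned"

TOOLS = {"get_weather": get_weather, "run_query": run_query}

SYSTEM_PROMPT = (
    "You are a helpful assistant. Think step by step. "
    'To act, reply with JSON: {"tool": <name>, "args": {...}}. '
    "When you have the final answer, reply in plain text."
)

def shallow_agent(task: str, call_llm, max_steps: int = 10) -> str:
    """Single ReAct-style loop: think, pick a tool, observe, repeat."""
    history = [{"role": "system", "content": SYSTEM_PROMPT},
               {"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_llm(history)          # placeholder for any chat-completion client
        try:
            action = json.loads(reply)     # the model requested a tool call
        except json.JSONDecodeError:
            return reply                   # plain text means the task is done
        result = TOOLS[action["tool"]](**action["args"])
        history.append({"role": "assistant", "content": reply})
        history.append({"role": "user", "content": f"Tool result: {result}"})
    return "Stopped: step limit reached"
```

Everything the agent knows lives in `history`: the plan is implicit, which is exactly the weakness discussed next.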
The Limitations of Shallow Agents
However, as task complexity increases, shallow agents reveal critical weaknesses:
1. Forgetting and Implicit Planning
For long-running tasks, agents can lose track of their objectives. The planning is implicit — the agent has a general idea of what needs to be done, but lacks explicit tracking of completed steps versus remaining work.
After working for an extended period, the agent might not even remember:
- What it's currently doing
- Why it's doing it
- Which steps have been completed
- What remains to be done
This implicit planning becomes particularly problematic for complex, multi-step tasks where maintaining context across many reflection steps is essential.
2. Context Window Overflow
Every LLM operates with a maximum context window — a token limit that constrains how much information it can process simultaneously. This limit keeps growing and reaches 1,000,000 tokens for some models, such as Claude Sonnet 4.5 with its 1M-token option, but those configurations are also more expensive. Complex tasks generate substantial context from multiple reasoning steps, tool execution outputs, intermediate results, and previous conversation history.
As tasks grow in complexity, the context can quickly overflow, forcing the agent to truncate or lose earlier information — potentially including critical details needed to complete the task successfully.
3. Loss of Specialization
Agents perform best when specialized. A single agent handling complex tasks requires many tools and capabilities, transforming it into a generalist "Swiss Army knife."
This loss of specialization manifests as:
- Diluted Context: The prompt becomes a "melting pot" of instructions and context: "You can do this, you can do that. For this action, follow these steps. For that action, do something else..." This complexity confuses the agent, making it less precise in its decision-making
- Reduced Expertise: With attention spread across many domains, the agent can't develop the deep capability needed for sophisticated problem-solving
These limitations aren't theoretical — they represent real barriers to deploying agents for enterprise-grade complex tasks.
Enter the Deep Agents
To overcome these fundamental limitations, two architectural principles are essential:
- Specialization — agents should focus on narrow, well-defined skills rather than becoming generalists
- Structured tracking — the agent must maintain explicit, persistent knowledge of objectives and progress
These principles aren't just nice-to-haves — they're architectural requirements for handling complex tasks effectively.
This is where Deep Agents come in.
The concept, popularized by LangChain and, I suspect, inspired by the orchestrator pattern described by Anthropic, builds on a system of structured planning and orchestration.
The Architecture of Deep Agents

At the heart of a Deep Agent lies an orchestrator agent — the project manager — which defines objectives, creates an explicit plan (a to-do list), and coordinates specialized sub-agents.
The orchestrator has a detailed system prompt that provides precise instructions for:
- Creating and updating the to-do list
- Planning and calling sub-agents
- Using dedicated tools (if the orchestrator needs its own actions)
The To-Do Tool: Formalizing the ReAct framework
This is the cornerstone tool of the Deep Agent pattern. Unlike traditional tools that interact with external systems, the to-do tool manages the agent's internal plan.
See the complete description here. Here's how it works:
- First Call: When the orchestrator receives a task, it immediately calls the to-do tool to break down the objective into discrete steps
- Persistent Storage: The to-do list is written to persistent storage — the agent's memory or state — ensuring it survives across the entire task lifecycle
- Status Tracking: Each to-do item has an explicit status:
  - `pending`: Not yet started
  - `in progress`: Currently being worked on
  - `completed`: Finished successfully
- Continuous Updates: Throughout execution, the orchestrator regularly calls the to-do tool to check progress and update task statuses
The workflow looks like this:
- Orchestrator checks the to-do list: "What's next?"
- Identifies the first `pending` item, marks it as `in progress`
- Works on that task (potentially calling a sub-agent)
- Upon completion, marks it as `completed` and persists the update
- Repeats: "What's next?" → checks the to-do list again
This pattern ensures the agent always knows where it is in the process — even in long-running tasks that might span hours or involve hundreds of reasoning steps.
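As a rough sketch of what such a to-do tool can look like (the class and method names here are illustrative, not the LangChain or Strands implementation), the plan is just structured state with explicit statuses:

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class TodoItem:
    description: str
    status: str = "pending"            # pending | in_progress | completed

@dataclass
class TodoList:
    items: list[TodoItem] = field(default_factory=list)

    def write(self, steps: list[str]) -> None:
        """First call: break the objective into discrete steps."""
        self.items = [TodoItem(s) for s in steps]

    def next_item(self) -> TodoItem | None:
        """Answer 'what's next?': take the first pending item, mark it in progress."""
        for item in self.items:
            if item.status == "pending":
                item.status = "in_progress"
                return item
        return None

    def complete(self, description: str) -> None:
        for item in self.items:
            if item.description == description:
                item.status = "completed"

todos = TodoList()
todos.write(["Search sources", "Synthesize findings", "Format citations"])
current = todos.next_item()            # "Search sources" is now in_progress
todos.complete(current.description)    # mark it completed, then loop back to next_item()
```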
Persistent Storage: The Foundation
The to-do list must be stored in persistent storage that survives across:
- Individual tool calls
- Sub-agent invocations
- Agent failures and restarts
Common storage mechanisms include:
- Agent State: Key-value storage managed by the agent framework
- Agent Memory: Conversation and context storage
- External State Stores: Databases or file systems for enterprise deployments
This persistence is what differentiates Deep Agents from shallow agents with implicit planning. The plan isn't just "in the agent's head" (context window) — it's written down and accessible at any point in the execution.
In essence, the to-do tool is a formalization of the ReAct framework, transforming implicit reasoning into an explicit, persistent, frequently-updated plan.
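As a sketch of that persistence (reusing the `TodoList` and `TodoItem` classes from the sketch above, with a JSON file standing in for agent state, memory, or an external store), the plan can be saved and reloaded at any point:

```python
import json
from dataclasses import asdict
from pathlib import Path

PLAN_PATH = Path("todo_state.json")

def save_plan(todos: TodoList) -> None:
    # The plan lives on disk, not in the context window
    PLAN_PATH.write_text(json.dumps([asdict(i) for i in todos.items], indent=2))

def load_plan() -> TodoList:
    # A restarted agent reloads exactly where it left off
    items = [TodoItem(**d) for d in json.loads(PLAN_PATH.read_text())]
    return TodoList(items=items)
```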
Concerning failure handling, the agent is instructed not to invent a new status, but to keep the failing task active (in progress) while creating a new pending task specifically aimed at resolving the current roadblock, maintaining a forward-looking, problem-solving workflow.
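Continuing the same illustrative sketch, that policy stays simple: the blocked item keeps its `in_progress` status, and a remediation task is appended as `pending`:

```python
def handle_failure(todos: TodoList, blocked: TodoItem, reason: str) -> None:
    # The blocked item stays in_progress; the fix becomes a new pending task.
    todos.items.append(TodoItem(f"Resolve blocker for '{blocked.description}': {reason}"))

blocked = todos.next_item()                       # e.g. "Synthesize findings"
if blocked is not None:
    handle_failure(todos, blocked, "search API quota exceeded")
```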
Just as humans rely on checklists for complex work, agents benefit from a structured approach.
The Task Tool: Agent Spawner
Once the orchestrator has identified which sub-agent to call (based on the to-do list), it uses the task tool to spawn specialized agents.
The task tool typically accepts:
- Agent selection: Which specialized agent to invoke (e.g., "research_agent", "writer_agent")
- Task description: The specific instructions and context for the sub-agent (prompt)
Each sub-agent is configured with:
- A system prompt: Defines its expertise and behavior
- Dedicated tools: Specialized capabilities relevant to its role
- Model selection: Potentially different model optimized for its task complexity (#finops, #cost-optimization)
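A hedged sketch of such a task tool is shown below; the registry entries, model names, and the `run_agent` stub are illustrative placeholders rather than a real framework API:

```python
from dataclasses import dataclass

def web_search(query: str) -> str:
    return f"results for {query}"             # stub for a real search tool

def run_agent(system_prompt: str, tools: list, model: str, task: str) -> str:
    # Stub: a real implementation would run a full LLM loop in an isolated context.
    return f"[{model}] handled: {task}"

@dataclass
class SubAgentConfig:
    system_prompt: str
    tools: list
    model: str

SUBAGENTS = {
    "research_agent": SubAgentConfig("You are a research specialist.", [web_search], "small-fast-model"),
    "writer_agent":   SubAgentConfig("You are a skilled writer.", [], "large-reasoning-model"),
}

def task_tool(agent_name: str, task_description: str) -> str:
    """Spawn the selected sub-agent with its own prompt, tools, and model."""
    cfg = SUBAGENTS[agent_name]
    return run_agent(cfg.system_prompt, cfg.tools, cfg.model, task_description)

print(task_tool("research_agent", "Find 2024 solar capacity figures"))
```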
File System Tools: Bridging Sub-Agent Results
A critical component of Deep Agents is the ability to pass results between sub-agents efficiently. This is where file system tools become essential.
Common file system tools include:
- Read: Access file contents
- Write: Create new files
- Edit: Modify existing files
Why files matter for Deep Agents:
Rather than passing all intermediate results through the orchestrator's context, sub-agents can write their outputs to files:
Searcher Agent #1 → writes to search_results_A.md
Searcher Agent #2 → writes to search_results_B.md
...
Orchestrator knows: "Results are in search_results_*.md"
...
This is also really useful for debugging: you can inspect intermediate results as well as the final output.
This approach provides:
- Context Efficiency: Intermediate results don't bloat the orchestrator's context window
- Clear Data Flow: Each agent knows exactly where to read inputs and write outputs
- Scalability: Can handle arbitrarily large intermediate results without context limits
The orchestrator tracks file paths in its to-do list or state, enabling sub-agents to reference previous work without overwhelming the system's memory.
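A minimal sketch of these file tools (the workspace directory and file names are illustrative) shows why only paths, not contents, need to flow through the orchestrator:

```python
from pathlib import Path

WORKSPACE = Path("agent_workspace")
WORKSPACE.mkdir(exist_ok=True)

def write_file(name: str, content: str) -> str:
    path = WORKSPACE / name
    path.write_text(content)
    return str(path)                           # the orchestrator only keeps this reference

def read_file(name: str) -> str:
    return (WORKSPACE / name).read_text()

# Searcher sub-agents drop their results as files...
write_file("search_results_A.md", "# Findings from source A\n...")
write_file("search_results_B.md", "# Findings from source B\n...")

# ...and a later sub-agent gathers them without bloating anyone's context window.
combined = "\n".join(read_file(p.name) for p in sorted(WORKSPACE.glob("search_results_*.md")))
```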
Sub-Agent Structure
Each sub-agent in a Deep Agent system is itself a complete agent with:
System Prompt (Static): A detailed prompt defining the agent's:
- Role and expertise
- Behavioral guidelines
- Constraints and limitations
- General approach to problems in its domain
This prompt remains consistent across invocations, establishing the agent's core identity.
Task Description (Dynamic): Passed by the orchestrator at runtime, containing:
- The specific task to accomplish
- Relevant context from the to-do list
- References to input files or data
- Expected output format and location
Dedicated Tool Set: Each sub-agent has tools appropriate to its specialization, for example:
| Sub-Agent Type | Common Tools |
|---|---|
| Search Agent | Web search APIs, scraping tools |
| Synthesizer Agent | Often fewer tools, focuses on reasoning |
| Citation Agent | (no tools needed) |
| Data Agent | Database queries (SQL), data transformation |
State Isolation: Each sub-agent has its own isolated state for each invocation. This ensures clean separation between tasks and prevents context pollution from accumulating across multiple calls.
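To illustrate this structure (illustrative names only, no particular framework), each invocation pairs the static system prompt with the dynamic task description inside a fresh, throwaway state:

```python
def invoke_subagent(system_prompt: str, task_description: str) -> dict:
    state = {"messages": [], "files_written": []}      # fresh state on every call
    state["messages"].append({"role": "system", "content": system_prompt})   # static identity
    state["messages"].append({"role": "user", "content": task_description})  # dynamic task
    # ... the sub-agent's own think/act loop would run here ...
    return state                                        # discarded or summarized afterwards

run_1 = invoke_subagent("You are a research specialist.", "Find 2023 wind data; write to wind_2023.md")
run_2 = invoke_subagent("You are a research specialist.", "Find 2024 wind data; write to wind_2024.md")
assert run_1 is not run_2                               # invocations never share state
```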
Benefits of the Deep Agent Pattern

According to research and implementations by Anthropic and others, Deep Agents offer transformative advantages for complex, multi-step tasks:
Enhanced Task Complexity Handling
Deep Agents can tackle significantly more complex problems than shallow agents. The explicit planning and modular architecture allow the system to break down sophisticated challenges into manageable sub-tasks, each handled by a specialized expert.
Traceability and Transparency
Every action and decision is tracked via the persistent to-do structure. At any point, you can inspect:
- Which tasks have been completed
- What's currently in progress
- What remains to be done
- Why each decision was made
This visibility is invaluable for debugging, auditing, and understanding agent behavior.
Scalability: The Microservices Architecture
The orchestrator-sub-agent model mirrors microservices architecture in software engineering. Rather than a monolithic agent doing everything, you have:
- A lightweight orchestrator managing coordination
- Specialized sub-agents, each focused on specific capabilities
- Clean interfaces (file I/O, to-do list) between components
Adding new capabilities is straightforward: Create a new specialized sub-agent, register it with the orchestrator, and update the planning logic. No need to retrain or restructure the entire system.
Robustness and Recovery
When a task fails in a shallow agent, the entire process often needs to restart from scratch. With Deep Agents, failure handling is granular:
- The orchestrator knows exactly which task is being worked on when failure occurs
- It can retry just that specific task with a different approach
- Other completed tasks remain intact
- The agent can implement sophisticated retry strategies based on task status tracking
- Failed tasks can be marked explicitly, enabling automated recovery workflows
This structured approach to task tracking dramatically improves reliability for production systems.
Maintained Specialization
Each sub-agent maintains focused expertise:
- Search agents: Excel at web research with specialized search tools and APIs
- Citation agents: Specialize in reference formatting
- Data agents: Expert at querying databases and processing results
Unlike a Swiss Army knife agent that's mediocre at everything, each component is exceptional at its specific function.
Context Engineering
By distributing work across sub-agents and using file storage for intermediate results, Deep Agents sidestep the context window limitations that plague shallow agents. Each sub-agent works within its own context boundary, and the orchestrator maintains only the high-level plan and file references.
The structured on-the-fly orchestration enables agents to handle deeper reasoning and longer tasks without losing coherence or efficiency.
The Power of Model Selection: Microservices for Agents
One of the most compelling advantages of Deep Agents is the ability to use different models for different sub-agents based on task complexity. This approach directly mirrors microservices architecture in software engineering.
The Principle: Right-Size Your Models
Not every task requires a frontier model. Consider the spectrum:
- Simple, deterministic tasks (extracting data, formatting data, making simple API calls): These can use lighter, faster models
- Complex reasoning tasks (strategic planning, synthesis, nuanced decision-making): These benefit from powerful, sophisticated models
In a Deep Agent system:
- The orchestrator typically uses a strong model (it needs sophisticated planning capabilities), e.g. Claude Sonnet 4.5
- Specialized sub-agents use models appropriate to their task complexity, e.g. Claude Haiku 4.5, Fine Tuned Small Language Models
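In configuration terms this is just a mapping from role to model; the role names and model labels below are illustrative placeholders, not specific model identifiers:

```python
MODEL_BY_ROLE = {
    "orchestrator": "strong-reasoning-model",   # planning and coordination need the best model
    "researcher":   "mid-tier-model",           # tool-heavy but routine work
    "formatter":    "small-fast-model",         # deterministic formatting, cheapest tier
}

def pick_model(role: str) -> str:
    # Default to the cheap tier unless the role explicitly needs more capability
    return MODEL_BY_ROLE.get(role, "small-fast-model")
```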
Two Major Benefits:
Cost Efficiency
By matching model capability to task complexity, you can reduce operational costs dramatically:
- Running simple sub-agents on lighter models instead of frontier models saves significant money at scale
- The orchestrator and a few critical sub-agents use expensive models, but most of the work happens on cheaper tiers
- Cost reductions of 50-80% are achievable with thoughtful model selection
Performance Optimization
Lighter models respond faster, reducing overall system latency:
- Smaller models typically have much faster response times
- Frontier models with extended capabilities take longer but provide deeper reasoning
- For tasks that don't need advanced reasoning, lighter models provide significant speed improvements
This is exactly how modern software systems are built:
- You don't use the same infrastructure for every service
- A simple API gateway doesn't need the same compute as a machine learning inference service
- Right-sizing resources based on requirements is fundamental to scalable architecture
Deep Agents bring this principle to the agentic world: specialized, right-sized components working together in a coordinated system, each optimized for its specific role — both in capabilities and cost.
When Not to Use Deep Agents
Despite their advantages, Deep Agents are not a universal solution. Understanding when not to use them is as important as knowing when they excel.

Simple, Straightforward Tasks
For basic operations that a shallow agent can handle in a few steps, the overhead of Deep Agent architecture is unnecessary:
- Answering simple questions from a knowledge base
- Performing basic calculations or data lookups
- Single-step API interactions
Use a standard shallow agent — it's faster, simpler, and more cost-effective.
Deterministic Workflows
When you already know the exact sequence of steps required (X → Y → Z), you don't need dynamic planning:
- ETL pipelines with fixed stages
- Standard data processing workflows
- Scheduled tasks with predetermined logic
Latency-Critical Applications
The Deep Agent pattern introduces overhead:
- Initial planning phase (creating the to-do list)
- Frequent to-do updates during execution
- Spawning and coordinating multiple sub-agents
- File I/O for intermediate results
This adds latency — potentially seconds or more depending on system complexity.
For real-time or low-latency requirements (e.g., chatbot responses, API endpoints with strict SLAs), prefer shallow agents or even non-agentic approaches (webhooks, API endpoints, scheduled tasks).
Cost-Sensitive, High-Volume Operations
While Deep Agents can be cost-optimized through model selection, they still make more LLM calls than shallow agents:
- Multiple orchestrator calls for planning and updating
- Each sub-agent invocation
- Synthesis and coordination steps
For high-volume, cost-sensitive applications where margins are tight, simpler approaches may be more economical.

Obviously, this is a simplified decision matrix, and the real world is more complex. The rule is:
Use the simplest solution that meets your needs. There is no one-size-fits-all solution. Test, iterate, and optimize.
Implementing Deep Agents with Strands
The Deep Agent pattern can be implemented with any agent framework that provides the necessary primitives. To demonstrate these concepts in practice, I've built a reference implementation using Strands, inspired by the LangChain implementation and leveraging Strands' native capabilities for state management, tool orchestration, and model flexibility.
The implementation is available on GitHub: strands-deep-agents
Why Strands for Deep Agents?
Strands provides the key primitives needed for Deep Agent patterns:
- Persistent State Management: Built-in agent state that survives across requests, essential for maintaining TODO lists
- Rich Tool Ecosystem: Both a flexible tool system and community packages (like `strands-agents-tools`) for file operations
- Agent-as-Tool Pattern: Native support for spawning sub-agents and using them as tools
- Model Flexibility: Easy per-agent model configuration for cost optimization
- Async Support: Full async/await patterns for concurrent sub-agent execution
Core Pattern Implementation
The implementation follows the Deep Agent pattern by providing three foundational tools:
- Planning Tool: Manages the TODO list in persistent agent state (and thus in memory)

- File System Tools: Enable inter-agent communication via file I/O
- Task Delegation Tool: Spawns specialized sub-agents with isolated contexts
Example Usage
```python
from strands_deep_agents import create_deep_agent

# Create a deep agent with specialized sub-agents
agent = create_deep_agent(
    instructions="You are a helpful assistant that excels at complex tasks.",
    subagents=[
        {
            "name": "researcher",
            "description": "Conducts thorough research",
            "prompt": "You are a research specialist.",
        },
        {
            "name": "writer",
            "description": "Creates polished content",
            "prompt": "You are a skilled writer.",
        },
    ],
)

# The agent automatically uses the Deep Agent pattern
result = agent("Create a comprehensive report on renewable energy trends in the last 10 years")
```
You can see a full DeepSearch implementation here
Strands-Specific Implementation Notes
For those interested in the technical details of this particular implementation:
- State Structure: Uses a custom `DeepAgentState` with `todos` (task list)
- Status Support: Implements three task statuses: `pending`, `in_progress`, `completed`
- Tool Access: Tools receive a `ToolContext` parameter providing access to agent state
- Session Persistence: Supports saving/loading agent state across application restarts
- Model Options: Defaults to Claude Sonnet 4, with per-sub-agent model overrides
- Execution Modes: Supports both parallel and sequential tool execution
See the GitHub repository for complete documentation on the Strands-specific implementation details.
The Evolution Continues
The rise of Deep Agents marks a natural but significant progression in the agentic era — a sophisticated evolution building on the foundations laid by frameworks like ReAct and Anthropic's Orchestrator Pattern.
Understanding the Lineage
Deep Agents didn't emerge from nowhere. They represent the maturation of ideas that have been developing over the past few years:
- ReAct Framework: Introduced the "reasoning and acting" loop that became fundamental to agentic systems
- Orchestrator Pattern: Popularized by Anthropic, demonstrated the power of coordinating multiple specialized components
- Deep Agents: Formalized by LangChain, combines orchestration with explicit, persistent planning — the missing piece that enables truly complex task handling
This evolution shows how the AI community iterates and improves: each pattern addresses limitations of its predecessors while building on their strengths.
Key Principles to Remember
Four fundamental insights underpin successful agentic systems:
- Specialization Matters: Focused agents with clear expertise outperform generalist "Swiss Army knife" agents
- Explicit Tracking is Essential: For complex tasks, implicit reasoning fails. Persistent, structured plans (to-do lists) keep agents on track
- Choose the Right Pattern: Match your architecture to your problem's complexity, not the other way around
- Context Engineering Is Key: Managing what enters each context window avoids context limitations and preserves focus, coherence, and efficiency
Looking Forward
The agentic era is still remarkably young — we're witnessing the equivalent of the early days of microservices or containerization. Patterns are emerging, best practices are solidifying, but the field remains dynamic and fast-evolving.
When building agents:
- Study existing patterns — don't reinvent what others have already refined
- Explore actively — experiment with different architectures to understand their trade-offs
- Stay updated — new patterns and improvements emerge constantly
- Share learnings — the community benefits from practical experiences and lessons learned
The evolution from shallow to deep agents demonstrates that we're moving toward more structured, maintainable, and production-ready agentic systems.
The future belongs to those who can orchestrate specialized intelligence effectively.
As agents become increasingly central to how we build software, understanding these patterns — and knowing when to apply each — will be a defining skill for the next generation of AI engineers.
Complete MindMap
I know it's a bit long, but I hope you found it useful. I wanted to explain the concepts in details and show a practical implementation. Feel free to reach out to me if you have any questions or feedback.
PA,