AI Agent Development #5: Dividing Work with Subagents
In Part 4 we trimmed, cleared, and summarized context. But there is a more fundamental solution: do not put everything in one context in the first place. If you hand part of the work to a separate agent and only bring back the result, the intermediate steps never enter the main agent’s context at all. In this post we cover subagents.
Why divide the work #
The benefits of subagents come down to three things.
- Context isolation — research work comes with the process of digging through dozens of search results. If a subagent goes through that process in its own context and reports only the conclusion, the main agent receives only the conclusion. If the techniques in Part 4 were ways to “reduce what has piled up,” this is a way to “keep it from piling up.”
- Role separation — each subagent can get a different system prompt and a different tool list. Give the research agent only search tools, the writing agent only file tools, and so on. It realizes the principle from Part 2 — “the narrower the tool list, the better” — at the role level.
- Parallel execution — if the tasks are independent of each other, you can run several subagents at the same time.
Building a delegate tool #
The core of the implementation is surprisingly simple. A subagent is ultimately another loop running on a separate messages array. Take the run_agent from Part 1 almost as is, but make it accept the role and tools as parameters.
def run_subagent(system: str, tools: list, task: str, max_steps: int = 15) -> str:
"""Perform the task in an isolated context and return only the final report."""
messages = [{"role": "user", "content": task}]
for _ in range(max_steps):
response = client.messages.create(
model="claude-opus-4-8",
max_tokens=16000,
system=system,
tools=tools,
messages=messages,
)
if response.stop_reason == "end_turn":
return next(b.text for b in response.content if b.type == "text")
messages.append({"role": "assistant", "content": response.content})
messages.append({"role": "user", "content": run_tools(response)})
return "(The subagent could not finish the task within the step limit.)"Then expose it as a tool of the main agent.
{
"name": "delegate_research",
"description": (
"Delegate a research task to a research-specialist subagent. "
"If a question requires searching and reading multiple documents, use this tool instead of searching directly. "
"In task, spell out concretely what to find out and the expected report format."
),
"input_schema": {
"type": "object",
"properties": {
"task": {"type": "string", "description": "What to research and the expected report format"},
},
"required": ["task"],
},
}
def delegate_research(task: str) -> str:
return run_subagent(
system="You are a research-specialist agent. Research what is requested and report concisely with evidence.",
tools=research_tools, # search and read tools only
task=task,
)From the main agent’s point of view, a subagent is just one more tool. Nothing about the loop structure changes.
Making the report format a contract #
The common failure with subagents happens not in the delegation but in the reporting. The main agent receives only the subagent’s final text, so if that text is missing the needed information, the entire delegation was wasted effort. That is why stating the report format in the task matters.
Research the refund policy changes from May 2026.
Report format:
- Key changes (3 lines or fewer)
- Title and location of the source documents
- If anything could not be verified, say so explicitlyThe line “say so explicitly if anything could not be verified” is especially useful. Without it, a subagent that comes up empty-handed tends to fill its report with plausible guesses.
Orchestrator-worker and parallel execution #
Once there are several subagents, the main agent’s role changes. Instead of doing the work itself, it becomes an orchestrator that divides the work and merges the results. Independent delegations can run at the same time. If Claude calls delegate_research three times in one response, you can process those three calls in parallel using threads.
from concurrent.futures import ThreadPoolExecutor
def run_tools(response) -> list:
blocks = [b for b in response.content if b.type == "tool_use"]
with ThreadPoolExecutor(max_workers=4) as pool:
results = list(pool.map(execute_tool, blocks))
return resultsWhat we said in Part 1 — “you must return the results of all parallel calls” — applies here as is. When three subagents each run a research task that takes a few minutes, the total time shrinks to that of the slowest one.
Keeping delegation from going too far #
Subagents are not free. Every call spins up an entire separate loop, which costs money and time. I recommend two rules.
- Do single-shot work directly. Spinning up a subagent for something one lookup can answer is waste. Writing a condition into the delegate tool’s description, like “only when multiple documents must be searched and read,” cuts down on excessive delegation.
- Limit delegation depth to one level. Once subagents start calling other subagents, cost and behavior become hard to trace. You can block this structurally by simply not giving subagents the delegate tool.
Common delegation pitfalls #
- Assuming the subagent shares context — a subagent knows nothing about the main conversation. What is written in the task is all it knows. Any needed background must go into the task.
- Delegating without a report format — a report without a format either misses the point or rambles. State the expected output in the task.
- Delegating everything — when delegation becomes the goal itself, even simple tasks burn a loop each. Write the criteria for “faster to do directly” into the description.
Key takeaways #
In this post we covered subagents, which split up the work.
- A subagent is another loop running on separate messages, and to the main agent it looks like just one more tool.
- The core value is context isolation. The subagent goes through the intermediate steps, and only the conclusion comes back.
- State the report format in the task like a contract, and manage cost by limiting delegation conditions and depth.
So far every tool has been a Python function inside our own code. In the next post, “AI Agent Development #6: Building Your Own MCP Server,” we separate tools into a server that speaks a standard protocol, so they can be reused from any agent.