AI Agent Development #1: Building a Robust Agent Loop

Thursday, June 11, 2026

In LLM App Development Part 10 we built our first agent. Given several tools and a goal, Claude decided the order on its own and completed the job in a loop. That loop works, but it has gaps that make it unfit for a real service. In this series we close those gaps one by one and build an agent that does not fall apart even on long tasks.

This series has seven parts: hardening the loop (Part 1), tool design (Part 2), planning and self-correction (Part 3), context management (Part 4), subagents (Part 5), building an MCP server (Part 6), and an issue-triage agent (Part 7). For the model we use claude-opus-4-8, our strongest model for agent-style work that requires independent multi-step judgment.

Gaps in the minimal loop

Looking back at the Part 10 loop, it assumes the response’s stop_reason is either end_turn or a tool call.

minimal_loop.py

if response.stop_reason == "end_turn":
    return next(b.text for b in response.content if b.type == "text")
# otherwise run the tools and continue

In practice there are more cases. The response can hit max_tokens and be cut off midway, or come back as a refusal for safety reasons. And if an exception fires inside a tool function, the whole loop dies. An agent is not code that runs once; it repeats dozens of times, so even a 1% chance of failure per iteration adds up to a frequent event over a whole run.
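
To put a rough number on that, here is a back-of-the-envelope calculation. It assumes every iteration fails independently with the same probability, which is a simplification, and the function name is made up for this illustration.

```python
def chance_of_any_failure(per_step_failure: float, steps: int) -> float:
    """Probability that at least one of `steps` independent iterations fails."""
    return 1 - (1 - per_step_failure) ** steps


# Assumption: failures are independent and equally likely at every step.
for steps in (10, 20, 50):
    print(f"{steps} steps: {chance_of_any_failure(0.01, steps):.1%}")
```

Under these assumptions, a 1% failure rate per step means a 20-step run hits at least one failure about 18% of the time, and a 50-step run close to 40% of the time.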

Handling every stop_reason

Here are the stop_reason values this loop needs to handle.

stop_reason	Meaning	What the loop should do
`tool_use`	Wants to call a tool	Run the tool, return the result, continue
`end_turn`	Finished the answer	Return the final text and stop
`max_tokens`	Cut off at the output limit	Retry with a higher limit or treat as an error
`refusal`	Refused for safety reasons	Do not repeat the same request; stop

Turning the branches into code looks like this.

handle_stop_reason.py

if response.stop_reason == "end_turn":
    return final_text(response)

if response.stop_reason == "max_tokens":
    raise RuntimeError("Response was truncated. Increase max_tokens.")

if response.stop_reason == "refusal":
    return "The request cannot be processed."

# What remains is tool_use. Run the tools and continue the loop.

There is a reason to treat max_tokens as an error. If you keep a truncated response in the conversation and keep looping, Claude picks up its own cut-off words and drifts further and further off course. Knowing the response was truncated is better than blindly continuing from it.
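
If you prefer the other option from the table, retrying with a higher limit, one possible shape is sketched below. The helper name, the `create` callback, and the limit values are all assumptions made for this example; the key point is that each truncated response is thrown away rather than appended to the conversation.

```python
from types import SimpleNamespace
from typing import Callable, Sequence


def create_with_growing_limit(
    create: Callable[[int], object],
    limits: Sequence[int] = (4000, 8000, 16000),
):
    """Call `create(max_tokens)` with increasing limits until the response is not truncated.

    `create` is a hypothetical wrapper around client.messages.create that takes
    only the max_tokens value. Truncated responses are discarded, never appended
    to the conversation.
    """
    for max_tokens in limits:
        response = create(max_tokens)
        if response.stop_reason != "max_tokens":
            return response
    raise RuntimeError(f"Response still truncated at max_tokens={limits[-1]}.")


# Stand-in for the API so the sketch runs offline: truncates below 8000 tokens.
def fake_create(max_tokens: int):
    reason = "max_tokens" if max_tokens < 8000 else "end_turn"
    return SimpleNamespace(stop_reason=reason, max_tokens_used=max_tokens)


response = create_with_growing_limit(fake_create)
print(response.stop_reason, response.max_tokens_used)
```

In the real loop, `create` could be something like `lambda n: client.messages.create(model=..., max_tokens=n, tools=tools, messages=messages)`. If the last limit is still not enough, the helper raises, so the caller still finds out that the response was truncated.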

Returning tool errors as results

By default, an exception in a tool function kills the loop. The better approach is to catch the exception and return the error as a tool result instead. Mark the tool_result with is_error, and Claude reads the error and tries another approach.

tool_error_as_result.py

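def execute_tool(block) -> dict:
    """Run one tool_use block and always return a tool_result, even on failure."""
    try:
        # run_tool dispatches to the actual tool function; its result should be
        # a string (or a list of content blocks).
        result = run_tool(block.name, block.input)
        return {
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": result,
        }
    except Exception as e:
        # Report the failure to Claude instead of letting it kill the loop.
        return {
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": f"Tool execution failed: {e}",
            "is_error": True,
        }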

For example, when a file-reading tool returns a “file not found” error result, Claude can recover on its own, for example by rechecking the path with a listing tool. None of that would have happened if the exception had killed the loop.
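
The post does not show run_tool itself. As a minimal sketch, it could be a name-to-function lookup like the one below. The two tools (`read_file`, `list_dir`) and the `TOOL_REGISTRY` name are hypothetical and chosen only to mirror the file example above.

```python
import os
from pathlib import Path


# Hypothetical tools for illustration; neither name comes from the post.
def read_file(path: str) -> str:
    return Path(path).read_text(encoding="utf-8")


def list_dir(path: str) -> str:
    return "\n".join(sorted(os.listdir(path)))


TOOL_REGISTRY = {"read_file": read_file, "list_dir": list_dir}


def run_tool(name: str, tool_input: dict) -> str:
    """Dispatch a tool call by name; any exception propagates to the caller."""
    if name not in TOOL_REGISTRY:
        raise ValueError(f"Unknown tool: {name}")
    return TOOL_REGISTRY[name](**tool_input)


# A missing file raises FileNotFoundError here. Wrapped in execute_tool, that
# message would become the content of an is_error tool_result.
try:
    run_tool("read_file", {"path": "no/such/file.txt"})
except FileNotFoundError as e:
    print(f"Tool execution failed: {e}")
```

Letting run_tool raise freely keeps the individual tools simple: the single try/except in execute_tool is the only place that has to know how errors are reported back to Claude.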

Transient API errors: the SDK retries for you

The longer the loop runs, the higher the odds of hitting a network error or a transient API error (429, 5xx) along the way. You do not have to implement retries yourself: the Anthropic SDK retries with exponential backoff up to 2 times by default, and you can adjust the count with max_retries.

client_retries.py

client = anthropic.Anthropic(max_retries=4)
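
To get a feel for what a higher max_retries costs in waiting time, here is a small sketch of an exponential backoff schedule. The base delay, the cap, and the absence of jitter are illustrative assumptions; the SDK's actual delays may differ.

```python
def backoff_delays(max_retries: int, base: float = 0.5, cap: float = 8.0) -> list[float]:
    """Delay before each retry: doubles every attempt, never exceeding `cap`."""
    return [min(cap, base * 2**attempt) for attempt in range(max_retries)]


# Illustrative values only; the SDK's real delays and jitter may differ.
for retries in (2, 4):
    delays = backoff_delays(retries)
    print(f"max_retries={retries}: delays={delays}, worst-case wait={sum(delays)}s")
```

Because the delays grow geometrically, each extra retry roughly doubles the worst-case wait added to a single step. That is worth keeping in mind when a loop runs dozens of steps.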

The finished loop

Here is the loop with everything so far combined. It also logs which tool was called with which input. When the agent misbehaves, this log is practically the only clue for finding the cause.

robust_agent_loop.py

import logging
import anthropic

logger = logging.getLogger("agent")
client = anthropic.Anthropic(max_retries=4)

def run_agent(goal: str, max_steps: int = 20) -> str:
    messages = [{"role": "user", "content": goal}]

    for step in range(max_steps):
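        response = client.messages.create(
            model="claude-opus-4-8",
            max_tokens=16000,
            tools=tools,  # tool definitions, assumed to be defined elsewhere
            messages=messages,
        )

        if response.stop_reason == "end_turn":
            # The default avoids a StopIteration if the response has no text block.
            return next((b.text for b in response.content if b.type == "text"), "")
        if response.stop_reason == "max_tokens":
            raise RuntimeError("Response was truncated at max_tokens.")
        if response.stop_reason == "refusal":
            return "The request cannot be processed."
        if response.stop_reason != "tool_use":
            # Fail loudly rather than sending an empty tool_result message.
            raise RuntimeError(f"Unexpected stop_reason: {response.stop_reason}")

        # Keep the assistant turn, including its tool_use blocks, in the history.
        messages.append({"role": "assistant", "content": response.content})

        # Return one tool_result per tool_use block, all in a single user message.
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                logger.info("step=%d tool=%s input=%s", step, block.name, block.input)
                tool_results.append(execute_tool(block))
        messages.append({"role": "user", "content": tool_results})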

    raise RuntimeError(f"Could not finish the task within {max_steps} steps.")

Compared with Part 10, this loop handles all four stop_reason values, tool exceptions come back as is_error results instead of killing the loop, and exceeding the step limit raises an error instead of passing silently. The structure is the same, but now the loop can tell you “why it stopped” in any situation.

One more thing: Claude can call multiple tools at once in a single response. As in the code above, you must iterate over every tool_use block in response.content and return one tool_result per block in a single message. If even one is missing, the API rejects the next request.
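
A cheap safeguard is to verify the pairing before sending the next request. The helper below is a sketch with a made-up name, and the blocks and ids are stand-ins for what response.content would contain.

```python
from types import SimpleNamespace


def missing_tool_results(content_blocks, tool_results) -> list[str]:
    """Return the ids of tool_use blocks that have no matching tool_result."""
    answered = {r["tool_use_id"] for r in tool_results}
    return [
        b.id for b in content_blocks
        if b.type == "tool_use" and b.id not in answered
    ]


# Stand-ins for response.content blocks; the ids are made up for the example.
content = [
    SimpleNamespace(type="text", text="Checking both files."),
    SimpleNamespace(type="tool_use", id="toolu_01", name="read_file", input={"path": "a.txt"}),
    SimpleNamespace(type="tool_use", id="toolu_02", name="read_file", input={"path": "b.txt"}),
]
results = [{"type": "tool_result", "tool_use_id": "toolu_01", "content": "..."}]

print(missing_tool_results(content, results))  # ['toolu_02']
```

An empty list means every tool call has an answer. Anything else is a bug in the loop that is easier to catch here than from the API's error response.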

Common loop pitfalls

Continuing from a truncated response: if you keep a max_tokens-truncated response in the conversation and keep looping, the output degrades step by step. Treat truncation as an error and raise the limit.
Mismatched tool_result counts: in a parallel call, returning results for only some of the tools triggers a 400 error saying a tool_use_id has no matching result. Fill in failed tools with is_error results so the counts match.
Running without logs: when the agent reaches a wrong conclusion and there are no logs, there is no way to tell at which step it went off track. Record at least the tool name and input.
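
If plain-text log lines become hard to search, one option is to emit a JSON record per tool call. The sketch below goes slightly beyond the minimum by also recording whether the call failed and a short preview of the result. The function name, field names, and the 200-character cutoff are assumptions for illustration.

```python
import json


def format_tool_log(step: int, name: str, tool_input: dict, result: dict, max_chars: int = 200) -> str:
    """Build one JSON log line for a tool call, with a truncated result preview."""
    record = {
        "step": step,
        "tool": name,
        "input": tool_input,
        "is_error": result.get("is_error", False),
        "result_preview": str(result.get("content", ""))[:max_chars],
    }
    # default=str keeps logging from failing on values JSON cannot encode.
    return json.dumps(record, ensure_ascii=False, default=str)


line = format_tool_log(
    3,
    "read_file",
    {"path": "notes.txt"},
    {"type": "tool_result", "tool_use_id": "toolu_01",
     "content": "Tool execution failed: not found", "is_error": True},
)
print(line)
```

The returned string can be passed straight to `logger.info(...)` in place of the format string used in the loop.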

Key takeaways

In this post we took the minimal agent loop up to production level.

stop_reason branches cover not just tool_use and end_turn but also max_tokens and refusal.
Tool exceptions come back as is_error results instead of killing the loop, so Claude recovers on its own.
Transient API errors are left to the SDK’s max_retries, and tool calls are logged.

With the loop hardened, it is time for the tools. In the next post, “AI Agent Development #2: Designing Good Tools,” we cover tool design, the part that can completely change an agent’s performance even with the same loop.