AI Agent Development #2: Designing Good Tools

In Part 1 we made the loop solid. But if two agents running the same loop perform very differently, the cause is usually the tools. Claude only sees the tool list we give it, so if the tools are vague, the agent behaves vaguely too. In this post we lay out the principles of tool design.

The description is documentation the model reads

A tool’s description is not a comment. It is the only basis Claude has for deciding “should I use this tool now?” So writing only what it does is not enough; you also have to write when to use it.

tool_description.py
# Weak example — only says what it does
{
    "name": "search_orders",
    "description": "Search orders.",
    ...
}

# Good example — when to use it, what comes out, and the limits
{
    "name": "search_orders",
    "description": (
        "Search orders by order number, customer name, or date range. "
        "When the user asks about the status or details of a specific order, "
        "find the order with this tool first. "
        "Only orders from the last 90 days are searchable, and at most 20 are returned."
    ),
    ...
}

Recent models in particular tend to be cautious about calling tools, so stating trigger conditions such as “when the user asks about X, call this tool first” makes the model noticeably more likely to call the tool when it should.
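One way to keep descriptions from drifting back to the weak form is a small lint pass over the tool list. The sketch below is only a heuristic: the function name `lint_description`, the length threshold, and the trigger-word list are all assumptions made for illustration, not rules from any SDK.

```python
# Hypothetical lint for tool descriptions.
# The threshold and trigger words are assumptions; tune them for your own tools.
MIN_DESCRIPTION_CHARS = 40
TRIGGER_WORDS = ("when", "use this", "only")


def lint_description(tool: dict) -> list[str]:
    """Return warnings for one tool definition (empty list if it looks fine)."""
    description = tool.get("description", "")
    warnings = []
    if len(description) < MIN_DESCRIPTION_CHARS:
        warnings.append(f"{tool['name']}: description is very short")
    # A description with no trigger word probably says what, but not when.
    if not any(word in description.lower() for word in TRIGGER_WORDS):
        warnings.append(f"{tool['name']}: no trigger condition found")
    return warnings


weak = {"name": "search_orders", "description": "Search orders."}
good = {
    "name": "search_orders",
    "description": (
        "Search orders by order number, customer name, or date range. "
        "When the user asks about the status or details of a specific order, "
        "find the order with this tool first. "
        "Only orders from the last 90 days are searchable, and at most 20 are returned."
    ),
}
```

A check like this cannot judge whether a description is actually good, but it catches the most common failure, a one-line description with no usage condition, before the tool list ships.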

Keep the schema narrow, describe every parameter

The input_schema is not where you grant freedom; it is where you reduce it. If the accepted values are fixed, pin them down with enum, put only truly necessary parameters in required, and attach a description to every parameter.

tool_schema.py
{
    "name": "update_order_status",
    "description": "Change an order's status. Use only when a status change is requested.",
    "input_schema": {
        "type": "object",
        "properties": {
            "order_id": {
                "type": "string",
                "description": "Order number. e.g. ORD-2026-0001",
            },
            "status": {
                "type": "string",
                "enum": ["paid", "shipped", "delivered", "cancelled"],
                "description": "The status to set. No values exist other than these four.",
            },
        },
        "required": ["order_id", "status"],
    },
}

If you leave status as a free string, Claude will invent variants like “in transit”, “shipping”, or “SHIPPED”. Locking it down with enum removes that ambiguity. For the same reason, state the expected format for values such as dates in the parameter description, for example “in YYYY-MM-DD format.”
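A narrow schema pays off most when the tool executor also checks the input against it before running anything. The following is a minimal sketch that covers only `required` and `enum`; the `validate_input` helper is hypothetical, and a full JSON Schema validator would check much more.

```python
# Trimmed copy of the update_order_status schema (descriptions omitted for brevity).
UPDATE_ORDER_STATUS_SCHEMA = {
    "type": "object",
    "properties": {
        "order_id": {"type": "string"},
        "status": {
            "type": "string",
            "enum": ["paid", "shipped", "delivered", "cancelled"],
        },
    },
    "required": ["order_id", "status"],
}


def validate_input(schema: dict, tool_input: dict) -> list[str]:
    """Hypothetical helper: check required keys and enum values only."""
    problems = []
    for key in schema.get("required", []):
        if key not in tool_input:
            problems.append(f"missing required parameter '{key}'")
    for key, value in tool_input.items():
        spec = schema["properties"].get(key)
        if spec is None:
            problems.append(f"unknown parameter '{key}'")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"'{key}' must be one of {spec['enum']}, got {value!r}")
    return problems
```

Returning the list of problems as the error text gives Claude the exact set of accepted values, so a variant such as “SHIPPED” can be corrected on the next attempt.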

Error messages are a design target too

In Part 1 we decided to return tool errors as is_error results. The quality of that message determines how well the agent can recover: Claude reads the error message to choose its next action, so the message has to say what went wrong and what to try instead.

tool_error_message.py
# Weak example — reading it gives no next action
return "An error occurred."

# Good example — what went wrong and how to fix it
return (
    "Order number 'ORD-9999' was not found. "
    "Try searching for the order first with the search_orders tool, "
    "using the customer name or a date."
)

The standard is the same as when writing error messages for humans. The difference is that an agent really does read the message and act on it. If you write how to fix the problem, it follows that path with high probability.
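To make the “what went wrong, how to fix it” shape hard to skip, the two parts can be separate arguments of a small helper. This is a sketch under assumptions: `tool_error`, `get_order`, and the in-memory `ORDERS` store are hypothetical names, and the result dictionary follows the tool_result shape used elsewhere in this series.

```python
# Hypothetical in-memory store standing in for a real order backend.
ORDERS = {"ORD-2026-0001": {"status": "paid"}}


def tool_error(tool_use_id: str, what: str, next_step: str) -> dict:
    """Build an error result that always states the problem and a way forward."""
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": f"{what} {next_step}",
        "is_error": True,
    }


def get_order(tool_use_id: str, order_id: str) -> dict:
    """Look up one order, or explain how to find it if the ID is unknown."""
    order = ORDERS.get(order_id)
    if order is None:
        return tool_error(
            tool_use_id,
            what=f"Order number '{order_id}' was not found.",
            next_step=(
                "Try searching for the order first with the search_orders tool, "
                "using the customer name or a date."
            ),
        )
    return {"type": "tool_result", "tool_use_id": tool_use_id, "content": str(order)}
```

Because `next_step` is a required argument, every error path has to name a recovery action.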

Classify your tools by risk

Before adding a tool, it is worth asking yourself one question: what happens if this tool is called wrongly?

| Class | Examples | How to handle |
|---|---|---|
| Read | search, lookup | Let it call freely |
| Write (reversible) | status change, draft save | Validate input, then run |
| Write (hard to reverse) | payment, deletion, sending email | Require human approval before running |

For hard-to-reverse tools, you do not block the call itself; you insert a confirmation step right before execution. A branch in the tool execution function is enough.

dangerous_tool_gate.py
DANGEROUS_TOOLS = {"send_email", "delete_order", "refund_payment"}

def execute_tool(block) -> dict:
    if block.name in DANGEROUS_TOOLS:
        if not confirm_with_human(block.name, block.input):
            return {
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": "The user did not approve this action. Look for another way.",
                "is_error": True,
            }
    ...

Note that the rejection is also returned as an error result. That way Claude sees that the action was not approved and revises its plan. In the hands-on project in Part 7 we implement this approval flow for real.
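The gate above calls `confirm_with_human` without defining it. As a placeholder until the full approval flow, a console version might look like the sketch below. The prompt wording and the `ask` parameter are assumptions; `ask` defaults to the built-in `input` and exists so the function can be exercised without a terminal.

```python
import json


def confirm_with_human(name: str, tool_input: dict, ask=input) -> bool:
    """Show the pending call and approve only on an explicit 'y' or 'yes'."""
    prompt = (
        f"The agent wants to run '{name}' with:\n"
        f"{json.dumps(tool_input, indent=2, ensure_ascii=False)}\n"
        "Approve? [y/N] "
    )
    answer = ask(prompt).strip().lower()
    # Anything other than an explicit yes, including an empty reply, is a rejection.
    return answer in {"y", "yes"}
```

Defaulting to rejection matters here: for a hard-to-reverse action, an accidental Enter key should never count as approval.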

Manage tool count and overlap

The more tools there are, the harder Claude’s choice becomes. Overlapping descriptions are the biggest problem: if search_orders and find_order coexist, Claude may pick a different one each time for the same request. There are two checks to apply.

  • Merge or delete overlapping tools. One tool that branches on a parameter beats two tools that do similar things.
  • Remove unused tools from the list. Every tool irrelevant to the current task is one more way to pick wrong. Building a different tool list per task type is also an option.

Even if each tool is good on its own, a confusing list overall leaves the agent lost. Treat the tool list like a menu: refine it as a whole.
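Building a different tool list per task type can be as simple as a lookup table. In this sketch the task types, the `TOOLS_BY_TASK` mapping, and the `tools_for_task` helper are all hypothetical, and the tool definitions are cut down to their names for brevity.

```python
# Tool definitions trimmed to names; real entries would carry description and input_schema.
ALL_TOOLS = [
    {"name": "search_orders"},
    {"name": "update_order_status"},
    {"name": "send_email"},
    {"name": "refund_payment"},
]

# Assumed task types and the tools each one actually needs.
TOOLS_BY_TASK = {
    "order_lookup": {"search_orders"},
    "order_update": {"search_orders", "update_order_status"},
    "refund": {"search_orders", "refund_payment", "send_email"},
}


def tools_for_task(task_type: str) -> list[dict]:
    """Return only the tools relevant to this task, in their original order."""
    allowed = TOOLS_BY_TASK[task_type]
    return [tool for tool in ALL_TOOLS if tool["name"] in allowed]
```

The filtered list is what gets passed as the tool list for that request, so a lookup task never even sees refund_payment.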

Common tool-design pitfalls

  • Writing the description from the developer’s viewpoint: a description like “internal order API wrapper” gives Claude no information. Write what the tool can tell its caller and when to use it, from the perspective of whoever is using the tool.
  • Making every parameter required: if optional parameters are forced to be required, Claude makes up values it does not know just to fill them. Mark only the truly necessary parameters as required.
  • Hiding errors: if a tool fails but returns an empty result, Claude proceeds with the wrong belief that “there were no results.” Recovery only starts when failure is reported as failure.
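The last pitfall is the easiest to introduce by accident, usually through a broad `except` that returns an empty list. A minimal sketch of keeping the two cases apart, assuming a hypothetical `backend` callable that performs the actual search and raises on failure:

```python
import json


def search_orders_result(tool_use_id: str, query: str, backend) -> dict:
    """Run a search and report failure as failure, never as an empty result."""
    try:
        rows = backend(query)
    except Exception as exc:  # broad on purpose: any backend failure must reach the model
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": (
                f"search_orders failed before it could run: {exc}. "
                "This does not mean there are no matching orders. "
                "Retry once, or ask the user how to proceed."
            ),
            "is_error": True,
        }
    if not rows:
        # A genuine empty result is stated explicitly so it cannot be mistaken for a failure.
        content = "The search ran successfully and matched 0 orders."
    else:
        content = json.dumps(rows, ensure_ascii=False)
    return {"type": "tool_result", "tool_use_id": tool_use_id, "content": content}
```

With both messages spelled out, “nothing matched” and “the search never ran” lead Claude to different next steps.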

Key takeaways

In this post we covered tool design, which determines an agent’s performance.

  • In the description, write when to use the tool in addition to what it does. Trigger conditions raise tool usage.
  • Narrow the schema with enum and required, and put information for deciding the next action into error messages.
  • Classify tools by risk, and insert human approval for hard-to-reverse tools.

In the next post, “AI Agent Development #3: Planning and Self-Correction,” we cover how a well-equipped agent plans multi-step work and fixes intermediate failures on its own.
