AI Agent Development #7: Capstone Project — An Issue Triage Agent

With Part 6, all the pieces of the series are in place. This final post ties them together into one agent you can actually use: an issue triage agent that reads the GitHub issues piling up in a repository, classifies them, and proposes labels and a draft reply.

What we are building #

In open source and internal projects alike, issues pile up faster than they get handled. The term triage comes from medicine: sorting incoming patients by urgency in an emergency room. In software, it means sorting incoming issues by type and priority. Concretely, the work is to read a new issue, decide whether it is a bug, a feature request, or a question, attach labels, and post a first reply when needed. It is repetitive yet requires judgment, which makes it a good fit for an agent.

The design follows the series’ principle: reads are free, writes go through approval. The agent fetches and analyzes issues on its own, but actually attaching a label or posting a comment only runs after a human approves it.

The tool set #

There are four tools. They use the GitHub REST API directly, so a single token is all you need.

triage_tools.py
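import os

import requests

API = "https://api.github.com"
GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]   # assumes the token is exported in the environment
HEADERS = {"Authorization": f"Bearer {GITHUB_TOKEN}"}
MAX_BODY = 3000   # Part 4: tool result diet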

def list_open_issues(repo: str) -> str:
    """Return up to 10 unlabeled open issues, newest first."""
    r = requests.get(f"{API}/repos/{repo}/issues",
                     params={"state": "open", "per_page": 30}, headers=HEADERS)
    r.raise_for_status()
    unlabeled = [i for i in r.json() if not i["labels"] and "pull_request" not in i]
    lines = [f"#{i['number']} {i['title']}" for i in unlabeled[:10]]
    return "\n".join(lines) or "There are no unlabeled open issues."

def get_issue(repo: str, number: int) -> str:
    """Fetch the title and body of an issue by number. Always read the body before classifying."""
    r = requests.get(f"{API}/repos/{repo}/issues/{number}", headers=HEADERS)
    r.raise_for_status()
    issue = r.json()
    body = (issue["body"] or "")[:MAX_BODY]
    return f"Title: {issue['title']}\nBody:\n{body}"

def add_labels(repo: str, number: int, labels: list[str]) -> str:
    """Attach labels to an issue. Reversible, but visible on a public repository, so it requires approval."""
    r = requests.post(f"{API}/repos/{repo}/issues/{number}/labels",
                      json={"labels": labels}, headers=HEADERS)
    r.raise_for_status()
    return f"Attached labels {labels} to #{number}."

def post_comment(repo: str, number: int, body: str) -> str:
    """Post a comment on an issue. A publicly visible action, so it always requires approval."""
    r = requests.post(f"{API}/repos/{repo}/issues/{number}/comments",
                      json={"body": body}, headers=HEADERS)
    r.raise_for_status()
    return f"Posted a comment on #{number}."

Write the tool schema definitions following the principles from Part 2. In particular, pin the labels parameter to the repository’s label scheme (bug, enhancement, question, documentation) with an enum. If you leave it as a free-form string, the model will sooner or later propose labels outside that scheme.
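As a rough illustration of the enum point, here is a sketch of what the add_labels schema could look like. It assumes Anthropic-style tool definitions (name, description, input_schema), which is what the tool_use_id fields used later in this post suggest. The Part 2 definitions are not reproduced here, so the description wording and the ALLOWED_LABELS and ADD_LABELS_TOOL names are placeholders.

```python
# Hypothetical schema for add_labels; the format assumes Anthropic-style tool definitions.
ALLOWED_LABELS = ["bug", "enhancement", "question", "documentation"]

ADD_LABELS_TOOL = {
    "name": "add_labels",
    "description": "Attach labels to an issue. Requires human approval before it runs.",
    "input_schema": {
        "type": "object",
        "properties": {
            "repo": {"type": "string", "description": "Repository in owner/name form."},
            "number": {"type": "integer", "description": "Issue number."},
            "labels": {
                "type": "array",
                # The enum restricts each item to the repository's label scheme.
                "items": {"type": "string", "enum": ALLOWED_LABELS},
                "minItems": 1,
            },
        },
        "required": ["repo", "number", "labels"],
    },
}
```

Keeping the allowed values in one list means the schema and any later validation code can share the same source of truth.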

The approval gate #

The two write tools go through human confirmation before they run. We reuse the gate from Part 2 as is. For a demo, terminal input is enough.

approval_gate.py
NEEDS_APPROVAL = {"add_labels", "post_comment"}

def execute_tool(block) -> dict:
    if block.name in NEEDS_APPROVAL:
        print(f"\n[Approval request] {block.name} {block.input}")
        if input("Run it? (y/n) ").strip().lower() != "y":
            return {
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": "The user did not approve. Summarize your proposals and report them instead.",
                "is_error": True,
            }
    try:
        return {"type": "tool_result", "tool_use_id": block.id,
                "content": run_tool(block.name, block.input)}
    except Exception as e:
        return {"type": "tool_result", "tool_use_id": block.id,
                "content": f"Tool execution failed: {e}", "is_error": True}

In a real service, a Slack notification or an approval button in a web UI replaces input(). The structure stays the same: the agent waits for approval, and when it is denied, it reads the denial in the tool result and changes its plan.
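One way to make that swap easy is to pass the approval decision in as a function. The sketch below is an assumption about how the gate could be restructured, not the Part 2 code: Approver, terminal_approver, and is_allowed are hypothetical names, and a Slack or web approver would simply be another function with the same signature.

```python
from typing import Callable

NEEDS_APPROVAL = {"add_labels", "post_comment"}

# An approver takes the tool name and its input and returns True to allow the call.
Approver = Callable[[str, dict], bool]

def terminal_approver(name: str, tool_input: dict) -> bool:
    """The demo behavior: ask on the terminal."""
    print(f"\n[Approval request] {name} {tool_input}")
    return input("Run it? (y/n) ").strip().lower() == "y"

def is_allowed(name: str, tool_input: dict,
               approver: Approver = terminal_approver) -> bool:
    """Read tools pass straight through; write tools are decided by the approver."""
    if name not in NEEDS_APPROVAL:
        return True
    return approver(name, tool_input)
```

A deny-everything approver such as lambda name, tool_input: False gives a dry-run mode for free, since every write tool is refused while reads still go through.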

The system prompt #

Write the behavior rules as concrete actions, following the principles from Part 3.

triage_system.py
SYSTEM = """You are a GitHub issue triage agent.

Procedure:
1. Check unlabeled issues with list_open_issues and list your processing plan.
2. For each issue, read the body with get_issue before classifying. Never classify from the title alone.
3. Classification criteria: bug if it contains a reproducible error, enhancement if it
   proposes a new feature, question if it asks about usage, documentation if it suggests doc improvements.
4. Propose labels with add_labels. If you are not confident, do not attach a label and report why.
5. Only for question issues, draft a reply and propose it with post_comment.

Rules:
- Never retry an action once its approval has been denied.
- At the end, report the issues handled, labels attached, and issues held back in a table.
"""

For the loop, the run_agent from Part 1 works as is. A single call, run_agent("Triage the new issues in the owner/repo repository."), starts the agent.
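The approval gate relies on run_tool to turn a tool name into a function call. It comes from earlier parts and is not shown in this post. A minimal sketch, assuming it is a plain dictionary lookup, with stub functions standing in for the real ones in triage_tools.py and a hypothetical TOOLS registry:

```python
# Stubs standing in for the real functions in triage_tools.py.
def list_open_issues(repo: str) -> str:
    return f"(issues of {repo})"

def get_issue(repo: str, number: int) -> str:
    return f"(body of {repo}#{number})"

# Hypothetical registry; add_labels and post_comment would be registered the same way.
TOOLS = {
    "list_open_issues": list_open_issues,
    "get_issue": get_issue,
}

def run_tool(name: str, tool_input: dict) -> str:
    """Look up the tool by name and call it with the model-supplied arguments."""
    if name not in TOOLS:
        raise ValueError(f"Unknown tool: {name}")
    return TOOLS[name](**tool_input)
```

Raising on an unknown name fits the gate shown earlier: execute_tool catches the exception and returns it to the model as a tool_result with is_error set.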

Evaluation — measuring classification quality with a golden set #

LLM App Development Part 12 covered evaluating answer quality. For agents the principle is the same, but the unit of evaluation changes to “task success.” In triage, the easiest thing to measure is classification accuracy. Use past issues that a human has already labeled as the golden set.

eval_triage.py
GOLDEN = [
    {"number": 101, "expected": ["bug"]},
    {"number": 95,  "expected": ["question"]},
    {"number": 88,  "expected": ["enhancement"]},
    # about 20 past issues are enough to detect changes
]

def evaluate(repo: str) -> float:
    correct = 0
    for case in GOLDEN:
        proposed = triage_one(repo, case["number"])   # a mode that only returns proposals, no approval gate
        if set(proposed) == set(case["expected"]):
            correct += 1
        else:
            print(f"#{case['number']}: expected {case['expected']} / proposed {proposed}")
    return correct / len(GOLDEN)

print(f"Classification accuracy: {evaluate('owner/repo'):.0%}")

With this number, whenever you change the classification criteria in the system prompt or swap the model, you can tell whether things got better or worse by metrics, not by feel. Treat it as a regression test you run every time you touch the prompt.
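To use the number as a regression test in the literal sense, one option is to compare it against a floor and fail the run when it drops. The sketch below is an assumption layered on top of evaluate(): the sample list stands in for the (expected, proposed) pairs it would collect, the helper names are hypothetical, and the 0.8 threshold is an arbitrary placeholder.

```python
from collections import Counter

def accuracy(results: list[tuple[list[str], list[str]]]) -> float:
    """Share of cases where the proposed label set equals the expected one."""
    if not results:
        raise ValueError("golden set is empty")
    hits = sum(set(expected) == set(proposed) for expected, proposed in results)
    return hits / len(results)

def misses_by_label(results: list[tuple[list[str], list[str]]]) -> Counter:
    """Count misclassified cases per expected label, to see where the prompt is weak."""
    misses = Counter()
    for expected, proposed in results:
        if set(expected) != set(proposed):
            misses.update(expected)
    return misses

def check_regression(results, threshold: float = 0.8) -> None:
    """Raise if accuracy falls below the floor, so a CI job or script fails loudly."""
    score = accuracy(results)
    if score < threshold:
        raise AssertionError(f"accuracy {score:.0%} fell below {threshold:.0%}")

# Illustrative data only: (expected, proposed) pairs, not real results.
sample = [
    (["bug"], ["bug"]),
    (["question"], ["question"]),
    (["enhancement"], ["bug"]),
    (["bug"], ["bug"]),
]
```

The per-label miss count is often more useful than the headline figure, because it points at the specific classification criterion in the system prompt that needs rewording.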

Going further from here #

  • Scheduled runs — run it once a day with cron and receive approval requests in Slack, and it becomes an operations tool.
  • Subagents — on high-volume days, the parallel delegation from Part 5 can run the per-issue analysis concurrently.
  • Turning it into an MCP server — split the four GitHub tools into a server as in Part 6, and clients other than the triage agent can use the same tools.
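For the subagent item, the shape of the fan-out can be sketched with a plain thread pool. This is not the Part 5 delegation code: analyze_issue is a hypothetical stand-in for a per-issue subagent call, and only the read-only analysis runs concurrently, so the resulting proposals would still pass through the approval gate one at a time afterwards.

```python
from concurrent.futures import ThreadPoolExecutor

def analyze_issue(number: int) -> dict:
    """Hypothetical placeholder for a subagent that reads one issue and proposes labels."""
    return {"number": number, "proposed": []}   # a real version would call the model here

def analyze_all(numbers: list[int], max_workers: int = 4) -> list[dict]:
    """Run the per-issue analysis concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(analyze_issue, numbers))
```

Threads are a reasonable fit here because the work is dominated by waiting on network calls, and a small max_workers keeps the run within API rate limits.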

Closing the series #

Over seven posts we took agents from an introductory level to a practical one. Looking back:

  • A loop survives only when it handles every stop_reason and every tool error (Part 1).
  • An agent’s quality is decided by its tools’ descriptions, schemas, and error messages (Part 2).
  • The rule of planning first and verifying after changes is what creates self-correction (Part 3).
  • Long tasks hold up through tool result diets, pruning, compaction, and a scratchpad (Part 4).
  • When you need context isolation, split the work with subagents (Part 5).
  • Tools shared by multiple clients get separated into an MCP server (Part 6).
  • And actions that are hard to undo always get human approval (Part 7).

The center of gravity in agent development is not a flashy demo, but a loop that withstands failure, good tools, and measurable quality. I hope this series serves as that foundation. Thank you for reading along.
