Thinking in agents: from single prompts to multi-step workflows
When a single LLM call is not enough, you reach for an agent. Here is the mental model I use to design them without overengineering.
The word "agent" carries a lot of freight right now. In some circles it means a general autonomous system that browses the web and books your flights. In my day-to-day work it means something much smaller: a loop with tools and an exit condition.
Both are called agents. Only the smaller one ships.
# The minimum agent
An agent is a model that can call functions and decide when it is done. That is the whole idea.
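```typescript
// `claude` is a configured client; model and max_tokens are omitted for brevity.
async function run(goal: string, tools: Tool[]): Promise<string> {
  const messages: Message[] = [{ role: 'user', content: goal }];
  while (true) {
    const response = await claude.messages.create({ messages, tools });
    // Anything other than a tool request (end_turn, max_tokens, ...) ends the loop.
    if (response.stop_reason !== 'tool_use') {
      return lastTextBlock(response);
    }
    // Run the requested tools, then feed the results back as the next user turn.
    const toolResults = await executeTools(response.content);
    messages.push(
      { role: 'assistant', content: response.content },
      { role: 'user', content: toolResults },
    );
  }
}
```
No framework. No abstraction. The loop is about fifteen lines. Start here.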
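The loop above leans on two helpers it never defines, `executeTools` and `lastTextBlock`. Here is a minimal sketch of what they might look like. The local block types are simplified stand-ins and the `handlers` registry is hypothetical; neither comes from the original code.

```typescript
// Simplified local types mirroring text, tool_use, and tool_result blocks.
type TextBlock = { type: 'text'; text: string };
type ToolUseBlock = { type: 'tool_use'; id: string; name: string; input: unknown };
type ContentBlock = TextBlock | ToolUseBlock;
type ToolResultBlock = {
  type: 'tool_result';
  tool_use_id: string;
  content: string;
  is_error?: boolean;
};

// Hypothetical registry mapping tool names to implementations.
type Handler = (input: unknown) => Promise<unknown> | unknown;
const handlers: Record<string, Handler> = {
  get_order_by_id: (input) => {
    const { id } = input as { id: number };
    return id === 42 ? { id: 42, status: 'shipped' } : null;
  },
};

// Return the text of the last text block, or '' if there is none.
function lastTextBlock(response: { content: ContentBlock[] }): string {
  for (let i = response.content.length - 1; i >= 0; i--) {
    const block = response.content[i];
    if (block.type === 'text') return block.text;
  }
  return '';
}

// Run every tool_use block. Failures are reported as error results rather than
// thrown, so the model can see what went wrong and recover on the next turn.
async function executeTools(content: ContentBlock[]): Promise<ToolResultBlock[]> {
  const results: ToolResultBlock[] = [];
  for (const block of content) {
    if (block.type !== 'tool_use') continue;
    try {
      const handler = handlers[block.name];
      if (!handler) throw new Error(`Unknown tool: ${block.name}`);
      const output = await handler(block.input);
      results.push({
        type: 'tool_result',
        tool_use_id: block.id,
        content: JSON.stringify(output ?? null),
      });
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      results.push({ type: 'tool_result', tool_use_id: block.id, content: message, is_error: true });
    }
  }
  return results;
}
```

Returning errors as tool results is a design choice, not a requirement: it keeps a single bad call from crashing the whole run.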
# When to reach for an agent
The question I ask: does the task require decisions that depend on previous decisions in the same session?
A single prompt handles stateless transformation: summarize this text, classify this ticket, draft a reply. The moment you need "look up X, then decide whether to also look up Y based on what you found," you want an agent.
# Tools are the design
The quality of an agent is almost entirely determined by its tool design, not its prompt. A well-designed tool set constrains what the agent can do, makes errors debuggable, and makes the agent faster by giving it precise primitives rather than asking it to guess.
Bad tool:
```typescript
{ name: "do_database_thing", description: "does something with the database" }
```
Good tool:
```typescript
{
  name: "get_order_by_id",
  description: "Returns a single order by its numeric ID. Returns null if not found.",
  input_schema: {
    type: "object",
    properties: { id: { type: "number", description: "The order ID" } },
    required: ["id"]
  }
}
```
The model reads the description. Make it accurate enough to serve as documentation.
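A description only works as documentation if the implementation honors it. The sketch below pairs the `get_order_by_id` definition with a handler that does exactly what the description promises. The in-memory `orders` map and the `definition`/`handler` pairing are illustrative assumptions, not part of any SDK.

```typescript
type Order = { id: number; status: string };

// Hypothetical in-memory store standing in for a real database.
const orders = new Map<number, Order>([[42, { id: 42, status: 'shipped' }]]);

const getOrderById = {
  definition: {
    name: 'get_order_by_id',
    description: 'Returns a single order by its numeric ID. Returns null if not found.',
    input_schema: {
      type: 'object',
      properties: { id: { type: 'number', description: 'The order ID' } },
      required: ['id'],
    },
  },
  // Validate the input at runtime; the schema alone does not guarantee it.
  handler(input: unknown): Order | null {
    const id = (input as { id?: unknown } | null)?.id;
    if (typeof id !== 'number' || !Number.isInteger(id)) {
      throw new Error('get_order_by_id: "id" must be an integer');
    }
    // Matches the description: a single order, or null if not found.
    return orders.get(id) ?? null;
  },
};
```

Keeping the two side by side makes it harder for the description and the behavior to drift apart.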
# The things that actually fail
Tool output that is too large. A tool returns a full database table, the context window fills, and later turns degrade or fail outright. Every tool should return the minimum data needed to continue.
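One way to enforce this is to trim results inside the tool before they reach the model. A rough sketch, assuming hypothetical helpers `pick` (keep only named fields) and `capRows` (limit row count and report how much was cut):

```typescript
// Keep only the listed fields of a row.
function pick<T extends object, K extends keyof T>(row: T, keys: K[]): Pick<T, K> {
  const out = {} as Pick<T, K>;
  for (const key of keys) out[key] = row[key];
  return out;
}

type Capped<T> = { rows: T[]; total: number; truncated: boolean };

// Return at most maxRows rows, plus enough metadata for the model to know
// that more exist and that it should narrow the query.
function capRows<T>(rows: T[], maxRows = 20): Capped<T> {
  return {
    rows: rows.slice(0, maxRows),
    total: rows.length,
    truncated: rows.length > maxRows,
  };
}
```

The `total` and `truncated` fields matter as much as the cap itself: without them the model cannot tell a complete result from a clipped one.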
Unbounded loops. A well-prompted agent can still spin. Add a hard max-turn limit. Log when you hit it. In my experience, hitting max-turns is almost always a tool design problem, not a model problem.
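A hard limit can be as simple as swapping `while (true)` for a counted loop. The sketch below assumes a hypothetical `step` callback that stands in for one model call plus tool execution, so the limit logic can be shown without a real client:

```typescript
type StepResult = { done: true; answer: string } | { done: false };

// Run `step` until it reports an answer or the turn budget is exhausted.
async function runBounded(
  step: (turn: number) => Promise<StepResult>,
  maxTurns = 10,
): Promise<string> {
  for (let turn = 1; turn <= maxTurns; turn++) {
    const result = await step(turn);
    if (result.done) return result.answer;
  }
  // Log before failing so max-turn hits show up when reviewing runs.
  console.warn(`agent hit max turns (${maxTurns})`);
  throw new Error(`No answer after ${maxTurns} turns`);
}
```

Throwing is one option; returning a partial answer with a flag is another. Either way, the caller should be able to tell a bounded-out run from a finished one.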
Missing an exit condition. The agent does not know it is done. Prompt it explicitly:
```typescript
const SYSTEM = `
You have access to a set of tools. Use them to gather information.
When you have enough information to answer the question fully,
return your answer as plain text and stop calling tools.
Do not call more tools than necessary.
`;
```
# Single-agent vs multi-agent
I start single-agent almost every time. Multi-agent is the right call when:
- The subtasks are truly independent and you want to run them in parallel
- A subtask has a different risk profile and needs a separate model with different permissions
- The context of the orchestrator would overflow if it handled everything itself
Otherwise, the coordination overhead is not worth it. One model, one loop, one set of tools — ship that first.
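For the first case in the list, independent subtasks run in parallel, the orchestration can stay small. This is a sketch only: `runAgent` is a hypothetical function that runs one complete agent loop and returns its final answer, and `fanOut` is an illustrative name.

```typescript
// Hypothetical: runs one full agent loop for a goal and returns its answer.
type RunAgent = (goal: string) => Promise<string>;

type SubResult = { task: string; ok: boolean; output: string };

// Run every subtask concurrently. Each failure is captured per task so one
// failing subagent does not discard the work of the others.
async function fanOut(tasks: string[], runAgent: RunAgent): Promise<SubResult[]> {
  return Promise.all(
    tasks.map(async (task): Promise<SubResult> => {
      try {
        return { task, ok: true, output: await runAgent(task) };
      } catch (err) {
        return { task, ok: false, output: err instanceof Error ? err.message : String(err) };
      }
    }),
  );
}
```

The orchestrator only ever sees each subagent's final output, not its intermediate tool calls, which also helps with the context-overflow case.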
The most reliable agent I have built had four tools and a twenty-line prompt. The most unreliable one had twenty tools and no clear exit condition.
Agents are not magic. They are loops with memory and function calling. Design the tools carefully, bound the loops, eval the output. Everything else is marketing.