7 AI Agent Design Patterns Every Builder Should Know
Most agent code is a handful of recurring shapes wearing different framework costumes. Learn the seven that matter — what each one is, when it earns its keep, and how to stack them without turning your agent into a haunted house.
- 11 min read
- Engineering
- Updated 2026
You do not invent a new control flow every time you build an agent. You reach for one of a small set of patterns — the same way you reach for a queue, a cache, or a state machine in ordinary software.
The word agent hides a lot of variety. A one-shot tool call and a fleet of coordinating specialists are both called agents, but they share almost nothing structurally. What they do share is a handful of reusable shapes for arranging reasoning, tool calls, and collaboration. Learn those shapes and most agent codebases suddenly read like variations on a theme rather than a pile of bespoke spaghetti.
This post walks through seven patterns every builder should keep in their head: ReAct, plan-and-execute, reflection, routing, orchestrator-worker, evaluator-optimizer, and the tool-use loop. For each, you get a plain answer to two questions — what is it and when do you use it — and we close with the part nobody writes down: how to compose them without drowning in model calls. If you want the foundational mechanics first, the LLM agents guide and the agentic workflows guide are the right warm-up.
The seven patterns at a glance
Skim the whole toolbox first. Each card is a pattern, its one-line job, and the situation it was made for — the rest of the post zooms into each.
1 · ReAct
Interleave reasoning and acting: think a step, take a tool action, read the result, think again. The default loop for open-ended tasks that need live information.
2 · Plan-and-execute
Draft a full multi-step plan first, then carry it out step by step. Best when the task has a long horizon and wandering reasoning gets lost.
3 · Reflection
Generate, then critique your own output and revise. Earns its keep when quality matters and mistakes are catchable by re-reading the draft.
4 · Routing
Classify the request, then dispatch it to exactly one specialized handler. A cheap triage step that keeps each lane focused and prompts short.
5 · Orchestrator-worker
A coordinator decomposes a goal, fans sub-tasks out to specialist workers, and synthesizes their results. For work that splits into parallel parts.
6 · Evaluator-optimizer
One model produces, another scores against criteria, and the loop optimizes until the bar is met. A measurable quality gate around any generator.
7 · Tool-use loop
The function-calling primitive under all of the above: the model emits a tool call, the runtime executes it, the result returns, repeat until done.
ReAct — reason and act, in lockstep
The pattern that made agents feel like agents: don't plan everything up front, just think one step, act, observe, and let the next thought be informed by what actually happened.
What it is. ReAct (short for reason + act) interleaves chains of thought with tool actions in a single loop. The model writes a short reasoning trace, picks one action, executes it, reads the observation, and folds that evidence into its next thought. It does not commit to a full plan up front — it steers in real time based on what the world returns.
When to use it. Reach for ReAct on open-ended tasks where the next move depends on information you don't have yet: question answering over tools, web research, debugging, anything where a fixed plan would be guesswork. It is the most common default for an LLM agent because it degrades gracefully — even a two-tool agent benefits from thinking between calls. The trade-off is drift: long ReAct loops can wander, repeat themselves, or lose the thread, which is exactly the gap the next two patterns fill.
Plan-and-execute — decide the route before you drive
When a task is too long to improvise, separate the thinking from the doing: write the whole plan once, then execute each step with a cheaper, more focused worker.
What it is. A planner model breaks the goal into an explicit, ordered list of steps. An executor then runs those steps one by one — often re-planning when reality diverges from the plan. The key move is the separation: a strong model does the expensive reasoning once, and lighter calls handle the mechanical execution.
When to use it. Plan-and-execute shines on long-horizon tasks where pure ReAct loses its way — multi-stage data pipelines, "research this topic and write a report," anything with five-plus dependent steps. An upfront plan is auditable (you can read it before anything runs), parallelizable (independent steps can fan out), and cheaper at execution time. The cost is rigidity: a plan written before the first observation can be wrong, so good implementations let the planner revise mid-flight rather than marching off a cliff.
1 · Plan
A capable model turns the goal into an ordered list of concrete, checkable steps with their dependencies made explicit.
2 · Execute
Each step runs in turn — often as its own small ReAct or tool-use loop — producing an intermediate result the next step consumes.
3 · Re-plan
When a step's result contradicts the plan, the planner revises the remaining steps instead of blindly continuing.
Reflection — let the agent grade its own work
The cheapest reliable quality boost: have the model read its first draft, critique it against the goal, and produce a better second version.
A second pass beats a smarter prompt
Reflection (also called self-critique) splits work into a producer turn and a critic turn. The model writes a first answer, then a second prompt asks it to find flaws against the requirements — missing cases, broken logic, rubric violations — and rewrite accordingly. The critique can come from the same model wearing a different hat, or from a dedicated critic with its own instructions.
Use it when quality outranks latency and errors are visible on re-reading: code that must compile and pass tests, structured output that must match a schema, prose that must hit a checklist. Skip it for trivial lookups where the first answer is essentially always right — every reflection pass is at least one extra model call, so you trade cost and latency for correctness.
- Catches mistakes the first pass confidently shipped.
- Works best with concrete criteria, not vague 'make it better'.
- Pairs naturally with tools — let the critic run the tests.
- Cap the loop: two or three revisions, then stop.
Routing — classify first, then dispatch
Not every request belongs in the same prompt. A lightweight classifier sends each input to the one handler built for it, keeping every lane sharp and short.
What it is. Routing puts a classification step at the front door. A small, fast model (or even a rules layer) reads the input, decides which category it belongs to, and dispatches it to a single downstream handler tuned for that category. Each handler gets a tighter prompt, a smaller toolset, and fewer ways to go wrong than one do-everything mega-agent.
When to use it. Routing fits anytime your traffic is genuinely heterogeneous — a support bot fielding billing, technical, and account questions; a coding agent splitting "explain this" from "edit this"; a model picker sending easy queries to a cheap model and hard ones to a strong one. It cuts cost and raises accuracy because no single prompt has to be good at everything. The risk is misroutes, so keep an "unsure" fallback and log the classifier's decisions. Routing is one-in, one-out — when a request needs many handlers working together, you want the next pattern.
Orchestrator-worker — decompose, delegate, recombine
When a goal naturally splits into parts, a coordinating agent breaks it down, hands each piece to a specialist worker, and stitches the results back into one answer.
Orchestrator
Decomposes & synthesizes
Researcher
Gathers sources
Coder
Writes the changes
Tester
Verifies output
Writer
Drafts the summary
What it is. A lead agent — the orchestrator — receives the goal, decomposes it into sub-tasks, and dispatches each to a worker agent with its own role, prompt, and tools. Workers can run in parallel and don't need to know about one another. When they finish, the orchestrator collects and synthesizes their outputs into a single coherent result.
When to use it. This is the workhorse of multi-agent systems: tasks that decompose into independent, specializable parts — research across many sources at once, a codebase change spanning several files, a report whose sections can be drafted concurrently. The payoff is parallelism and focus; the cost is coordination overhead, more tokens, and the hard problem of recombination. For the deeper mechanics of wiring orchestrators to workers, see AI agent orchestration, and for the call on whether you even need more than one agent, read single-agent vs multi-agent.
Evaluator-optimizer — a scoreboard in the loop
Reflection's rigorous cousin: pair a generator with a separate evaluator that scores against explicit criteria, and keep optimizing until the output clears the bar.
What it is. Two roles run in a tight loop. A generator produces a candidate; an evaluator scores it against defined criteria and returns specific, actionable feedback; the generator tries again using that feedback. The loop continues until the evaluator passes the output or you hit an iteration limit. The difference from plain reflection is rigor — the evaluator is a distinct role with measurable criteria, not just the producer second-guessing itself.
When to use it. Use evaluator-optimizer when "good" has a clear, checkable definition: translations graded against a rubric, generated code measured by passing tests, search results scored for relevance, copy held to brand and length rules. It works precisely because the evaluator can give pointed feedback — "this case is unhandled," "tone is too formal" — that the optimizer can act on. If you cannot articulate the criteria, you cannot build the evaluator, and you are better off with simple reflection or a human in the loop.
The tool-use loop — the primitive under everything
Strip every pattern above down and the same engine is humming underneath: the model asks for a tool, the runtime runs it, the result comes back, and the loop turns again.
What it is. The tool-use (or function-calling) loop is the lowest-level pattern: you describe a set of tools with their inputs, the model decides when to call one and emits a structured request, your runtime executes it and returns the result, and the model continues with that result in context. It repeats until the model produces a final answer instead of another call. Everything else in this post is a policy layered on top of this loop — ReAct adds explicit reasoning between calls, planning sequences them, orchestration distributes them.
When to use it. Always, in some form — it is less a choice than the substrate. You reach for the bare loop when the task is well-defined and bounded: look something up, transform it, write it back. The craft is in the tools, not the loop: clear names, tight schemas, helpful error messages, and guardrails on anything that writes or spends. Get the workflow plumbing right and most agents are a good tool-use loop plus one or two patterns from above. Watch the usual failure modes: infinite loops, runaway cost, and tools that fail silently.
Composing patterns without the haunted house
The patterns are layers, not rivals. The trick is adding only the layers a real failure demands — and keeping each one thin enough to debug.
No serious system uses exactly one pattern. A mature agent might route an incoming request to the right lane, run an orchestrator that plans and decomposes the work, give each worker its own ReAct tool-use loop, and wrap the final output in an evaluator-optimizer pass before returning it. Five patterns, one pipeline — and each is doing a job the others can't.
The order is the insight. Routing chooses, planning sequences, orchestration parallelizes, ReAct and the tool-use loop execute each step, and reflection or evaluation guards the result. Stack them in that grain and the system reads top-to-bottom; fight the grain and you get the haunted house — agents calling agents calling agents with no one able to say why a given answer came out the way it did.
So compose deliberately. Begin with the thinnest thing that could work — usually a tool-use loop or plain ReAct — ship it, and add a layer only when you can name the failure it fixes. "Tasks longer than eight steps lose the thread" earns you planning. "Output quality is inconsistent" earns you reflection or an evaluator. "One prompt can't serve three audiences" earns you routing. Every layer you add is more latency, more tokens, and more surface area for bugs, so each one should pay for itself.
The composition rule of thumb
Add patterns in this order as needs appear: tool-use loop → ReAct → reflection → routing → planning → orchestrator-worker → evaluator-optimizer. Stop at the first level that meets your quality bar. Most agents never need the last two — and the ones that do should reach them because a metric forced the move, not because multi-agent diagrams look impressive.
Agent design patterns, answered
AI agent design patterns are reusable shapes for organizing how a language model reasons, calls tools, and coordinates with other agents to finish a task. They are the agent equivalent of software design patterns: named, battle-tested structures like ReAct, plan-and-execute, reflection, routing, and orchestrator-worker that you reach for instead of reinventing control flow each time. Picking the right pattern is mostly about how predictable the task is, how many steps it takes, and how much you can afford to spend on extra model calls.
Go deeper on building real agents
Build agents on patterns that hold up
Start from templates that bake in ReAct, routing, and orchestration — then compose your way up only as far as the task demands. Free to start, no credit card required.