Do LLM agents hallucinate?

Yes — the underlying language model can still invent facts, fabricate tool arguments, or claim a step succeeded when it failed. Agents reduce this risk because tool calls return real observations the model must account for, but they don't eliminate it. Production agents add guardrails: schema-validated function calling, retrieval-augmented generation (RAG) to ground answers in source documents, self-critique or reflection passes, and human review on high-stakes actions.

Which LLMs are best for building agents?

Strong agentic models are ones trained for reliable tool calling, long-context reasoning, and instruction following — frontier models from Anthropic (Claude), OpenAI (GPT), and Google (Gemini) are common choices, alongside open-weight options like Llama and Qwen for self-hosting. The practical selection criteria are tool-calling accuracy, context window, latency, and cost per run. Many teams route easy steps to a cheaper, faster model and reserve a top-tier model for hard planning.

How do context window limits affect LLM agents?

Every agent turn re-sends the running transcript — system prompt, tool definitions, prior thoughts, and observations — so long tasks fill the context window fast and raise cost and latency. Agents manage this with context engineering: summarizing old steps, storing facts in external memory or a vector store and retrieving only what's relevant, trimming verbose tool output, and paginating large results instead of dumping them into the prompt.

What is the difference between an LLM and an LLM agent?

An LLM is a model that maps a prompt to text. An LLM agent wraps that model in a loop, a set of tools, and memory so it can take actions and pursue a multi-step goal. The LLM is the reasoning engine; the agent is the system that lets it act, observe results, and adapt until the task is done.

LLM Agents · Reasoning + Tools

LLM agents: reasoning meets tool use

An LLM agent is a large language model wrapped in a loop and given tools — so instead of just answering, it can think, act on the world, observe what happened, and keep going until the goal is met. Here's exactly how that works.

11 min read
Intermediate
Updated 2026

Start building agents How to build an agent

LLM agents turn a passive text generator into an active problem-solver. On its own, a large language model maps a prompt to a completion — it predicts the next tokens and stops. An agent adds three things on top: a set of tools the model can call, a loop that lets it take many steps, and memory so each step builds on the last. The simple formula: LLM + tools + loop = agent.

The breakthrough that makes this possible is reasoning and acting in the same trace. Rather than guessing an answer in one shot, the model reasons about what it needs, calls a tool to fetch real data, reads the result, and reasons again. This grounds the model in facts from the outside world — a database row, an API response, a search hit — instead of relying solely on what it memorized during training.

This guide is for students and engineers who want a precise mental model of how LLMs become agents. We'll walk the agent loop, the reasoning patterns (ReAct, plan-and-execute, reflection, tree-of-thought, chain-of-thought), what function calling and tools look like under the hood, and how a true agent differs from plain prompting. For the wider picture, start with what is agentic AI.

The engine

The think–act–observe loop

Every LLM agent runs the same cycle. The model thinks about the next step, acts via a tool, observes the result, and repeats until it can answer.

Think

Reason about the next step

Act

Emit a tool / function call

Observe

Read the tool's result

Repeat

Loop until done

One iteration of an LLM agent. The loop continues until the model emits a final answer or a stop condition is hit.

The loop is what separates an agent from a single model call. Each iteration, the agent runtime sends the model the running transcript and the list of available tools, then inspects the model's output:

Think — the model reasons in natural language about what it knows and what it still needs.
Act — it decides on a tool and emits a structured call with arguments (or declares it's finished).
Observe — the runtime executes the call and feeds the real result back into the transcript.
Repeat — armed with that observation, the model plans the next move, looping until the goal is reached.

Because reality is fed back every cycle, the agent can recover from a failed call, an empty search, or an unexpected value — re-planning rather than confidently returning a wrong answer.

How agents think

Reasoning patterns for LLM agents

The 'think' step isn't one thing. These patterns structure how an LLM agent reasons — from a single linear chain to branching, self-critique, and explicit planning.

ReAct (reason + act)

Interleaves Thought → Action → Observation on every turn. The model reasons, calls a tool, reads the result, and reasons again. The default pattern for tool-using agents because it grounds each step in real data.

Plan-and-execute

The agent drafts a full multi-step plan upfront, then executes each step (often with tool calls) and re-plans only if reality diverges. Fewer model calls and more predictable than step-by-step ReAct on long tasks.

Reflection / self-critique

After producing a draft or result, the agent critiques its own work against the goal, finds gaps or errors, and revises. A reviewer pass that measurably improves quality on coding, writing, and analysis tasks.

Tree-of-thought

Explores several reasoning branches in parallel, scores them, and keeps the most promising — like search over thoughts. Useful for puzzles and planning where the first idea is often not the best.

Chain-of-thought

The foundation: prompt the model to reason step by step before answering. It improves accuracy on multi-step problems and is the building block the other patterns extend with tools and branching.

Tool-calling loop

The runtime substrate beneath these patterns: native function calling where the model emits a structured call, the host executes it, and the result returns as a tool message the model continues from.

Which pattern when?

Reach for ReAct when steps depend on live data and you can't plan blindly. Prefer plan-and-execute for long, well-understood tasks where you want fewer model calls. Add a reflection pass whenever quality matters more than latency. These compose — see agentic workflows for full recipes.

See it in action

A ReAct-style trace, line by line

Here's what a real ReAct agent transcript looks like for a small research task. Notice how every Thought is followed by an Action, and every Action by a grounded Observation.

react-trace.logtrace

1User: What was Acme Corp's Q3 revenue, and is it up YoY?23Thought: I need Q3 revenue plus the same quarter last year.  // reasoning step4Action: sql_query("SELECT revenue FROM finance WHERE q='Q3-2026'")  // tool call5Observation: { revenue: 4_200_000 }  // real result67Thought: Got Q3-2026. Now I need Q3-2025 to compute YoY.  // reason again8Action: sql_query("SELECT revenue FROM finance WHERE q='Q3-2025'")9Observation: { revenue: 3_500_000 }1011Thought: 4.2M vs 3.5M is +20%. I can answer now.  // no more tools needed12Action: finish()  // terminate the loop13Answer: Acme's Q3 revenue was $4.2M, up 20% year over year.

A ReAct loop: Thought → Action → Observation, repeated until the model emits a final answer grounded in tool results.

The agent never guessed the numbers — it called a tool twice, read the real values, and only then did the arithmetic and answered. That grounding loop is precisely why react agents are more trustworthy than asking a model to recall figures from memory.

Tool use

Function calling: how the model acts

LLM tool use

From token prediction to a real API call

Modern LLMs support native function calling: you describe each tool as a name, a description, and a JSON schema of parameters. The model doesn't run the tool — it emits a structured call (the tool name plus arguments) that your runtime validates and executes.

The host returns the result as a tool message, and the model continues reasoning from that observation. This clean contract is what makes tool use reliable: arguments are schema-checked, results are explicit, and the model stays in control of when to call what.

Tools are declared as name + description + JSON schema.
The model emits a validated, structured call — not free text.
Your runtime executes it and returns the result as an observation.
Bad arguments are rejected and retried, not silently run.

Explore AI agent tools

ModelEmits a structured tool call

ValidateCheck args against JSON schema

Execute toolRun the API / query / code

ResultTool returns real data

ObservationModel reads it & continues

The function-calling round trip: the model requests a tool, the runtime executes it, and the result returns as an observation.

Tool calls

in the trace above

JSON schema

per declared tool

100%

Args validated

before execution

Guessed numbers

all grounded in data

Common confusion

Prompting vs an LLM agent

A clever prompt and an agent both use the same model — but only one can act, observe, and recover. Here's the practical difference.

Dimension	Plain prompting	LLM agent
Model calls	One shot	A loop of many steps
Can call tools / APIs
Grounds answers in live data
Reacts to results
Recovers from a failed step
Keeps state across steps
Best for	Single answers & drafts	Multi-step tasks & actions

Prompting is perfect when the answer fits in one response — summarize this, rewrite that, classify these. The moment a task needs fresh data, multiple steps, or the ability to act and then re-check, you want an agent. The agent contains the prompt: every loop iteration is still a model call, just wired into tools, memory, and a control structure. See how to build AI agents to put it into practice.

FAQ

LLM agents, answered

ReAct (Reasoning + Acting) is a prompting pattern that interleaves the model's chain-of-thought with concrete actions. On each turn the LLM emits a 'Thought' (its private reasoning), then an 'Action' (a tool call with arguments), and the runtime returns an 'Observation' (the tool's result). The model reads that observation and reasons again, repeating until it produces a final answer. ReAct grounds reasoning in real data, which sharply reduces hallucination compared with reasoning alone.

Keep learning

Go deeper on LLM agents

AI agent toolsFunction calling & tool design Agentic workflowsReAct, planning, reflection recipes How to build AI agentsA step-by-step guide AI agent frameworksPick the right stack AI agent memoryContext, state & vector stores All learning guidesThe full curriculum

LLM agentsreact agentsllm tool usefunction callingreasoning and actingchain-of-thoughtagentic workflowsAI agents

Get started

Build a tool-using LLM agent today

Wire reasoning, tools, and a loop together in minutes. Free to start — no credit card required.

Start building free Browse templates