Guardrails
Guardrails are the safety controls that constrain an AI agent's inputs, outputs, and actions — through validation, allow-lists, and approval gates — to keep its behavior safe and on-policy.
- Updated 2026
Guardrails are the safety controls that wrap an AI agent and constrain what it is allowed to take in, produce, and carry out. Because a language model is probabilistic — it can be talked into unsafe behavior or simply get something wrong — guardrails add a layer of deterministic enforcement around it. They typically operate at three points: on the way in, screening and sanitizing requests; on the way out, validating responses before they reach a user; and at the moment of action, deciding which tools and operations the agent may actually trigger.
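The three checkpoints described above can be sketched as a thin wrapper around a model call. This is a minimal illustration, not a reference implementation: the names `guarded_call`, `check_input`, `check_output`, and `check_action`, the blocked-phrase pattern, the length limit, and the tool names are all assumptions made for the example, and the model is any callable that returns a dictionary.

```python
import re
from typing import Callable

MAX_INPUT_CHARS = 2000  # assumed limit, chosen only for illustration
# Hypothetical, deliberately crude screen; real systems use richer detection.
BLOCKED_INPUT = re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)
# Hypothetical allow-list of actions the agent may trigger.
ALLOWED_ACTIONS = {"lookup_order", "send_receipt"}


class GuardrailError(Exception):
    """Raised when a deterministic check rejects a request, response, or action."""


def check_input(text: str) -> str:
    """On the way in: sanitize the request and reject unsafe ones."""
    cleaned = text.strip()
    if not cleaned or len(cleaned) > MAX_INPUT_CHARS:
        raise GuardrailError("input rejected: empty or too long")
    if BLOCKED_INPUT.search(cleaned):
        raise GuardrailError("input rejected: matches a blocked pattern")
    return cleaned


def check_output(reply: dict) -> None:
    """On the way out: validate the response before it reaches a user."""
    if not isinstance(reply.get("text"), str):
        raise GuardrailError("output rejected: missing text field")


def check_action(reply: dict) -> None:
    """At the moment of action: allow only approved operations."""
    action = reply.get("action")
    if action is not None and action not in ALLOWED_ACTIONS:
        raise GuardrailError(f"action rejected: {action!r} is not allow-listed")


def guarded_call(model: Callable[[str], dict], user_text: str) -> dict:
    """Run the model only between the three deterministic checks."""
    reply = model(check_input(user_text))
    check_output(reply)
    check_action(reply)
    return reply
```

The design choice worth noting is that every check either passes or raises. The model's reply is never asked whether it is safe, so the enforcement stays deterministic even when the model is not.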
They matter because an AI agent does not just talk; it acts. It can call APIs, send emails, move money, or change records, so an unchecked mistake is not a bad sentence but a real-world consequence. Guardrails turn vague intentions into enforced policy: schema validation rejects malformed output, allow-lists restrict the agent to an approved set of tools, classifiers catch toxic or off-topic content, and approval gates pause irreversible steps until a human signs off. They also blunt two common failure modes: prompt injection, where crafted input tries to override the agent's instructions, and hallucination, where the model confidently asserts something false. A fact-check against retrieved sources, for example, can block an unsupported claim before it ships.
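As a rough sketch of how schema validation, an allow-list, and an approval gate might fit together, the example below checks a tool call that a model has emitted as JSON. The `TOOLS` registry, its tool names, and the `validate_tool_call` and `needs_approval` helpers are hypothetical, and the hand-written type check stands in for whatever schema library a real system would use.

```python
import json

# Hypothetical registry: tool name -> (required argument types, irreversible?)
TOOLS = {
    "search_docs": ({"query": str}, False),
    "send_email": ({"to": str, "body": str}, True),
}


def validate_tool_call(raw: str) -> dict:
    """Reject output that is malformed, off the allow-list, or off-schema."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"malformed output: {exc}") from exc
    if not isinstance(call, dict):
        raise ValueError("tool call must be a JSON object")

    # Allow-list: only tools named in the registry may run.
    name = call.get("tool")
    if not isinstance(name, str) or name not in TOOLS:
        raise ValueError(f"tool {name!r} is not allow-listed")

    # Schema check: exactly the expected arguments, each of the expected type.
    schema, _ = TOOLS[name]
    args = call.get("args")
    if not isinstance(args, dict) or set(args) != set(schema):
        raise ValueError("arguments do not match the schema")
    for key, expected in schema.items():
        if not isinstance(args[key], expected):
            raise ValueError(f"argument {key!r} must be {expected.__name__}")
    return call


def needs_approval(call: dict) -> bool:
    """Approval gate: irreversible tools pause until a human signs off."""
    return TOOLS[call["tool"]][1]
```

Under these assumptions, a validated `search_docs` call could run immediately, while a validated `send_email` call would be held for a human reviewer because it is marked irreversible.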
Consider a finance agent asked to issue refunds. Input guardrails verify that the request comes from an authenticated user and contains a valid order ID. Output guardrails confirm that the proposed refund amount never exceeds the original charge. An action guardrail then routes anything above a threshold to a human for approval, while small, routine refunds proceed automatically. Within a multi-step pipeline, orchestration decides what runs next, but guardrails decide what is allowed to run at all, so the agent stays useful without becoming dangerous.
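The refund scenario can be written down as a single decision function. This is only a sketch: the `ORDERS` table, the `RefundRequest` fields, and the 100.00 threshold are invented for illustration, since the text above specifies only that some threshold exists.

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum

# Assumed value; the scenario only says "a threshold".
APPROVAL_THRESHOLD = Decimal("100.00")

# Hypothetical order store: order ID -> original charge.
ORDERS = {
    "ORD-1001": Decimal("49.99"),
    "ORD-1002": Decimal("480.00"),
}


class Decision(Enum):
    REJECT = "reject"
    AUTO_APPROVE = "auto_approve"
    NEEDS_HUMAN = "needs_human"


@dataclass(frozen=True)
class RefundRequest:
    user_authenticated: bool
    order_id: str
    amount: Decimal  # refund amount proposed by the agent


def decide_refund(req: RefundRequest) -> Decision:
    # Input guardrail: authenticated user and a known order ID.
    if not req.user_authenticated or req.order_id not in ORDERS:
        return Decision.REJECT
    # Output guardrail: the amount must be positive and never exceed the charge.
    if req.amount <= 0 or req.amount > ORDERS[req.order_id]:
        return Decision.REJECT
    # Action guardrail: large refunds wait for a human; small ones proceed.
    if req.amount > APPROVAL_THRESHOLD:
        return Decision.NEEDS_HUMAN
    return Decision.AUTO_APPROVE
```

With these sample values, a full refund of `ORD-1001` would be approved automatically, a full refund of `ORD-1002` would be routed to a human, and any refund larger than the original charge would be rejected outright.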
Guardrails FAQ
Guardrails are the safety controls that constrain what an agent can take in, say, and do. They sit around the model rather than inside it: input checks filter or reject unsafe requests, output checks validate and sanitize what the agent produces, and action checks gate the tools and operations it is allowed to run. The goal is to keep the agent's behavior safe and on-policy even when the underlying model is unpredictable.