Why does RAG reduce hallucination?

Because the model is handed real, source-grounded passages at answer time, it has less reason to invent facts. The retrieved text acts as evidence the model can quote and cite, so answers stay anchored to your actual data instead of the model's fuzzy training recall. It reduces hallucination but does not fully eliminate it.

Is RAG the same as fine-tuning?

No. Fine-tuning bakes new behavior into the model's weights through training, while RAG leaves the model unchanged and instead supplies fresh knowledge through the prompt at query time. RAG is cheaper to update — you just re-index documents — and is the usual choice when your data changes often or must be cited.

Glossary

Retrieval-augmented generation (RAG)

RAG is a technique that retrieves relevant external documents and adds them to the prompt, so a language model answers from real, current data instead of memory alone.

Updated 2026


Retrieval-augmented generation (RAG) is a pattern for grounding a language model: before the model writes an answer, a retrieval step pulls the most relevant passages from an external knowledge store and inserts them into the prompt as context. The model then generates its response from that supplied evidence — your docs, tickets, or product data — rather than relying purely on what it absorbed during training.

Mechanically, the user's question is converted into an embedding and matched against an index, usually a vector database, to find the passages closest in meaning. The top matches are concatenated into the prompt with an instruction like "answer using only the context below." Because the model reads real source text at answer time, RAG reduces hallucination (without eliminating it) and lets the system cite where each claim came from. It also keeps answers current without retraining: update the index and the next query reflects it.
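The retrieve-then-prompt flow described above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the `embed` function is a toy bag-of-words stand-in for a real embedding model, the passages are made up for the example, and the names `embed`, `cosine`, `retrieve`, and `build_prompt` are hypothetical. The final call to a language model is left out.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, passages, k=2):
    """Return the k passages most similar to the question."""
    q = embed(question)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]

def build_prompt(question, context):
    """Concatenate the retrieved passages into the prompt as context."""
    joined = "\n\n".join(context)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{joined}\n\n"
        f"Question: {question}"
    )

# Illustrative passages (assumed for this sketch, not real policy text).
PASSAGES = [
    "The refund window for enterprise plans is 30 days.",
    "Support hours are 9am to 5pm on weekdays.",
    "Enterprise plans refunds are approved by the account manager.",
]

question = "What is our refund window for enterprise plans?"
top = retrieve(question, PASSAGES, k=2)
prompt = build_prompt(question, top)
print(prompt)
```

In a real system the count vectors would be replaced by dense embeddings from a model, and the sorted scan by a vector database query. The shape of the flow (embed, rank, concatenate, instruct) stays the same.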

A concrete example: a support assistant is asked "What is our refund window for enterprise plans?" Instead of guessing, the system retrieves the two policy paragraphs that mention enterprise refunds, drops them into the prompt, and the model replies "30 days, per the Enterprise Terms," quoting the retrieved clause. Update the policy doc and re-index it, and the next answer reflects the change, with no model change required.
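To make the "update the doc, not the model" point concrete, here is a small sketch under stated assumptions. `TinyIndex` is a hypothetical in-memory index, word overlap stands in for embedding similarity, and the 45-day figure is invented purely to show a change. Only the indexed text is replaced between the two searches. The retrieval code, and by extension the model, is untouched.

```python
import re

def words(text):
    """Lowercase word set, used as a crude similarity signal."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

class TinyIndex:
    """Hypothetical in-memory index keyed by document id."""

    def __init__(self):
        self.docs = {}

    def upsert(self, doc_id, text):
        # Re-indexing a document replaces its previous passage.
        self.docs[doc_id] = text

    def search(self, question):
        # Word overlap stands in for embedding similarity here.
        q = words(question)
        return max(self.docs.values(), key=lambda text: len(q & words(text)))

index = TinyIndex()
index.upsert("enterprise-terms", "The refund window for enterprise plans is 30 days.")
index.upsert("support-hours", "Support is available on weekdays.")

question = "What is our refund window for enterprise plans?"
before = index.search(question)

# The policy changes (made-up value): re-index the same document id.
index.upsert("enterprise-terms", "The refund window for enterprise plans is 45 days.")
after = index.search(question)

print(before)
print(after)
```

The same question retrieves different evidence before and after the upsert, which is why the answer can change without any retraining.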

Related terms

Concepts that power RAG

Embeddings: Numeric vectors that capture meaning, used to match a query to similar passages. See embeddings →
Vector database: The store that indexes embeddings and returns nearest matches fast — the retrieval engine in RAG. See vector database →
Hallucination: Confident but false model output; the core failure mode RAG is designed to reduce. See hallucination →
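As a rough illustration of how embeddings and a vector database fit together, the sketch below runs a nearest-match lookup over dense vectors. The three-dimensional vectors are hand-written for the example (real embeddings come from a model and have hundreds or thousands of dimensions), and `STORE` and `nearest` are hypothetical names. A real vector database would typically use an approximate index rather than this linear scan.

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Made-up embeddings: each passage label maps to a small dense vector.
STORE = {
    "refund policy": [0.9, 0.1, 0.0],
    "office hours": [0.1, 0.9, 0.1],
    "api rate limits": [0.0, 0.2, 0.9],
}

def nearest(query_vec, store, k=1):
    """Return the k stored keys whose vectors are closest to the query."""
    ranked = sorted(store, key=lambda name: cosine(query_vec, store[name]), reverse=True)
    return ranked[:k]

# Assumed embedding of a refund-related question.
query = [0.8, 0.2, 0.1]
best = nearest(query, STORE)
print(best)
```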

FAQ

RAG, briefly answered

RAG stands for retrieval-augmented generation. It is a technique where a system first retrieves relevant documents from an external knowledge source, then passes them to a language model so the model generates its answer using that fetched evidence rather than memory alone.

Learn more

Keep reading

RAG: the full guide (architecture, chunking, and retrieval quality)
Vector database: Where embeddings are indexed and searched
Embeddings: How meaning becomes a vector
