Retrieval-augmented generation (RAG)
RAG is a technique that retrieves relevant external documents and adds them to the prompt, so a language model answers from real, current data instead of memory alone.
- Updated 2026
Retrieval-augmented generation (RAG) is a pattern for grounding a language model: before the model writes an answer, a retrieval step pulls the most relevant passages from an external knowledge store and inserts them into the prompt as context. The model then generates its response from that supplied evidence — your docs, tickets, or product data — rather than relying purely on what it absorbed during training.
Mechanically, the user's question is converted into an embedding (a numeric vector) and matched against an index, usually a vector database, to find the passages closest in meaning. The top matches are concatenated into the prompt with an instruction like "answer using only the context below." Because the model reads real source text at answer time, RAG substantially reduces hallucination, provided the retrieved passages actually contain the answer, and lets the system cite the passage each claim came from. It also keeps answers current without retraining: update the index and the next query reflects it.
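The retrieval and prompt-assembly steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: it assumes a toy word-count embedding in place of a learned embedding model and a plain list in place of a vector database. The function names (`embed`, `cosine`, `retrieve`, `build_prompt`) and the sample passages are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a learned embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the question."""
    q = embed(question)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Concatenate retrieved passages into the prompt with a grounding instruction."""
    numbered = "\n".join(f"[{i}] {p}" for i, p in enumerate(context, start=1))
    return (
        "Answer using only the context below. Cite passages by number.\n\n"
        f"Context:\n{numbered}\n\nQuestion: {question}\nAnswer:"
    )


# Illustrative passages, not real policy text.
passages = [
    "Enterprise plans can be refunded within 30 days of purchase.",
    "Our office is closed on public holidays.",
    "Password resets are handled from the account settings page.",
]
question = "What is the refund window for enterprise plans?"
top = retrieve(question, passages, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

The resulting `prompt` string is what would be sent to the language model. Numbering the passages is one simple way to let the model cite its sources; a real system would typically swap `embed` for a learned embedding model and `retrieve` for a vector-database query.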
A concrete example: a support assistant is asked "What is our refund window for enterprise plans?" Instead of guessing, the system retrieves the two policy paragraphs that mention enterprise refunds, drops them into the prompt, and the model replies "30 days, per the Enterprise Terms," quoting the retrieved clause. Update the policy doc and re-index it, and the next answer reflects the change, with no change to the model.
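The update behaviour in this example can be sketched as follows. This is a hedged illustration only: `PassageIndex`, its `upsert` and `search` methods, the document ids, and the replacement "14 days" wording are all hypothetical, and word overlap stands in for embedding similarity.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase word set; a crude stand-in for a learned embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class PassageIndex:
    """Hypothetical in-memory index keyed by document id."""

    def __init__(self) -> None:
        self._docs: dict[str, str] = {}

    def upsert(self, doc_id: str, text: str) -> None:
        """Insert a passage, or replace the one already stored under doc_id."""
        self._docs[doc_id] = text

    def search(self, query: str, k: int = 1) -> list[str]:
        """Rank passages by Jaccard word overlap with the query; return the top k."""
        q = tokens(query)

        def score(text: str) -> float:
            t = tokens(text)
            union = q | t
            return len(q & t) / len(union) if union else 0.0

        return sorted(self._docs.values(), key=score, reverse=True)[:k]


index = PassageIndex()
index.upsert("enterprise-terms", "Enterprise plans may be refunded within 30 days.")
index.upsert("holidays", "Support is closed on public holidays.")

question = "What is our refund window for enterprise plans?"
before = index.search(question)

# Replace the policy passage (illustrative wording); the model itself is untouched.
index.upsert("enterprise-terms", "Enterprise plans may be refunded within 14 days.")
after = index.search(question)

print(before)
print(after)
```

The same question retrieves different context before and after the `upsert`, so the generated answer would change even though nothing about the model did.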
RAG, briefly answered
RAG stands for retrieval-augmented generation. It is a technique where a system first retrieves relevant documents from an external knowledge source, then passes them to a language model so the model generates its answer using that fetched evidence rather than memory alone.