Retrieval-Augmented Generation
Connecting LLMs to your own data, from basic pipelines to advanced retrieval architectures.
What is RAG and Why
LLMs know a lot, but they don't know your data. Retrieval-Augmented Generation is the pattern that fixes this: not by training the model on your data, but by finding the relevant pieces at query time and handing them directly to the model.
Embeddings and Vector Search
Semantic search, finding text by meaning rather than keywords, is the engine inside most RAG systems. Understanding how embeddings work and how vector databases store and query them is the foundation you need to build reliable retrieval.
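As a rough illustration of search by meaning, the sketch below ranks stored vectors by cosine similarity to a query vector. The tiny two-dimensional vectors and the `top_k` helper are hypothetical stand-ins: a real system would get its vectors from an embedding model and would usually query a vector database rather than scan a dictionary.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], index: dict[str, list[float]], k: int = 3) -> list[str]:
    """Brute-force nearest-neighbour search: score every stored vector, keep the best k ids."""
    scored = [(cosine_similarity(query_vec, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical toy index: in practice these vectors come from an embedding model.
toy_index = {"cats": [1.0, 0.0], "dogs": [0.8, 0.6], "tax": [0.0, 1.0]}
nearest = top_k([1.0, 0.1], toy_index, k=2)
```

Vector databases typically replace the linear scan with an approximate nearest-neighbour index, trading a little accuracy for much faster queries.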
Chunking and Indexing
You can't embed a whole document: you split it into pieces first. How you split determines what you can retrieve. The wrong chunking strategy is one of the most common reasons RAG systems fail to find the right answer even when the information clearly exists.
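One simple way to split a document is a fixed-size window with overlap, sketched below. The function name, the word-based splitting, and the default sizes are assumptions chosen for illustration, not a recommended configuration. Other strategies split on sentences, headings, or semantic boundaries instead.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into word-based chunks; consecutive chunks share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


pieces = chunk_words("a b c d e f g", chunk_size=4, overlap=2)
```

The overlap is there so that a sentence falling on a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.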
Retrieval Quality: Dense, Sparse, and Hybrid
Semantic search is powerful but not always the best retrieval method. Keyword search finds exact matches that embeddings miss. Re-ranking re-scores candidates with a slower but more accurate model. Understanding when to use each, and how to combine them, is what separates reliable RAG from fragile RAG.
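One common way to combine a dense ranking with a keyword ranking is reciprocal rank fusion (RRF), sketched below. This is one option among several, not necessarily the method this module teaches. The document ids are made up, and the constant `k = 60` is a conventional default rather than a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked id lists: each list adds 1 / (k + rank) to a document's score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["a", "b", "c"]   # hypothetical embedding-search results, best first
sparse_hits = ["c", "a", "d"]  # hypothetical keyword-search results, best first
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

Because RRF uses only rank positions, it avoids having to normalise dense and sparse scores onto a common scale. A re-ranker could then re-score the top of the fused list.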
Prompting for RAG
Retrieved chunks are only as useful as the instructions you give the model for using them. The grounding instruction, context format, citation pattern, and no-answer path are what turn a retrieval result into a reliable, trustworthy answer.
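The four elements named above can be sketched as a single prompt builder. The wording of the instructions, the `[n]` citation format, and the `NO_ANSWER` string are illustrative assumptions, not a prescribed template.

```python
# Hypothetical fallback text for the no-answer path.
NO_ANSWER = "I don't know based on the provided sources."


def build_rag_prompt(question: str, chunks: list[str]) -> str:
    """Assemble a prompt with a grounding instruction, numbered context, citations, and a no-answer path."""
    # Context format: number each chunk so the model can cite it as [n].
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the numbered sources below.\n"  # grounding instruction
        "Cite the sources you use as [n] after each claim.\n"           # citation pattern
        f'If the sources do not contain the answer, reply exactly: "{NO_ANSWER}"\n\n'  # no-answer path
        f"Sources:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_rag_prompt("What is X?", ["Alpha", "Beta"])
```

Keeping the no-answer reply as an exact string makes it easy to detect downstream and to count how often retrieval came back empty-handed.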
Evaluating RAG Systems
A fluent, well-formatted answer based on the wrong chunk is a failure, but it reads like a success. RAG evaluation requires two independent measurement tracks: retrieval quality and generation quality. Conflating them hides the real failure mode.
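A minimal sketch of keeping the two tracks separate: recall@k scores retrieval alone, and a small `diagnose` helper attributes a failure to one track or the other. The helper, its labels, and the `min_recall` threshold are hypothetical. How `answer_correct` is judged (human review, reference answers, a grader model) is left open.

```python
def recall_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Fraction of the known-relevant chunks that appear in the top k retrieved results."""
    if not relevant_ids:
        raise ValueError("relevant_ids must not be empty")
    top_k = set(retrieved_ids[:k])
    return len(top_k & relevant_ids) / len(relevant_ids)


def diagnose(retrieval_recall: float, answer_correct: bool, min_recall: float = 0.5) -> str:
    """Attribute a result to a track; min_recall is an illustrative threshold, not a standard."""
    if retrieval_recall < min_recall:
        # Even a correct-looking answer is suspect if the right chunks were never retrieved.
        return "retrieval"
    if not answer_correct:
        return "generation"
    return "ok"


recall = recall_at_k(["d1", "d7", "d3"], {"d3", "d9"}, k=3)
```

Scoring retrieval before looking at the answer is what exposes the case described above: a fluent answer built on the wrong chunk shows up as a retrieval failure instead of passing unnoticed.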
Advanced RAG Patterns
Basic RAG fails when queries are vague, answers span multiple documents, or context evolves across a conversation. Four patterns (multi-query retrieval, HyDE, contextual retrieval, and small-to-big) each fix a specific retrieval failure mode. Know which failure you have before reaching for a pattern.
Production RAG Checklist
A RAG prototype that works on your test documents is not a production system. This capstone synthesises the full RAG track into a checklist: the gaps that consistently cause RAG failures after launch, and the order to address them.
Context Failure Taxonomy
Four named failure modes account for the majority of context-related bugs in LLM systems: poisoning, distraction, confusion, and clash. Naming them is the first step to fixing them, because each requires a structurally different response.
Hybrid Memory Architecture
Most AI systems treat the context window as their only memory. This means every session starts cold and the system can never learn from past interactions. A proper memory hierarchy (short-term, long-term, and working) requires deliberate design decisions about what to remember, when to retrieve it, and when to forget it.
Multimodal RAG
Text-only RAG misses the majority of enterprise knowledge: diagrams, slide decks, scanned documents, recorded meetings, product images. Multimodal RAG extends retrieval to images and audio, but each modality requires different chunking, indexing, and context assembly strategies.
Long-Context vs RAG Decision Framework
Models with million-token context windows seem to make RAG obsolete. They don't. The decision between long-context, RAG, and hybrid depends on update frequency, query pattern, cost ceiling, and latency SLO, not just how large your documents are.
Data Engineering for AI Systems
Most AI failures blamed on the model are actually data quality failures upstream. This module covers corpus lifecycle management, data contracts for AI pipelines, and the ingestion patterns that determine whether your RAG system retrieves signal or noise.