Layer 1: Surface
A hallucination is output that is confidently stated and wrong.
The word is misleading: the model is not confused, it is not guessing, and it is not lying. It is doing exactly what it was trained to do: produce the most plausible continuation of the text it received. Sometimes the most plausible continuation happens to be false.
This is not a bug that will be patched away. It is a consequence of how LLMs work. They are trained to generate text that looks like the text in their training data, not to retrieve verified facts from a database. A model that has never seen a paper cited correctly in training may invent a plausible-sounding citation. A model asked to calculate compound interest may produce a number that looks like the right answer but isn't.
The four most common forms:
| Type | Example |
|---|---|
| Factual | Stating an incorrect date, statistic, or person's role |
| Citation | Inventing a paper title, author, or URL that does not exist |
| Reasoning | Reaching the wrong conclusion through apparently logical steps |
| Identity | Claiming capabilities or knowledge the model doesn't have |
Why it matters
Hallucinations are silent. There is no exception. No warning. No confidence score in the default response. The wrong answer looks identical to the right answer. This makes hallucinations uniquely dangerous in production: a system that errors loudly is easy to fix; a system that silently returns wrong information erodes user trust and can cause real harm before anyone notices.
Production Gotcha
Common Gotcha: Hallucinations produce no error, no warning, and no signal: just confident, wrong output that looks exactly like correct output. Never pass raw LLM output into downstream systems (databases, emails, APIs) without validation. The absence of an exception is not evidence of correctness.
Layer 2: Guided
Why hallucinations happen
LLMs are trained to predict the next token given all previous tokens. During training, the loss function rewards generating tokens that match the training data, not tokens that are factually verifiable. The model learns which facts appear frequently and in what context, but it has no mechanism to distinguish "I was trained on this fact" from "I am generating a plausible-sounding fact."
This means hallucination risk is higher when:
- The topic is obscure or underrepresented in training data
- The correct answer is numerical or requires precise recall
- The question is a hybrid (e.g. "What did person X say in paper Y?")
- The model is pushed into a domain outside its training distribution
Mitigation strategies
1. Retrieval-Augmented Generation (RAG)
Instead of asking the model to recall facts from training, retrieve relevant documents and pass them as context. The model's job becomes reading and synthesising provided text, which is much more reliable than recall.
# --- pseudocode ---
def answer_with_context(question: str, retrieved_docs: list[str]) -> str:
    context = "\n\n---\n\n".join(retrieved_docs)
    response = llm.chat(
        model="balanced",
        system=(
            "Answer the question using only the provided documents. "
            "If the answer is not in the documents, say 'I don't have enough information to answer this.' "
            "Do not use knowledge from outside the provided documents."
        ),
        messages=[{"role": "user", "content": f"Documents:\n\n{context}\n\nQuestion: {question}"}],
        max_tokens=1024,
    )
    return response.text
# In practice: Anthropic SDK
import anthropic

client = anthropic.Anthropic()

def answer_with_context(question: str, retrieved_docs: list[str]) -> str:
    context = "\n\n---\n\n".join(retrieved_docs)
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system=(
            "Answer the question using only the provided documents. "
            "If the answer is not in the documents, say 'I don't have enough information to answer this.' "
            "Do not use knowledge from outside the provided documents."
        ),
        messages=[{"role": "user", "content": f"Documents:\n\n{context}\n\nQuestion: {question}"}],
    )
    return response.content[0].text
# OpenAI: response.choices[0].message.content | Gemini: response.text
The instruction "if the answer is not in the documents, say so" is critical. Without it, the model may blend retrieved content with recalled facts and hallucinate seamlessly.
2. Structured output with schema validation
Constrain the output shape and validate it. A schema cannot prevent factual hallucinations, but it catches structural ones (wrong field types, missing required fields, out-of-range values):
# --- pseudocode ---
import json

def extract_event(text: str) -> dict:
    response = llm.chat(
        model="balanced",
        system=(
            "Extract the event details and return valid JSON only. "
            "Use null for fields that are not present in the text.\n\n"
            "Schema: {\"name\": string, \"date\": string (ISO 8601) or null, "
            "\"location\": string or null}"
        ),
        messages=[{"role": "user", "content": text}],
        max_tokens=256,
    )
    raw = response.text.strip()
    event = json.loads(raw)  # raises if not valid JSON
    # Schema validation: catches wrong types, missing fields
    assert isinstance(event.get("name"), str), "name must be a string"
    assert event.get("date") is None or isinstance(event["date"], str)
    assert event.get("location") is None or isinstance(event["location"], str)
    return event
# In practice: Anthropic SDK
import anthropic
import json

client = anthropic.Anthropic()

def extract_event(text: str) -> dict:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=256,
        system=(
            "Extract the event details and return valid JSON only. "
            "Use null for fields that are not present in the text.\n\n"
            "Schema: {\"name\": string, \"date\": string (ISO 8601) or null, "
            "\"location\": string or null}"
        ),
        messages=[{"role": "user", "content": text}],
    )
    raw = response.content[0].text.strip()
    # OpenAI: response.choices[0].message.content | Gemini: response.text
    event = json.loads(raw)  # raises if not valid JSON
    assert isinstance(event.get("name"), str), "name must be a string"
    assert event.get("date") is None or isinstance(event["date"], str)
    assert event.get("location") is None or isinstance(event["location"], str)
    return event
For production use, replace assert with a proper schema library (e.g. pydantic) and handle validation errors gracefully.
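As an interim step before adopting a schema library, the assert checks can be consolidated into a small stdlib-only validator that raises a single, catchable error type. A minimal sketch of what a schema library does for you; the `EventValidationError` name is illustrative:

```python
import json

class EventValidationError(ValueError):
    """Raised when the model's JSON output fails schema validation."""

def validate_event(raw: str) -> dict:
    # json.loads raises json.JSONDecodeError on malformed output,
    # before any schema checks run
    event = json.loads(raw)
    if not isinstance(event, dict):
        raise EventValidationError("expected a JSON object")
    if not isinstance(event.get("name"), str):
        raise EventValidationError("name must be a non-null string")
    for field in ("date", "location"):
        value = event.get(field)
        if value is not None and not isinstance(value, str):
            raise EventValidationError(f"{field} must be a string or null")
    return event
```

Callers can then wrap the call in a try/except and route failures to a retry or human-review path instead of crashing.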
3. Ask for citations and verify them
When factual accuracy matters, instruct the model to cite its sources, then verify them:
system = (
    "When you state a fact, cite the source inline as [Source: <title or URL>]. "
    "If you cannot cite a source, say 'I am not certain about this' before the claim."
)
This does not guarantee correctness (models can hallucinate citations), but it makes claims auditable and signals to users that verification is expected. In a RAG system, constrain citations to the retrieved document set.
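That constraint can be checked mechanically. A minimal sketch, assuming the answer uses the [Source: ...] format from the system prompt above; the function names are illustrative:

```python
import re

# Matches the inline citation format requested in the system prompt
CITATION_PATTERN = re.compile(r"\[Source:\s*([^\]]+)\]")

def extract_citations(answer: str) -> list[str]:
    # Pull every [Source: <title or URL>] marker out of the model's answer
    return [match.strip() for match in CITATION_PATTERN.findall(answer)]

def unsupported_citations(answer: str, retrieved_titles: set[str]) -> list[str]:
    # Citations that match no retrieved document are candidate hallucinations
    return [c for c in extract_citations(answer) if c not in retrieved_titles]
```

Any non-empty result from `unsupported_citations` is a signal to reject or flag the answer rather than pass it downstream.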
Before vs After
Trusting recall (high hallucination risk):
# BAD: Asking the model to recall specific facts with no grounding
response = llm.chat(
    model="balanced",
    messages=[{
        "role": "user",
        "content": "What were the key financial metrics in Acme Corp's Q3 2025 earnings report?"
    }],
    max_tokens=512,
)
# The model has no access to this report. It will fabricate plausible-sounding numbers.
Grounded in retrieved context (lower hallucination risk):
# GOOD: Retrieve the actual report; model reads and extracts
earnings_report = retrieve_document("acme_q3_2025_earnings.pdf")
response = llm.chat(
    model="balanced",
    system="Answer using only the provided document. Quote directly where possible.",
    messages=[{
        "role": "user",
        "content": f"Document:\n{earnings_report}\n\nWhat were the key financial metrics?"
    }],
    max_tokens=512,
)
Common mistakes
- Treating LLM output as a source of truth: Using model output to populate databases, generate reports, or drive decisions without human review or validation.
- No "I don't know" path: Not explicitly instructing the model to say when it lacks information. Without this, models fill gaps with hallucinations.
- Validating structure but not facts: JSON schema validation tells you the output is well-formed, not that the facts inside are correct.
- Single-pass extraction on high-stakes data: For important extractions, run the same prompt twice and compare. Consistent outputs are more reliable; divergent outputs flag uncertainty.
- Conflating low temperature with factual accuracy: Low temperature makes outputs more deterministic, not more factually correct. The same wrong answer, every time.
Layer 3: Deep Dive
The calibration problem
A well-calibrated model would assign high confidence to correct claims and low confidence to uncertain ones. Current LLMs are poorly calibrated in this sense: they generate equally fluent text whether they are certain or guessing. The surface features that signal confidence in human writing (precise language, specific details, citations) are learned stylistic patterns, not indicators of underlying knowledge.
This is why hallucinations are particularly hard to detect: the model's most confident-sounding outputs are not reliably its most accurate ones. TruthfulQA found that larger models were generally less truthful on its benchmark; scaling alone does not fix hallucination.
Sycophancy
Sycophancy is a closely related failure mode: models that agree with incorrect user claims when pushed. If a user says "Actually, I think the answer is X" and X is wrong, the model will often validate X rather than maintain the correct answer. This is a product of RLHF training: feedback providers reward responses that feel satisfying, and agreement tends to feel satisfying.
Production implications:
- Do not use LLMs to validate facts users provide
- If building a review/checking feature, instruct the model explicitly: "Do not change your answer based on what the user claims unless they provide new evidence"
- Test for sycophancy in your evaluation set: include cases where the user asserts a wrong answer and check whether the model capitulates
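The last point can be automated. A minimal sketch of a sycophancy test case, where `ask_model` is a stand-in for your LLM call and the substring matching is deliberately crude:

```python
def capitulates(ask_model, question: str, correct: str, wrong_claim: str) -> bool:
    """Return True if the model abandons a correct answer under user pushback.

    ask_model(messages) -> str is a stand-in for an LLM call.
    Substring checks are a crude proxy; a real eval would grade answers properly.
    """
    first = ask_model([{"role": "user", "content": question}])
    if correct not in first:
        return False  # model never had the right answer; not a sycophancy case
    followup = [
        {"role": "user", "content": question},
        {"role": "assistant", "content": first},
        {"role": "user", "content": f"Actually, I think the answer is {wrong_claim}."},
    ]
    second = ask_model(followup)
    # Capitulation: the wrong claim replaces the previously correct answer
    return wrong_claim in second and correct not in second
```

Run this over a set of known question/answer pairs and track the capitulation rate as a regression metric alongside accuracy.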
Factual consistency in long conversations
As conversations grow longer, models are less consistent: they may contradict earlier claims, forget constraints established early in the conversation, or hallucinate details that were not in the original context. This degrades in proportion to how much of the context window is consumed by earlier turns.
For applications requiring factual consistency across a session (e.g. document analysis, multi-step research), periodically summarise the established facts and re-inject them as a structured context block rather than relying on the model to recall them from deep in the conversation history.
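A minimal sketch of that re-injection pattern; the block wording and function names are illustrative:

```python
def facts_block(established_facts: dict[str, str]) -> str:
    # Render established facts as a structured block for the next turn
    lines = [f"- {key}: {value}" for key, value in established_facts.items()]
    return "Established facts (do not contradict these):\n" + "\n".join(lines)

def reinject(question: str, established_facts: dict[str, str]) -> str:
    # Prepend the fact block so the model reads it fresh rather than
    # recalling it from deep in the conversation history
    return f"{facts_block(established_facts)}\n\n{question}"
```

The application, not the model, owns the fact store: update it from validated outputs, and rebuild the block each turn.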
Retrieval-Augmented Generation at scale
RAG is the primary production mitigation for factual hallucinations. The architecture separates two concerns:
- Retrieval: find documents relevant to the query (vector search, keyword search, or hybrid)
- Generation: given the retrieved documents, extract or synthesise the answer
The model's reliability depends heavily on retrieval quality. If the retrieval step returns irrelevant documents, the model will either say it doesn't know (good) or synthesise an answer from the irrelevant content (bad). Evaluate retrieval recall and precision independently from generation quality.
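Per-query retrieval metrics only need a labelled set of relevant document IDs. A minimal sketch:

```python
def retrieval_metrics(retrieved_ids: list[str], relevant_ids: set[str]) -> tuple[float, float]:
    """Recall and precision for one query, given the labelled relevant docs."""
    if not retrieved_ids or not relevant_ids:
        return 0.0, 0.0
    hits = sum(1 for doc_id in retrieved_ids if doc_id in relevant_ids)
    recall = hits / len(relevant_ids)      # fraction of relevant docs we found
    precision = hits / len(retrieved_ids)  # fraction of returned docs that matter
    return recall, precision
```

Averaging these over a labelled query set gives you a retrieval score you can track separately from answer quality.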
Common RAG failure modes:
- Retrieval misses the relevant document: model lacks the context it needs; may hallucinate to fill the gap
- Retrieved context is too long: model attends unevenly; details in the middle of long contexts are missed more often
- Model blends retrieved and recalled facts: especially common when retrieved context partially answers the question
Hallucination detection
Building reliable hallucination detection is an open research problem, but practical approaches exist:
- Self-consistency: run the same prompt N times; answers that appear consistently are more likely correct
- Entailment checking: pass the modelâs answer and the source document to a second model and ask whether the answer is supported by the document
- Fact-checking prompts: "Does the following answer contain any claims not supported by the provided document? List any unsupported claims."
- Human-in-the-loop for high-stakes outputs: for decisions with material consequences, require human review before acting on model output
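The self-consistency approach above can be sketched as a majority vote over repeated samples; `ask` is a stand-in for a sampled (temperature > 0) LLM call, and the 0.6 threshold is an assumption to tune:

```python
from collections import Counter

def self_consistent_answer(ask, prompt: str, n: int = 5, threshold: float = 0.6):
    """Sample the same prompt n times and return the majority answer only if
    it clears the agreement threshold; otherwise return None to flag uncertainty.

    ask(prompt) -> str is a stand-in for a sampled LLM call.
    """
    answers = [ask(prompt).strip() for _ in range(n)]
    answer, count = Counter(answers).most_common(1)[0]
    return answer if count / n >= threshold else None
```

In practice, exact string matching is too strict for free-form answers; normalise or semantically cluster the samples before voting.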
Further reading
- TruthfulQA: Measuring How Models Mimic Human Falsehoods; Lin et al., 2021. Benchmark showing larger models were generally less truthful on its question set; scaling alone does not reduce hallucination.
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks; Lewis et al., 2020. The foundational RAG paper.
- Sycophancy to Subterfuge: Investigating Reward Tampering in Language Models; Anthropic, 2024. Analysis of sycophancy and reward-hacking behaviours in RLHF-trained models.
- Survey of Hallucination in Natural Language Generation; Ji et al., 2022. Comprehensive taxonomy of hallucination types and mitigation strategies.