“The most dangerous property of a hallucinating AI is not that it is wrong. It is that it is wrong in exactly the same tone, with exactly the same confidence, in exactly the same fluent prose as when it is right. There is no tell. There is no hedging. There is no signal that separates the fabrication from the fact.”
Neal Lloyd · Inside The Machine, Day 16
In 2023, two New York lawyers submitted a legal brief to a federal court containing citations to six cases that did not exist. The cases had been generated by ChatGPT, which the lawyers had used to research precedents. The cases were convincing. The citations were formatted correctly. The case names sounded plausible. The holdings were coherent with the argument being made. Every single one was fiction. The lawyers were sanctioned. The story became famous.
What became considerably less famous is that this was not a story about lawyers using AI carelessly. It was a story about how AI language models work, and why the behaviour the lawyers experienced is not a fixable defect but a structural property of the architecture. Three years later, with models dramatically more capable and deployed across every domain of knowledge work, the hallucination problem has not been solved. It has been reduced, managed, and mitigated. It has not been solved. This is Day 16 of Inside The Machine, and today we explain exactly why.
Not a Bug. An Emergent Property of How These Systems Generate Text.
The word “hallucination” is doing a lot of work in AI discourse and it is worth being precise about what it means technically. A hallucination, in the context of large language models, is a generated output that is factually incorrect but presented with the same confidence and fluency as correct output. The model does not know it is wrong. It is not lying in any meaningful sense. It is doing exactly what it was designed to do — generating the most statistically probable continuation of the text it has received — and the most statistically probable continuation happens, in this instance, to be false.
Language models are trained to predict the next token — the next word or word-fragment — in a sequence, given all the tokens that came before it. This process, iterated billions of times across vast training datasets, produces systems that have learned extraordinarily rich statistical regularities in language: which words follow which other words, which facts tend to appear in which contexts, which rhetorical patterns accompany which types of claims. The model has not memorised facts in the way a database stores records. It has learned patterns from which facts can be approximately reconstructed. The approximation is usually excellent. When it is not — when the model is asked about something at the edge of its training distribution, or about specific details that were sparsely represented in training data, or about things that require precise factual recall rather than pattern-matching — the approximation fails. And when the approximation fails, the model does not output uncertainty. It outputs the most plausible-sounding continuation. Which is a hallucination.
LLMs generate text by predicting the statistically most probable next token given the input context. They do not retrieve facts from a database. They reconstruct approximate facts from learned statistical patterns. When those patterns are insufficient to accurately reconstruct a specific fact — because the fact was rare in training data, was after the training cutoff, requires multi-step precise recall, or is simply at the edge of the model’s competence — the model generates the most plausible-sounding output instead. Plausible-sounding and factually accurate are highly correlated in training data, which is why the model is usually right. They are not the same thing, which is why the model sometimes fabricates with complete fluency and zero signal that it is doing so.
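To make the mechanism concrete, here is a toy sketch of a single decoding step. It is not how any particular model is implemented: the candidate tokens and their scores (`well_known`, `obscure`) are invented for illustration, and a real model scores tens of thousands of tokens rather than four. What it shows is that picking the most probable token always returns an answer, whether the underlying distribution is sharply peaked or nearly flat.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits.values())
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def entropy_bits(probs):
    """Shannon entropy in bits: higher means the distribution is flatter."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def greedy_next_token(logits):
    """Return the top token, its probability, and the distribution's entropy."""
    probs = softmax(logits)
    token = max(probs, key=probs.get)
    return token, probs[token], entropy_bits(probs)


# Hypothetical scores for a fact that appears often in training text.
well_known = {"1648": 9.0, "1658": 2.0, "1748": 1.5, "1618": 1.0}
# Hypothetical scores for a fact the model has barely seen.
obscure = {"1987": 2.1, "1985": 2.0, "1991": 1.9, "1979": 1.8}

for name, logits in (("well_known", well_known), ("obscure", obscure)):
    token, prob, bits = greedy_next_token(logits)
    print(f"{name}: emits {token!r} (p={prob:.2f}, entropy={bits:.2f} bits)")
```

In both cases the emitted text is just a year. The probability and the entropy, which are the only places the difference between the two cases lives, never reach the reader unless the system around the model chooses to expose them.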
The Model Does Not Know What It Does Not Know
The deepest problem with AI hallucinations is not their frequency — which has decreased substantially as models have improved — but their presentation. A well-calibrated human expert, when asked about something at the edge of their knowledge, will typically signal uncertainty: “I think it was around 1987 but I’m not certain,” or “You should check this, but my recollection is…” The hedging is informative. It tells you when to trust the answer and when to verify it independently.
Language models are trained primarily on text produced by humans who were confident about what they were saying. The confident declarative sentence is the dominant register of written communication. As a result, models have learned to produce confident declarative sentences — even when the underlying “knowledge” they are drawing on is insufficient to support that confidence. The model that tells you “The Treaty of Westphalia was signed in 1648” (correct) and the model that tells you “The Dunmore case established the precedent for corporate liability in environmental damage in the Fifth Circuit in 1987” (fabricated) use exactly the same syntactic structure and exactly the same confident tone. There is no tell. The confident sentence is the default register. Uncertainty is not reliably surfaced.
Modern models have improved significantly on this dimension. Explicit uncertainty expressions — “I’m not certain about this,” “you should verify this independently” — appear more frequently in current-generation outputs than in early versions. But the improvement is partial and inconsistent. Models still hallucinate confidently in domains where they have learned that confident assertion is the appropriate register — legal citation, medical dosing, scientific reference, historical fact — precisely because those are domains where human experts typically assert confidently and the training data reflects that norm.
The model does not experience the difference between knowing and confabulating. It experiences only the process of generating the next token. The difference between knowing and confabulating is something that happens in the output, and in the output they are identical in presentation. This is not a failure of intelligence. It is a property of the architecture. Understanding that distinction is essential to using these systems safely.
Neal Lloyd · Inside The Machine, Day 16
The Domains Where Getting It Wrong Has Consequences
Legal research and citation. The New York lawyers were not unusual. Subsequent analysis found that AI-generated legal citations with fabricated case law appeared in dozens of court filings across multiple jurisdictions in 2023 and 2024. The pattern is consistent: models trained on legal text learn to generate plausible-sounding case names, courts, years, and holdings that follow the conventions of legal citation without necessarily corresponding to real cases. The danger is compounded by the fact that real case law is extraordinarily dense and lawyers cannot verify every citation from memory. AI-assisted legal research is now common and, where hallucination risk is not actively managed, dangerous.
Medical information. Models asked about medication dosages, drug interactions, diagnostic criteria, and treatment protocols will frequently produce outputs that are partially or wholly incorrect. The specific failure mode is particularly dangerous: a model might correctly identify a treatment for a condition but generate a dosage that is wrong by an order of magnitude, in confident clinical language that a non-specialist would not flag. The healthcare context is one where hallucination risk has been most extensively studied and where the deployment of AI systems without robust verification mechanisms has been most clearly shown to cause harm.
Historical and biographical fact. Dates, names, specific figures, and the detailed facts of specific events are among the most hallucination-prone categories. Models learn that historical claims take a specific form (name, date, event) and generate plausible combinations even when the specific combination is incorrect. People who ask about historical events, company founding dates, biographical details of real individuals, or specific statistics frequently receive confident, detailed, and incorrect answers.
Scientific and technical reference. Paper citations, experimental results, statistical findings, and technical specifications are all high-hallucination-risk categories. The academic literature is vast and sparsely represented in training data relative to its importance; models have learned the form of academic citation and generate plausible-sounding DOIs, author names, journal titles, and findings that do not correspond to real publications. This is a genuine threat to the integrity of literature reviews and meta-analyses that use AI assistance.
Every piece of specific factual information generated by an AI language model in a high-stakes context requires independent verification before use. Not sampling. Not spot-checking. Every specific claim: names, dates, figures, citations, dosages, case references, technical specifications. The model cannot tell you which outputs are hallucinated and which are accurate because it does not experience that distinction. You are the verification layer. If the workflow you are using does not include a robust verification step for every specific factual claim, it is not a safe workflow for anything where being wrong has consequences. This is not a counsel of despair. It is a specification for appropriate use.
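One way to make the verification step harder to skip is to have the workflow itself list the specific claims that need checking. The sketch below is a deliberately crude illustration of that idea, not a recommended tool: the three regular expressions in `PATTERNS` are assumptions chosen for this example, they will miss many claims, and they say nothing about whether a claim is true. They only produce a checklist for a human.

```python
import re

# Illustrative patterns only; a real checklist would need far broader coverage.
PATTERNS = {
    # Four-digit years from 1500 to 2099.
    "year": re.compile(r"\b(?:1[5-9]|20)\d{2}\b"),
    # A number followed by a common dosage unit.
    "dosage": re.compile(r"\b\d+(?:\.\d+)?\s?(?:mg|mcg|g|ml)\b", re.IGNORECASE),
    # Simple "Name v. Name" case references.
    "case_citation": re.compile(r"\b[A-Z][A-Za-z]+ v\. [A-Z][A-Za-z]+"),
}


def claims_to_verify(text):
    """Return (label, matched_text) pairs a human should check independently."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((label, match.group(0)))
    return found


# Hypothetical model output, invented for this example.
draft = "In Smith v. Jones (1987) the court cited a 500 mg dose."
for label, span in claims_to_verify(draft):
    print(f"VERIFY [{label}]: {span}")
```

The design choice worth noting is that the function never marks anything as correct. Its only output is a list of things that remain unverified until a person has checked them against a primary source.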
Progress Is Real. The Problem Is Not Solved. Both Are True.
Retrieval-Augmented Generation (RAG). The most widely deployed mitigation for hallucination in production AI systems is RAG — a technique that grounds model outputs by retrieving relevant documents from a verified database and conditioning the model’s response on that retrieved content. If the model is answering questions about your company’s products, it retrieves the actual product documentation and generates responses based on that. The hallucination rate drops substantially because the model is generating text that summarises or interprets retrieved content rather than reconstructing facts from statistical patterns. RAG does not eliminate hallucination — the model can still misrepresent the retrieved content — but it dramatically reduces it for well-defined domains with reliable retrieval sources.
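The shape of a RAG pipeline can be sketched in a few lines. This is a minimal illustration under loud assumptions: the documents in `DOCS` are invented, the retriever is a naive word-overlap count rather than the embedding search that production systems typically use, and the call to the language model itself is left out entirely. The point is only the order of operations: retrieve first, then build a prompt that confines the model to what was retrieved.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, documents, k=2):
    """Rank documents by shared words with the query; drop those sharing none."""
    query_words = tokenize(query)
    scored = []
    for index, doc in enumerate(documents):
        overlap = len(query_words & tokenize(doc))
        if overlap > 0:
            scored.append((overlap, index))
    # Highest overlap first; ties keep the original document order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [documents[index] for _, index in scored[:k]]


def build_prompt(query, passages):
    """Build a grounded prompt, or return None when there is nothing to ground on."""
    if not passages:
        return None  # the caller should decline to answer rather than guess
    context = "\n".join(f"[{n}] {p}" for n, p in enumerate(passages, 1))
    return (
        "Answer using only the passages below. "
        "If they do not contain the answer, say so.\n\n"
        f"{context}\n\nQuestion: {query}"
    )


# Hypothetical product documentation, invented for this example.
DOCS = [
    "The X200 router ships with a two-year limited warranty.",
    "The X200 router supports Wi-Fi 6 on both bands.",
    "Returns are accepted within 30 days of purchase.",
]

question = "What warranty does the X200 have?"
print(build_prompt(question, retrieve(question, DOCS, k=1)))
```

Even in this toy form, the limitation described above is visible: nothing in the pipeline checks that the model's eventual answer faithfully reflects the passages it was given.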
Constitutional AI and RLHF uncertainty expression. Training models to express uncertainty more reliably is an active research area. Anthropic’s Constitutional AI approach, and the reinforcement learning from human feedback (RLHF) training used across the industry, can be used to reward appropriate uncertainty expression and penalise confident hallucination. Progress has been meaningful. Current-generation models express uncertainty significantly more often than their predecessors. The challenge is that this training is not perfectly calibrated — models sometimes express uncertainty about things they should be confident about and vice versa — and the improvements are domain-dependent.
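"Calibrated" has a measurable meaning: when a system says it is 90% sure, it should be right about 90% of the time. One common summary statistic is expected calibration error, sketched below. The sample data is invented, and the sketch assumes you already have a numeric confidence for each answer, which a chat interface will not usually give you.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between stated confidence and observed accuracy.

    confidences: floats in [0, 1], one per answer.
    correct: booleans, True where the answer was verified as right.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("at least one answer is required")

    # Group answers into equal-width confidence bins.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        if not 0.0 <= conf <= 1.0:
            raise ValueError("confidence values must lie in [0, 1]")
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece


# Invented data: four answers given at 95% confidence, only two correct.
overconfident = expected_calibration_error([0.95] * 4, [True, True, False, False])
# Invented data: four answers given at 50% confidence, two correct.
calibrated = expected_calibration_error([0.5] * 4, [True, False, True, False])
print(f"overconfident: {overconfident:.2f}, calibrated: {calibrated:.2f}")
```

A score of zero means stated confidence matched observed accuracy in every bin. The overconfident case here is the pattern this article is about: high stated confidence, coin-flip accuracy.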
Factuality benchmarks and evaluation. The research community has developed a suite of factuality benchmarks — TruthfulQA, HaluEval, FActScore — that measure hallucination rates across different domains and question types. Current frontier models score substantially better than their predecessors on these benchmarks. The improvement is real. The benchmarks also reveal that even the best models hallucinate at meaningful rates on specific question types — particularly those requiring precise factual recall, multi-step reasoning with specific intermediate values, or knowledge of events close to or after the training cutoff. The leaderboard is improving. The hallucination rate is not zero, and in high-stakes domains, a low but non-zero hallucination rate is still a significant risk.
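A little arithmetic shows why "low but non-zero" still matters. The sketch below assumes, purely for illustration, that each specific claim in a document has the same independent chance of being wrong. Real errors are not independent and the 2% figure is not a measured rate for any model, so treat the output as an intuition pump rather than an estimate.

```python
def chance_of_at_least_one_error(per_claim_error_rate, number_of_claims):
    """Probability that at least one of n independent claims is wrong."""
    if not 0.0 <= per_claim_error_rate <= 1.0:
        raise ValueError("per_claim_error_rate must lie in [0, 1]")
    if number_of_claims < 0:
        raise ValueError("number_of_claims must be non-negative")
    return 1.0 - (1.0 - per_claim_error_rate) ** number_of_claims


# Hypothetical: a 2% per-claim error rate across documents of growing size.
for n in (1, 10, 50):
    risk = chance_of_at_least_one_error(0.02, n)
    print(f"{n:>2} claims: {risk:.1%} chance of at least one error")
```

Under these assumptions, a document with 50 specific claims is more likely than not to contain at least one error, even though any single claim is right 98% of the time.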
The goal is not AI systems that never hallucinate. That goal may not be achievable with the current architecture. The goal is humans who understand when to trust AI outputs, when to verify them, and how to design workflows that make the verification step impossible to skip. That is a human design problem, not a technology problem. We are not solving it fast enough.
Neal Lloyd · Inside The Machine, Day 16
Inside The Machine, Day 16 · June 2026
Neal Lloyd writes about technology, human adaptation, and the uncomfortable questions nobody wants to answer at dinner. Inside The Machine is his ongoing daily series on AI.