The Geometric Anatomy of AI Hallucinations: A New Taxonomy Reveals What We Can and Cannot Detect
For years, the term "hallucination" has been used as a catch-all for everything from a large language model (LLM) inventing a fake historical date to ignoring the document it was just given. This linguistic fuzziness has hampered both diagnosis and mitigation. A groundbreaking new study, "A Geometric Taxonomy of Hallucinations in LLMs" (arXiv:2602.13224), cuts through this noise. By analyzing the geometric signatures of erroneous outputs in the models' own embedding spaces, the researchers propose a precise taxonomy that not only clarifies what we mean by "hallucination" but also reveals a fundamental limit on what AI embeddings can ever tell us about truth.
Deconstructing the Hallucination: A Three-Part Taxonomy
The research team moves beyond the monolithic concept of "hallucination" to identify three distinct phenomena, each with a unique geometric fingerprint:
- Type I: Unfaithfulness. This occurs when an LLM fails to properly engage with the context or instructions provided. For example, if asked to summarize a specific article but instead generates a generic summary on the topic, it is being unfaithful to the source material.
- Type II: Confabulation. This is the classic "making stuff up." The model invents semantically foreign content—facts, entities, or narratives—that is not present in its training data or the provided context. Prime examples include inventing a non-existent academic study or a fake historical institution.
- Type III: Factual Error. Here, the model operates within the correct conceptual frame but gets the specific fact wrong. It might correctly discuss the causes of World War I but state an incorrect treaty date. The semantic structure is coherent and appropriate, but the specific claim is false.
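The three failure types above can serve as a diagnostic label set in evaluation tooling. A minimal sketch (the names mirror the taxonomy; the one-line descriptions are paraphrases, not the paper's formal definitions):

```python
from enum import Enum

class HallucinationType(Enum):
    # Labels follow the paper's taxonomy; descriptions are paraphrased.
    UNFAITHFULNESS = "fails to engage with the provided context or instructions"
    CONFABULATION = "invents semantically foreign facts, entities, or narratives"
    FACTUAL_ERROR = "correct conceptual frame, but a specific claim is false"

# A downstream audit log can now record a precise failure type
# instead of a generic "hallucinated" flag.
print(HallucinationType.FACTUAL_ERROR.name)
```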
The Striking Geometric Asymmetry: What Embeddings Can and Cannot See
The core discovery lies in how these different types manifest in the high-dimensional embedding spaces where LLMs represent meaning. The researchers performed detection experiments with startling results:
Detecting Artifacts vs. Detecting Drift: On standard benchmarks where hallucinations are LLM-generated, detection is highly domain-local. A detector trained to spot confabulations in medical text achieves an AUROC (Area Under the Receiver Operating Characteristic curve) of 0.76-0.99 within that domain, but its performance collapses to 0.50—pure chance—when tested on confabulations in, say, legal text. The discriminative directions (the vectors in embedding space that separate truth from hallucination) are nearly orthogonal between domains (mean cosine similarity of -0.07).
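The near-orthogonality claim is a statement about the cosine similarity between discriminative directions learned in different domains. A toy illustration of the measurement (the vectors below are invented for the example, not the paper's data):

```python
import math

def cosine(u, v):
    # Standard cosine similarity between two vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical "discriminative directions" (e.g., mean hallucinated
# embedding minus mean truthful embedding) learned in two domains.
medical_dir = [0.9, 0.1, 0.0, 0.2]
legal_dir = [0.1, -0.8, 0.3, 0.0]

# Near-zero cosine similarity means the directions are nearly
# orthogonal, so a detector trained in one domain transfers poorly.
print(round(cosine(medical_dir, legal_dir), 3))  # → 0.013
```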
The Global Signal of Human Craft: Conversely, for human-crafted confabulations—carefully invented institutions, redefined terms, or fabricated mechanisms—a single global direction in embedding space achieved a 0.96 AUROC, with only a 3.8% performance drop across domains.
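Scoring with a single global direction amounts to projecting each embedding onto that direction and ranking the projections; AUROC then measures how cleanly confabulated texts rank above clean ones. A minimal sketch with invented toy vectors:

```python
def project(embedding, direction):
    # Scalar projection: how far the embedding points along the direction.
    return sum(e * d for e, d in zip(embedding, direction))

def auroc(pos_scores, neg_scores):
    # Fraction of (positive, negative) pairs ranked correctly,
    # counting ties as half — the pairwise definition of AUROC.
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical global "confabulation direction" and toy embeddings.
direction = [0.6, -0.3, 0.7]
confabulated = [[1.0, -0.5, 1.2], [0.8, -0.2, 0.9]]
clean = [[0.1, 0.4, 0.2], [-0.2, 0.3, 0.1]]

pos = [project(e, direction) for e in confabulated]
neg = [project(e, direction) for e in clean]
print(auroc(pos, neg))  # → 1.0 (perfect separation in this toy case)
```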
The researchers draw a pointed interpretation from this divergence: standard benchmarks largely capture generation artifacts, the stylistic signatures of how an LLM fabricates text when prompted, and those signatures are domain-specific. Human-crafted confabulations, by contrast, capture genuine topical or semantic drift, the introduction of conceptually foreign material, which carries a consistent geometric signature regardless of domain.
The Fundamental Limit: The Invisibility of Factual Error
The most theoretically significant finding concerns Type III hallucinations: factual errors. The detection AUROC for these was 0.478, statistically indistinguishable from random guessing. This is not a failure of the method; it points to a hard constraint on embedding-based detection.
Embeddings encode distributional co-occurrence—how words and concepts statistically relate in the training data. They do not, and cannot, encode correspondence to external reality. Two statements with identical linguistic and contextual patterns (e.g., "The Treaty of Versailles was signed in 1919" and "The Treaty of Versailles was signed in 1920") will occupy nearly identical regions in embedding space, regardless of their truth value. The geometry captures form and contextual likelihood, not factuality.
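The Versailles example can be made concrete with a deliberately crude bag-of-words embedding: the true and false statements share all but one token, so their vectors are nearly identical. Real LLM embeddings are dense and contextual, but the distributional point is the same:

```python
import math

def tokenize(text):
    return text.lower().replace(".", "").split()

def bow_embed(text, vocab):
    # Toy bag-of-words embedding: one count per vocabulary word.
    words = tokenize(text)
    return [words.count(w) for w in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

s_true = "The Treaty of Versailles was signed in 1919."
s_false = "The Treaty of Versailles was signed in 1920."
vocab = sorted(set(tokenize(s_true)) | set(tokenize(s_false)))

# The two statements differ in one token out of eight, so their
# vectors are nearly parallel despite opposite truth values.
sim = cosine(bow_embed(s_true, vocab), bow_embed(s_false, vocab))
print(round(sim, 3))  # → 0.875
```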
Implications for AI Safety and Evaluation
This taxonomy and its geometric insights have immediate, practical implications:
- Precision in Diagnosis: Developers and auditors can now move from saying "the model hallucinated" to diagnosing a specific failure type—unfaithfulness, confabulation, or factual error—each requiring different mitigation strategies.
- Scope of Embedding-Based Detection: The study clearly delineates the scope of what can be detected internally. Faithfulness and confabulation (especially of the human-crafted variety) have geometric signatures we can potentially train detectors to find. Factual error, however, is geometrically invisible. This forces a pivot in strategy: combating factual errors requires external verification mechanisms like retrieval-augmented generation (RAG), access to knowledge graphs, or human-in-the-loop verification.
- Benchmark Re-evaluation: The domain-specificity of detection on current benchmarks suggests they may be measuring LLM-specific fabrication quirks rather than a generalizable notion of hallucination. Future benchmarks need to incorporate human-crafted confabulations to test for robust, cross-domain detection of semantic drift.
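Because factual errors leave no embedding-space signature, verification has to leave the embedding space entirely. A minimal sketch of the external-lookup idea (the knowledge base and claim pattern here are illustrative stand-ins for RAG or a knowledge graph, not a real system):

```python
import re

# Stand-in for an external knowledge source (RAG corpus, knowledge graph).
KNOWLEDGE_BASE = {
    "Treaty of Versailles": {"signed": "1919"},
}

def verify_signing_claim(claim):
    """Check claims of the form '<Entity> was signed in <year>'.

    Returns (entity, True/False) when the claim is checkable,
    (entity, None) when the entity is unknown, or None when no
    checkable claim can be extracted.
    """
    m = re.match(r"(.+) was signed in (\d{4})", claim)
    if not m:
        return None  # nothing checkable extracted
    entity, year = m.group(1), m.group(2)
    facts = KNOWLEDGE_BASE.get(entity)
    if facts is None:
        return (entity, None)  # unverifiable: defer to a human
    return (entity, facts["signed"] == year)

# An embedding-based detector cannot tell these apart; a lookup can.
print(verify_signing_claim("Treaty of Versailles was signed in 1920"))
```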
As LLMs become more integrated into critical workflows, understanding the precise nature of their failures is paramount. This research provides the geometric lens to do just that, separating the detectable artifacts of AI generation from the fundamental, geometry-blind challenge of aligning language with reality.
Source: "A Geometric Taxonomy of Hallucinations in LLMs," arXiv:2602.13224v1 (2026).