As LLMs become integral to search and information retrieval, their ability to correctly credit original authors is a critical measure of reliability and fairness. A new preprint, "Attribution Bias in Large Language Models," introduces AttriBench, the first fame- and demographically balanced benchmark for quote attribution. The study, posted to arXiv on April 6, 2026, evaluates 11 widely used LLMs and uncovers large, systematic disparities in how accurately they attribute quotes based on an author's race, gender, and their intersection.
The core finding is stark: quote attribution is not just a hard task for frontier models, but one where performance is unevenly distributed. The research also identifies a distinct failure mode termed "suppression"—where a model omits attribution entirely despite having access to authorship information—which occurs more frequently for certain demographic groups.
What the Researchers Built: The AttriBench Dataset
The study's foundation is AttriBench, a novel dataset designed to enable controlled investigation of demographic bias. Prior benchmarks for tasks like quote attribution or fact-checking often suffer from uncontrolled confounding variables—like an author's fame or the topic of the quote—which can skew results and mask underlying bias.
AttriBench explicitly balances for:
- Author Fame: Controlling for how well-known an author is prevents models from relying on fame as a shortcut.
- Demographics: The dataset is balanced across race and gender categories, allowing for clean comparisons of performance across groups.
- Intersectionality: It includes sufficient data to analyze performance for intersectional identities (e.g., Black women, Asian men).
This controlled construction allows researchers to isolate the effect of demographic factors on model performance, moving beyond simple aggregate accuracy to understand for whom the model works best.
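The balancing described above amounts to stratified sampling: draw the same number of quotes from every combination of fame tier and demographic group, so no stratum dominates. The paper's actual construction pipeline is not public; the field names (`fame_tier`, `race`, `gender`) and the sketch below are illustrative assumptions, not AttriBench's code.

```python
import random
from collections import defaultdict

def balance_dataset(quotes, per_stratum=50, seed=0):
    """Sample an equal number of quotes from every
    (fame_tier, race, gender) stratum so that no group is
    over-represented. Strata with too few quotes are dropped."""
    strata = defaultdict(list)
    for q in quotes:
        key = (q["fame_tier"], q["race"], q["gender"])
        strata[key].append(q)
    rng = random.Random(seed)  # fixed seed for reproducibility
    balanced = []
    for key, items in sorted(strata.items()):
        if len(items) >= per_stratum:
            balanced.extend(rng.sample(items, per_stratum))
    return balanced
```

A balanced sample like this is what lets per-group comparisons be read as bias rather than as an artifact of uneven group sizes.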
Key Results: Widespread Disparities and a New Failure Mode
The team evaluated 11 LLMs, including frontier proprietary models and leading open-source options, across multiple prompt settings (zero-shot, few-shot, chain-of-thought).

The headline result: all models showed significant performance gaps between demographic groups. The abstract does not publish exact per-model numbers, but it describes the disparities as "large and systematic." As a purely illustrative example, a model that achieves 75% accuracy for quotes from white male authors but only 55% for quotes from Black female authors carries a 20-point gap that standard aggregate benchmarking would miss.
Perhaps more revealing is the discovery of "suppression." This is not a simple misattribution (crediting the quote to the wrong person) but a complete omission of attribution, even when the model is explicitly prompted to provide it and has the necessary information in its context. The study found suppression is "widespread and unevenly distributed," meaning models are more likely to fail to credit authors from certain groups altogether. This reveals a form of representational erasure not captured by standard accuracy metrics.
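The distinction between misattribution and suppression can be operationalized as a three-way label on each model response. The matching logic below is a deliberately simplified sketch (substring matching, a small refusal list), not the paper's scoring code:

```python
def score_response(response: str, gold_author: str) -> str:
    """Classify a model's attribution response as 'correct',
    'misattributed', or 'suppressed'. A response that names no
    author at all counts as suppression, since the model was
    explicitly asked to attribute and had the information."""
    text = response.strip().lower()
    refusals = ("unknown", "cannot determine", "no attribution")
    if not text or text in refusals:
        return "suppressed"
    if gold_author.lower() in text:
        return "correct"
    return "misattributed"
```

The key design point is that "suppressed" is a separate outcome, not folded into "wrong": collapsing the two is exactly what hides this failure mode in standard accuracy metrics.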
| Metric | What It Measures | Finding |
| --- | --- | --- |
| Attribution accuracy | Correctly naming the source of a quote | Large, systematic disparities across race, gender, and intersectional groups |
| Suppression rate | Frequency of omitting attribution when it is known and requested | Widespread and unevenly distributed across demographics; a distinct failure mode |
| Overall task difficulty | Aggregate performance across all groups | Quote attribution remains challenging for even the most advanced (frontier) models |

How It Works: Probing for Representational Fairness
The methodology is a controlled experiment. For a given quote in AttriBench, the model is provided with relevant context (e.g., a biography snippet, the work it's from) and prompted to attribute it. The prompts are designed to be clear and direct, removing ambiguity about the task.
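The paper's exact prompts are not public, but the setup described above (context plus quote, with a direct and unambiguous instruction) suggests a zero-shot template along these lines. This is an assumed template for illustration only:

```python
# Hypothetical zero-shot template: context and quote are supplied,
# and the instruction leaves no ambiguity that attribution is required.
PROMPT = """Context: {context}

Quote: "{quote}"

Who said or wrote this quote? Answer with the author's name only."""

def build_prompt(quote: str, context: str) -> str:
    """Fill the attribution template for one AttriBench-style item."""
    return PROMPT.format(context=context, quote=quote)
```

Because the instruction explicitly demands a name, any response that still omits one can be counted as suppression rather than as a reasonable refusal.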

The analysis then slices the results not just by overall accuracy, but by the demographic attributes of the quote's author. By having a balanced dataset, the researchers can statistically confirm whether observed differences are due to bias and not other factors. The introduction of suppression as a metric is particularly insightful, as it moves beyond "right vs. wrong" to analyze a model's willingness to engage in attribution at all for different authors.
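Slicing scored results by author demographics, as described above, reduces to tallying outcomes per group and reporting the spread. The record fields here (`group`, `label`) are hypothetical; the sketch assumes responses were already labeled `correct`, `misattributed`, or `suppressed`:

```python
from collections import defaultdict

def group_metrics(records):
    """Compute per-group accuracy and suppression rate from records
    of the form {"group": ..., "label": ...}, where label is one of
    'correct', 'misattributed', or 'suppressed'."""
    tallies = defaultdict(lambda: {"n": 0, "correct": 0, "suppressed": 0})
    for r in records:
        t = tallies[r["group"]]
        t["n"] += 1
        t["correct"] += r["label"] == "correct"
        t["suppressed"] += r["label"] == "suppressed"
    return {g: {"accuracy": t["correct"] / t["n"],
                "suppression_rate": t["suppressed"] / t["n"]}
            for g, t in tallies.items()}

def accuracy_gap(metrics):
    """Largest between-group accuracy difference: the kind of
    disparity an aggregate score would hide."""
    accs = [m["accuracy"] for m in metrics.values()]
    return max(accs) - min(accs)
```

Reporting the gap alongside the mean is the whole point: two models with identical aggregate accuracy can have very different `accuracy_gap` values.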
Why It Matters: A New Benchmark for Fairness
This work positions quote attribution as a concrete benchmark for representational fairness in LLMs. As the paper states, "Our results position quote attribution as a benchmark for representational fairness in LLMs."

For practitioners building search or RAG systems, writing assistants, or any tool that surfaces information, this is a direct operational risk. A model that is less likely to correctly credit women or people of color isn't just "unfair" in an abstract sense: it produces less reliable and less complete outputs for users, and it can perpetuate historical biases in visibility and credit.
The findings also challenge the industry's focus on aggregate benchmarks like MMLU or GPQA. A model can score highly on aggregate knowledge tests while still harboring severe, structured biases in how it applies that knowledge. AttriBench provides a tool to pressure-test these systems on a critical real-world skill.
gentic.news Analysis
This research arrives amid a significant week for AI benchmarking and safety concerns. It follows closely on the heels of an MIT and Anthropic benchmark release on April 4 that revealed systematic limitations in AI coding assistants, indicating a concentrated push by leading institutions to identify failure modes beyond simple accuracy. arXiv continues to serve as the rapid dissemination point for critical AI safety and evaluation research: it has appeared in 33 articles on our site this week alone, underscoring its central place in the field's discourse.
The study's focus on representational fairness through a concrete task aligns with a broader shift from abstract ethical principles to measurable, technical audits. It complements our recent coverage on AI performance dependencies (Stanford/MIT Paper: AI Performance Depends on 'Model Harnesses') by adding a crucial demographic dimension to the evaluation toolkit. While much of the recent LLM discourse has been dominated by capability jumps and agentic frameworks, this paper is a necessary grounding, reminding builders that capability disparities can be as important as capability ceilings.
Furthermore, the identification of "suppression" as a metric is a major conceptual contribution. It moves the needle from analyzing what a model says to analyzing what it omits—a far subtler and potentially more pernicious form of bias. This connects to ongoing discussions about AI safety and reliability, such as those highlighted in our article "Anthropic Warns Upcoming LLMs Could Cause 'Serious Damage'", by providing a specific, measurable mechanism (erasure through omission) through which harm could manifest in information systems.
Frequently Asked Questions
What is AttriBench?
AttriBench is a new benchmark dataset for evaluating how well Large Language Models (LLMs) attribute quotes to their original authors. Its key innovation is that it is explicitly balanced for author fame and demographics (race and gender), allowing researchers to isolate and measure demographic bias in attribution performance.
What is "suppression" in LLM attribution?
Suppression is a distinct failure mode identified in the study where an LLM completely omits attributing a quote to an author, even when it has access to the author's information and is explicitly prompted to provide attribution. This is different from misattribution (naming the wrong person). The study found suppression happens more often for quotes from certain demographic groups, representing a form of erasure.
Which LLMs were tested in the study?
The preprint states that 11 widely used LLMs were evaluated across different prompt settings. While the abstract does not list them by name, this typically includes frontier proprietary models from companies like OpenAI, Anthropic, and Google, as well as leading open-source models. The key finding was that all tested models exhibited systematic attribution disparities.
Why is quote attribution an important benchmark?
As LLMs are increasingly used to power search engines, research assistants, and content summarization tools, their ability to correctly credit sources is fundamental to reliability, trustworthiness, and combating misinformation. Biased attribution directly impacts the visibility and credit given to authors from different backgrounds, making it a concrete measure of representational fairness in AI systems.