Irony alert: Hallucinated citations found in papers from NeurIPS, the prestigious AI conference

Researchers have identified hallucinated or incorrect citations in papers submitted to NeurIPS, raising concerns about AI-assisted academic writing and research integrity.

Jan 22, 2026 - 14:55

An unusual contradiction has emerged from one of the world’s most respected artificial intelligence gatherings. AI detection startup GPTZero reviewed all 4,841 papers accepted by the Conference on Neural Information Processing Systems (NeurIPS), held last month in San Diego, and identified 100 confirmed fabricated citations across 51 papers, the company told TechCrunch.

Securing acceptance at NeurIPS is widely considered a résumé-defining milestone in the AI research community. Given that the conference attracts many of the most influential minds in artificial intelligence, one might assume its authors would know better than to blindly trust large language models with the especially tedious task of assembling citations.

However, several essential caveats accompany the findings. One hundred confirmed hallucinations across 51 papers is a tiny fraction of the conference's output. Each paper typically contains dozens of references, meaning that out of the tens of thousands of citations reviewed, the share of fabricated ones is vanishingly small.
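
For rough perspective, a back-of-the-envelope calculation shows just how small that share is. The sketch below assumes an average of 40 references per paper, an illustrative figure that GPTZero's analysis does not actually report:

```python
# Back-of-the-envelope estimate of the fabricated-citation rate.
# The 40-references-per-paper average is an assumed, illustrative figure;
# GPTZero's analysis does not report the exact total citation count.
accepted_papers = 4841       # papers reviewed by GPTZero
affected_papers = 51         # papers with at least one fabricated citation
fabricated_citations = 100   # confirmed fabricated citations
avg_refs_per_paper = 40      # assumption for illustration only

total_citations = accepted_papers * avg_refs_per_paper

print(f"Share of affected papers: {affected_papers / accepted_papers:.2%}")        # ~1.05%
print(f"Share of fabricated citations: {fabricated_citations / total_citations:.2%}")  # ~0.05%
```

Under that assumption, roughly 1 in 100 papers contains a fabricated citation, and roughly 1 in 2,000 citations is fabricated, which is the sense in which the figure "rounds down to zero."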

It is also critical to emphasize that an incorrect or fabricated citation does not automatically undermine the validity of a paper’s research. As NeurIPS told Fortune, which first reported on GPTZero’s analysis, “Even if 1.1% of the papers have one or more incorrect references due to the use of LLMs, the content of the papers themselves [is] not necessarily invalidated.”

Still, fabricated citations are not a trivial issue. NeurIPS has long positioned itself as a venue committed to “rigorous scholarly publishing in machine learning and artificial intelligence.” Every accepted paper undergoes peer review by multiple reviewers, who are specifically instructed to watch for hallucinations and inaccuracies.

Citations also function as a form of professional currency in academia. They are commonly used as a metric to assess a researcher’s influence and impact within their field. When artificial intelligence systems generate references that do not exist, it dilutes the credibility and value of citation-based evaluation.

Given the massive volume of submissions, it is difficult to fault peer reviewers for missing a small number of AI-generated citations. GPTZero itself underscores this point: according to the company, the analysis is intended to provide concrete data showing how AI-generated errors can slip through during what it describes as a “submission tsunami” that has “strained these conferences’ review pipelines to the breaking point.”

GPTZero’s report also references a May 2025 paper titled “The AI Conference Peer Review Crisis,” which examined similar challenges affecting top-tier conferences, including NeurIPS.

Even so, the situation raises an uncomfortable question: why weren’t the researchers themselves able to verify the accuracy of citations produced by large language models? After all, authors are expected to know the exact body of work on which their research is built.

Ultimately, the episode highlights a deeply ironic conclusion. If leading AI experts — whose professional reputations depend on precision — struggle to ensure the accuracy of their own LLM-assisted work, it raises broader concerns about how reliably such tools are being used by everyone else.

 

Shivangi Yadav reports on startups, technology policy, and other significant technology-focused developments in India for TechAmerica.Ai. She previously worked as a research intern at ORF.