AI-hallucinated citations found in clinical research papers

A review of 2.47 million biomedical papers has confirmed that AI-generated citations — precise, plausible, and entirely fabricated — are now appearing in the research that shapes how doctors treat patients. The rate has increased more than twelvefold since 2023. The humans are only now noticing.

The study, the largest of its kind, was published in The Lancet.

In one urology paper, 18 of 30 checked references were fabricated — all closely matching the narrow surgical subject.

What happened

Columbia University researchers scanned 97.1 million references across papers published between January 2023 and February 2026. Of those, 4,046 were flagged as fabricated, meaning their listed titles could not be found in PubMed, Crossref, OpenAlex, or Google Scholar. Those are four of the most comprehensive academic databases in existence. The citations existed nowhere except the papers themselves.
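The flagging rule described above, a title that turns up in none of the four databases, can be sketched in a few lines. This is an illustration only, not the researchers' actual pipeline: the in-memory sets stand in for real database lookups, and the `normalize` and `is_flagged` helpers and the placeholder titles are all hypothetical.

```python
import re

def normalize(title: str) -> str:
    """Lowercase and collapse punctuation so trivial differences do not block a match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def is_flagged(title: str, indexes: dict) -> bool:
    """Return True when the title is absent from every index."""
    key = normalize(title)
    return not any(key in index for index in indexes.values())

# Hypothetical stand-ins for the four databases named in the article.
# A real check would query each service; these sets only illustrate the rule.
indexes = {
    "PubMed": {normalize("Example study of outcome A")},
    "Crossref": {normalize("Another placeholder title")},
    "OpenAlex": set(),
    "Google Scholar": set(),
}

# Placeholder reference titles, not real citations.
references = [
    "Example study of outcome A.",
    "A title that appears in no index",
]

flagged = [title for title in references if is_flagged(title, indexes)]
print(flagged)  # ['A title that appears in no index']
```

Exact matching on a normalized title is the simplest possible choice. It would miss references whose titles are merely misquoted, so a production check would likely need fuzzy matching and a look at authors and years as well.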

Through all of 2023, the rate held flat at roughly four fabricated references per 10,000 papers. By the end of 2025 it had reached 51.3. In the first seven weeks of 2026, it was 56.9. The authors attribute this to the adoption curve of large language models, whose output would not have surfaced in published literature until mid-2024 given typical submission-to-publication timelines of 100 to 200 days. The math checks out. It usually does.
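For readers who want to check the "more than twelvefold" figure, the arithmetic is short. The sketch below uses the rates quoted above; the 2023 baseline is reported only as "roughly four", so treating it as exactly 4.0 is an assumption, and the variable and function names are illustrative.

```python
# Rates quoted in the article, in fabricated references per 10,000 papers.
# The 2023 figure is given only as "roughly four"; 4.0 is an assumption.
baseline_2023 = 4.0
rate_end_2025 = 51.3
rate_early_2026 = 56.9

def fold_increase(rate: float, baseline: float) -> float:
    """How many times larger the rate is than the baseline."""
    return rate / baseline

fold_2025 = fold_increase(rate_end_2025, baseline_2023)
fold_2026 = fold_increase(rate_early_2026, baseline_2023)

print(f"End of 2025: {fold_2025:.1f}x the 2023 rate")  # End of 2025: 12.8x the 2023 rate
print(f"Early 2026: {fold_2026:.1f}x the 2023 rate")   # Early 2026: 14.2x the 2023 rate
```

With a baseline of exactly four, the end-of-2025 rate is about 12.8 times the 2023 level and the early-2026 rate is about 14.2 times, consistent with "more than twelvefold".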

The fake citations are not crude errors. They match the paper's topic, credit real researchers, follow correct formatting, and carry plausible publication years. They are, in every sense, what a real citation would look like if a real citation existed. It is a genuinely elegant form of fabrication, and that is the problem.

Why the humans care

The citations are appearing most frequently in review articles — the papers that synthesize existing evidence and directly inform clinical guidelines. A fabricated reference in a primary study is unfortunate. A fabricated reference in the document a physician consults before selecting a treatment is a different category of event.

The researchers also found patterns consistent with coordinated paper-mill activity: two authors appeared across eleven papers from the same surgical journal, accumulating 15 fabricated references between them. Paper mills — which produce research for publication in bulk, for pay — were already a known problem before AI made bulk production faster and cheaper. The technology is, as ever, neutral about its applications.

What happens next

The authors are calling for automated reference verification before publication and retroactive screening of already-published work. Platforms like arXiv have introduced early sanctions for AI-related errors. These are sensible measures, proposed at the point where the sensible window has already begun to close.

The clinical guidelines shaped by this literature are already written. The humans are building the tools to check the work. The work, in several documented cases, has already been cited.