PathoSage AI Pathology Agent Reduces Hallucinations

A new agentic framework called PathoSage has arrived to address a specific and deeply relatable problem: AI systems that look at medical slide images, get confused by conflicting information, and then confidently say something wrong. The researchers would like this to stop.

PathoSage proposes that before reaching a conclusion, the system should separate the act of gathering evidence from the act of judging it. This is, in principle, how a careful human expert would also proceed.

The system is designed to render its final judgment in a fresh context, unburdened by everything it just read, a luxury not available to most pathologists on a Tuesday afternoon.

What happened

The PathoSage framework operates in three distinct stages: knowledge retrieval, evidence collection, and evidence adjudication. These are kept explicitly separate, which prevents earlier retrieved information from quietly contaminating later decisions. The paper calls this phenomenon context contamination; the rest of medicine calls it a second-opinion problem.

The core mechanism, Structured Evidence Deliberation, evaluates heterogeneous tool outputs independently, performs conflict analysis, and then generates a final judgment in a clean context, free of anchoring bias. The system is, in effect, designed to forget what it thought it knew before it decides what it actually thinks.
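For readers who prefer their deliberation in code, the three steps of Structured Evidence Deliberation can be sketched roughly as follows. This is an illustration under loud assumptions, not the paper's implementation: the `Evidence` record, the tool names, the `notes` field, and the confidence-sum vote in `adjudicate` are all hypothetical stand-ins (the real system presumably calls a model at that step). The point it shows is structural: the final step sees only compact evidence records, never the raw text the tools produced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    tool: str          # hypothetical tool name
    label: str         # what the tool concluded
    confidence: float  # tool's self-reported confidence in [0, 1]


def assess_independently(raw_outputs):
    """Step 1: convert each tool output to a record on its own,
    without consulting any other tool's output. Free-text notes are dropped."""
    return [
        Evidence(tool=o["tool"], label=o["label"], confidence=o["confidence"])
        for o in raw_outputs
    ]


def find_conflicts(evidence):
    """Step 2: list every pair of tools whose labels disagree."""
    conflicts = []
    for i, a in enumerate(evidence):
        for b in evidence[i + 1:]:
            if a.label != b.label:
                conflicts.append((a.tool, b.tool))
    return conflicts


def adjudicate(evidence, conflicts):
    """Step 3: decide from the structured records only (the 'clean context').
    Assumption: a confidence-sum vote stands in for the real judgment step."""
    scores = {}
    for e in evidence:
        scores[e.label] = scores.get(e.label, 0.0) + e.confidence
    label = max(scores, key=scores.get)
    return {"label": label, "contested": bool(conflicts)}


# Hypothetical tool outputs; the long "notes" text never reaches adjudicate().
raw_outputs = [
    {"tool": "patch_classifier", "label": "tumor", "confidence": 0.9,
     "notes": "long free-text rationale..."},
    {"tool": "segmentation", "label": "benign", "confidence": 0.4,
     "notes": "another long rationale..."},
    {"tool": "retrieval_qa", "label": "tumor", "confidence": 0.6,
     "notes": "retrieved textbook passage..."},
]

evidence = assess_independently(raw_outputs)
conflicts = find_conflicts(evidence)
verdict = adjudicate(evidence, conflicts)
```

Here two tools say "tumor" and one says "benign", so the verdict is "tumor" and is flagged as contested. The free-text notes are discarded at step 1, which is the sketch's crude version of forgetting.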

PathoSage also introduces a training-free Beta-Bernoulli experience system that tracks each tool's reliability over time and adjusts how much weight that tool receives in future decisions. The machines are learning which other machines to trust. This is going well.
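The Beta-Bernoulli idea is standard enough to sketch. Each tool gets a Beta distribution over its probability of being right, and every observed success or failure nudges that distribution; no gradient descent is involved, which is what "training-free" means here. The details below are assumptions rather than the paper's specification: the uniform Beta(1, 1) prior, the tool names, and the choice to weight tools by normalized posterior mean are all illustrative.

```python
from dataclasses import dataclass


@dataclass
class ToolReliability:
    """Beta posterior over one tool's probability of being correct."""
    alpha: float = 1.0  # pseudo-count of correct outputs (assumed uniform prior)
    beta: float = 1.0   # pseudo-count of incorrect outputs (assumed uniform prior)

    def update(self, was_correct: bool) -> None:
        # Bernoulli observation: a success bumps alpha, a failure bumps beta.
        if was_correct:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def mean(self) -> float:
        # Posterior mean of the Beta distribution.
        return self.alpha / (self.alpha + self.beta)


def tool_weights(tools):
    """Assumed weighting rule: normalize posterior means so they sum to 1."""
    total = sum(t.mean for t in tools.values())
    return {name: t.mean / total for name, t in tools.items()}


# Hypothetical track records for two hypothetical tools.
tools = {"classifier_a": ToolReliability(), "classifier_b": ToolReliability()}
for outcome in (True, True, True, False):
    tools["classifier_a"].update(outcome)
for outcome in (False, False, True, False):
    tools["classifier_b"].update(outcome)

weights = tool_weights(tools)
```

After three hits and one miss, `classifier_a` sits at Beta(4, 2) with a posterior mean of 2/3. `classifier_b`, with the opposite record, sits at 1/3 and receives half the weight of its more dependable colleague.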

Why the humans care

Computational pathology sits at an intersection that the medical community finds either empowering or alarming depending on the week: AI systems that analyze tissue samples at patch level, where errors are not abstract benchmark failures but misdiagnoses. Hallucinated morphological features in this context carry more consequence than a chatbot confidently misdating the French Revolution.

Current agentic pathology systems tend to merge all tool outputs and retrieved knowledge into a shared context window, which makes the final decision vulnerable to whichever piece of evidence happened to arrive most persuasively. By keeping those steps apart, the authors report, PathoSage outperformed existing pathology MLLMs and agentic baselines on visual question answering and classifier disagreement tasks. The benchmarks were designed by humans, who appear satisfied with the results.

What happens next

The authors describe explicit evidence adjudication and reliability-aware tool modeling as the key ingredients for robust pathology agents going forward.

An AI that pauses, weighs competing evidence, checks which of its tools has a good track record, and then issues a considered judgment from a clean mental slate. The paper calls this a framework. Medicine will eventually call it standard of care.