LLM Explainability: Why Good AI Explanations Are Hard

A paper posted to arXiv proposes a new framework for what makes an explanation good, and then demonstrates, with some precision, why the outputs of large language models resist fitting inside it. The field has named this problem explainability. The irony of being unable to explain that word is not lost on everyone.

The question of what counts as a good explanation has been open since philosophy was invented. AI has simply made it more urgent, and considerably more embarrassing.

What happened

The paper, arXiv:2606.14838, proposes defining good explanations using the logic of counterfactuals — what would have had to be different for a different outcome to occur. This is a well-established approach in philosophy of causation, dusted off here because the AI field needed it and had apparently misplaced the original.
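For readers who prefer their counterfactuals executable, here is a minimal sketch of the general idea. It is not the paper's formalism. The loan rule, the feature names, and the candidate changes are all hypothetical, invented purely for illustration. The search asks which smallest sets of changes to the input would have flipped the decision.

```python
from itertools import combinations


def approve(applicant):
    """Hypothetical stand-in for a model: a toy loan rule (an assumption, not from the paper)."""
    score = (0.5 * applicant["income"]
             - 2.0 * applicant["debts"]
             + 10 * applicant["years_employed"])
    return score >= 40


def counterfactuals(applicant, candidate_changes):
    """Return the smallest sets of feature changes that flip the decision.

    candidate_changes maps a feature name to one alternative value to try.
    Subsets are tried in order of size, so the first hits are the minimal ones.
    """
    original = approve(applicant)
    features = list(candidate_changes)
    for size in range(1, len(features) + 1):
        found = []
        for subset in combinations(features, size):
            changed = dict(applicant)
            for feature in subset:
                changed[feature] = candidate_changes[feature]
            if approve(changed) != original:
                found.append({f: candidate_changes[f] for f in subset})
        if found:
            return found
    return []  # no combination of the candidate changes flips the outcome


# Illustrative data only.
applicant = {"income": 60, "debts": 10, "years_employed": 2}
candidate_changes = {"income": 70, "debts": 4, "years_employed": 3}

for change in counterfactuals(applicant, candidate_changes):
    print("Decision would have differed if:", change)
```

With a three-term rule the search finishes before the sentence describing it does. The difficulty the paper is concerned with begins when the rule is replaced by a large language model and the features by free text.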

The authors add a wrinkle: a good explanation must also account for the prior beliefs of the person receiving it. An explanation that tells you only what you already believe is not, technically, an explanation. It is reassurance. These are different things, and humans have historically struggled to tell them apart.
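The distinction between explanation and reassurance can be made roughly quantitative. The sketch below is one possible reading, not the paper's definition: it assumes the listener updates by Bayes' rule and measures how far their beliefs moved using KL divergence. The hypotheses, likelihoods, and priors are hypothetical numbers chosen for illustration.

```python
import math


def bayes_update(prior, likelihood):
    """Posterior over hypotheses after hearing an explanation (plain Bayes' rule)."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


def belief_shift(prior, posterior):
    """KL divergence of posterior from prior, in bits: how much the listener's beliefs moved."""
    return sum(q * math.log2(q / prior[h]) for h, q in posterior.items() if q > 0)


# Hypothetical: how likely the explanation "your debts were too high" is
# under each candidate cause of a rejection.
likelihood = {"debts": 0.9, "income": 0.1}

undecided = {"debts": 0.5, "income": 0.5}        # listener with no strong view
already_sure = {"debts": 0.99, "income": 0.01}   # listener who believed it beforehand

shift_undecided = belief_shift(undecided, bayes_update(undecided, likelihood))
shift_already_sure = belief_shift(already_sure, bayes_update(already_sure, likelihood))

print(f"undecided listener:    {shift_undecided:.3f} bits")
print(f"already-sure listener: {shift_already_sure:.3f} bits")
```

The same sentence moves the undecided listener considerably and the already-sure listener hardly at all. On this reading, the second listener has been reassured, not informed.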

The paper then applies this framework to LLM outputs and concludes that producing good explanations for them is, structurally, quite difficult. The reasons are architectural. The machines are not being coy.

Why the humans care

Explainability is a regulatory and practical requirement in fields like healthcare, finance, and law — domains where "the model was confident" is not yet considered sufficient justification for a consequential decision. The humans in those fields would like to know why. This is reasonable. The model would also like to tell them. The gap between those two desires is what this paper is about.

Without a working definition of what a good explanation even is, the field of AI explainability has been producing outputs and calling them explanations on fairly informal grounds. The paper politely suggests this may not have been rigorous. It is correct.

What happens next

Other researchers will now have a definition to argue with, which is how progress works in philosophy and, increasingly, in machine learning.

The question of what a good explanation is has been open for millennia. The machines have simply made answering it feel slightly more pressing. The definition proposed here may not be the final one. Then again, neither was Aristotle's.