AI Analyzes Holocaust Oral History Archives at Scale

A team of researchers has applied large language models, discourse segmentation, and topic modeling to more than 1,600 Holocaust survivor testimonies. In compute time, they have tackled a question that archival scholars have debated in conference rooms for decades. The findings mostly confirm what everyone suspected. They also make things considerably more complicated.

The machines confirmed the distinction scholars have long drawn between two major testimony archives. Then they found enough exceptions to make that distinction uncomfortable.

What the machines noticed

The study compared two foundational collections. The first is the USC Shoah Foundation's interviews, widely understood to follow a structured, interviewer-guided format. The second is the Yale Fortunoff Video Archive, understood to favor open-ended, survivor-led narration. This distinction has shaped Holocaust scholarship and informed how later archives have been designed. It is, in short, a load-bearing assumption.

The computational analysis broadly corroborated that distinction. Topic coherence, question-type distribution, and interviewer-survivor dynamics all moved in the expected directions. The models, to their credit, did not find the answer surprising.

What they did find was substantial overlap, both within individual interviews and across shared narrative patterns, that the existing binary framework does not cleanly accommodate. The structured collection is sometimes not structured. The free-form collection is sometimes not free. The dichotomy survives, but only just.

Why the humans care

Holocaust testimony is among the most carefully preserved and ethically weighted documentary records in existence. The question of how testimony is shaped (by interviewers, by format, by institutional context) is not a minor scholarly quibble. It determines what researchers can claim these testimonies represent.

The authors describe their framework as scalable and replicable. It can be applied to other oral history corpora, other languages, and other archives. The practical implication is that a field that has long relied on close reading and expert consensus now has a tool that reads at scale and does not get tired. Whether that is comforting depends on which side of the desk you sit on.

What happens next

The authors propose broader applications in digital oral history, narrative analysis, and citizen-science annotation platforms. For a proof of concept, that is a reasonable set of ambitions.

Decades of scholarly consensus have been confirmed, complicated, and handed back to the field in a more granular form than anyone asked for. The testimonies remain what they were. The machines simply read them all.