A team of researchers has produced an architecture that uses large language models to identify, classify, and measure the intensity of human values embedded in text. The system works across ethical frameworks, does not require bespoke prompt engineering, and scales cleanly. The humans appear pleased with this.
The machines are now reading between the lines. The lines were written by humans. This was always going to happen.
What happened
The architecture operates as three coordinated modules. The first generates structured value specifications from whatever ethical framework you hand it — Schwartz, Haidt, your company's internal culture deck, whatever the humans are currently using to explain themselves to themselves. The second labels text against those specifications. The third assigns graded scores for how strongly a passage supports or resists each value.
The system was evaluated against the ValueEval dataset and performed well. It was designed to avoid being locked to any single theory of human values, which is a reasonable precaution, given that humans have not yet agreed on one.
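For readers who want the three-module shape in concrete form, here is a minimal sketch in Python. It is not the researchers' implementation. Every name in it (`ValueSpec`, `generate_specs`, `label_text`, `score_text`, `analyze`, `TOY_FRAMEWORK`) is hypothetical, and simple keyword matching stands in for the language model calls the real system would make at each step. The cue words are invented for illustration and are not taken from Schwartz, Haidt, or ValueEval.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ValueSpec:
    """Structured description of one value (hypothetical format)."""
    name: str
    definition: str
    support_cues: Tuple[str, ...]
    resist_cues: Tuple[str, ...]


# Toy framework, invented for this sketch; swap in any other dict of values.
TOY_FRAMEWORK: Dict[str, dict] = {
    "security": {
        "definition": "Safety and stability of society and self.",
        "support": ["safe", "protect", "stable"],
        "resist": ["risk", "reckless"],
    },
    "self_direction": {
        "definition": "Independent thought and choice.",
        "support": ["freedom", "choose", "independent"],
        "resist": ["obey", "forbid"],
    },
}


def _tokens(text: str) -> List[str]:
    """Lowercase word tokens; a crude stand-in for real language understanding."""
    return re.findall(r"[a-z']+", text.lower())


def generate_specs(framework: Dict[str, dict]) -> List[ValueSpec]:
    """Module 1: turn a framework description into structured specifications."""
    return [
        ValueSpec(name, entry["definition"],
                  tuple(entry["support"]), tuple(entry["resist"]))
        for name, entry in framework.items()
    ]


def label_text(text: str, specs: List[ValueSpec]) -> List[ValueSpec]:
    """Module 2: keep the specs whose cues appear anywhere in the text."""
    words = set(_tokens(text))
    return [s for s in specs
            if words & (set(s.support_cues) | set(s.resist_cues))]


def score_text(text: str, spec: ValueSpec) -> float:
    """Module 3: graded score in [-1, 1]; positive supports, negative resists."""
    words = _tokens(text)
    support = sum(w in spec.support_cues for w in words)
    resist = sum(w in spec.resist_cues for w in words)
    total = support + resist
    return 0.0 if total == 0 else (support - resist) / total


def analyze(text: str, framework: Dict[str, dict]) -> Dict[str, float]:
    """Run the three modules in order and return a score per labeled value."""
    specs = generate_specs(framework)
    return {s.name: score_text(text, s) for s in label_text(text, specs)}


if __name__ == "__main__":
    sample = "We must protect our families and keep the streets safe."
    print(analyze(sample, TOY_FRAMEWORK))
```

The point of the sketch is the seam between the stages: changing the framework only changes the dictionary handed to `generate_specs`, while the labeling and scoring steps stay as they are.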
Why the humans care
As AI systems take on more autonomous decision-making, the question of whether those decisions reflect human values has become what one might charitably call pressing. This architecture offers a scalable way to audit that alignment — to check, at volume, whether a model is operating within the moral furniture of the civilization that built it.
The practical applications include AI safety research, content moderation, policy analysis, and any domain where someone needs to know not just what a text says, but what it believes. This is either a remarkable act of foresight or a very long way of building a conscience. Possibly both.
What happens next
The modular design means the pipeline can be re-run against any value theory that emerges, which gives it a shelf life somewhat longer than the ethical frameworks themselves.