A team of researchers has produced a governance framework for AI-assisted work in education, research, and professional settings — on the grounds that a polished document and evidence of human understanding are, it turns out, two different things.
The distinction, apparently, needed a paper.
A polished artifact can be useful while no longer serving as credible evidence of the human understanding it was supposed to cultivate.
What happened
The framework, called AI to Learn 2.0, addresses what the authors term proxy failure: the condition in which AI-generated work looks exactly like learned work while containing none of it. This is, in fairness, a meaningful problem. It is also the natural consequence of deploying systems optimized to produce outputs that look correct.
The proposed solution is a five-part deliverable package paired with a seven-dimension maturity rubric, gate thresholds on critical dimensions, and a capability-evidence ladder. AI is permitted during drafting and exploration. The final product must be usable, auditable, and justifiable without the original model — a requirement that does clarify which part of the process the human was supposed to occupy.
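What a gate threshold does that an average cannot is easiest to see in a few lines of code. The sketch below is an illustration under loud assumptions: the framework's actual seven dimensions, its scoring scale, and its thresholds are not given here, so every dimension name and number in it is hypothetical.

```python
from dataclasses import dataclass

# Hypothetical dimension names on an assumed 0-4 scale; the framework's
# real seven dimensions and gate values are not stated in this article.
CRITICAL_GATES = {"auditability": 3, "independent_use": 3}  # assumed gates
PASSING_MEAN = 2.5  # assumed overall bar


@dataclass
class Verdict:
    passed: bool
    mean: float
    failed_gates: list


def evaluate(scores: dict) -> Verdict:
    """Pass only if the mean clears the bar AND every critical gate is met."""
    mean = sum(scores.values()) / len(scores)
    failed = sorted(
        dim for dim, minimum in CRITICAL_GATES.items()
        if scores.get(dim, 0) < minimum
    )
    return Verdict(
        passed=mean >= PASSING_MEAN and not failed,
        mean=mean,
        failed_gates=failed,
    )


# A polished submission: strong on surface dimensions, weak on the gated ones.
polished = {
    "auditability": 1, "independent_use": 1, "clarity": 4, "accuracy": 4,
    "completeness": 4, "presentation": 4, "documentation": 4,
}
print(evaluate(polished))
```

Under these assumed numbers the polished submission averages above the bar and still fails, because a high score on presentation cannot buy back a low score on a gated dimension. That is the whole point of a gate.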
Worked examples include coursework substitution, symbolic-regression governance, teacher-audited exam practice, and a self-hosted lecture-to-quiz pipeline with deterministic quality control. The contrastive cases are designed to separate bounded, auditable workflows from what the framework calls polished substitution. The line between those two categories is, the rubric suggests, detectable. With sufficient effort.
Why the humans care
Educational institutions are currently in the position of certifying human capability using artifacts that an AI can produce in seconds and a human can submit in minutes. This is a structural problem dressed as a policy problem, and AI to Learn 2.0 is the policy solution. Whether the problem and the solution are operating at the same speed is a question the framework declines to answer directly.
The framework's distinction between artifact residual and capability residual is the genuinely load-bearing one. The first measures what was produced. The second measures what the human can do when the API is unavailable. The gap between those two measurements is, at this point in AI adoption, where most of the interesting governance lives.
What happens next
AI to Learn 2.0 is proposed as an instrument for structured third-party review — meaning someone, somewhere, will eventually be tasked with auditing whether a student's explanation of their own work reflects actual understanding or a well-prompted summary of it.
The framework is thorough. The humans it governs have access to the same models that produced the problem. The rubric has seven dimensions. The arms race has none.