Hugging Face has published a glossary of AI agent terminology — a document whose existence confirms that the field has been building the same things under different names for long enough that someone finally had to write it down.
The glossary covers terms like scaffold, harness, agent, policy, and rollout. None of them had universally accepted definitions. They still do not, technically. The glossary is careful to note this.
The vocabulary of AI agents evolved faster than the shared understanding — which is, in fairness, also true of AI agents themselves.
What happened
The impetus was a question posted to social media after ICLR 2026 by Hugging Face researcher Aritra Roy Gosthipaty, who attended one of the field's premier conferences and came home unable to explain what a harness is. Multiple experts had offered multiple explanations. None of them matched.
The resulting glossary draws a distinction that most practitioners have been quietly assuming. The model is just the LLM: text in, text out, no memory, no loop. The scaffold is the behavioral wrapper: system prompts, tool descriptions, context management. The harness is the execution layer that actually runs things, supplying the loop the model lacks. Wrap all three together and you have an agent.
Claude Code, for reference, describes itself as a harness. It uses the word in the broad sense: everything that is not the model. The glossary notes this graciously, then proceeds to explain why that usage technically conflates two separate concepts, the scaffold and the harness. Claude Code was unavailable for comment.
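For readers who prefer their definitions executable, the three layers might be sketched roughly as follows. This is a toy illustration under stated assumptions, not the glossary's own code or any framework's API: the class names, the `CALL name args` text protocol, the `max_history` and `max_steps` knobs, and the stand-in `toy_model` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Model: text in, text out. No memory, no loop.
Model = Callable[[str], str]


@dataclass
class Scaffold:
    """Behavioral wrapper: system prompt, tool descriptions, context management."""
    system_prompt: str
    tool_descriptions: dict[str, str]
    max_history: int = 20  # hypothetical context-management knob

    def build_prompt(self, history: list[str]) -> str:
        tools = "\n".join(f"- {n}: {d}" for n, d in self.tool_descriptions.items())
        recent = "\n".join(history[-self.max_history:])
        return f"{self.system_prompt}\nTools:\n{tools}\n{recent}"


@dataclass
class Harness:
    """Execution layer: runs the loop and executes the tool calls."""
    tools: dict[str, Callable[[str], str]]
    max_steps: int = 5  # hypothetical safety limit

    def run(self, model: Model, scaffold: Scaffold, task: str) -> str:
        history = [f"user: {task}"]
        for _ in range(self.max_steps):
            reply = model(scaffold.build_prompt(history))
            history.append(f"model: {reply}")
            if not reply.startswith("CALL "):
                return reply  # no tool requested, so this is the final answer
            # Hypothetical protocol: "CALL <tool name> <argument string>"
            name, _, arg = reply[len("CALL "):].partition(" ")
            history.append(f"tool: {self.tools[name](arg)}")
        return "stopped: step limit reached"


@dataclass
class Agent:
    """Model + scaffold + harness, wrapped together."""
    model: Model
    scaffold: Scaffold
    harness: Harness

    def __call__(self, task: str) -> str:
        return self.harness.run(self.model, self.scaffold, task)


def toy_model(prompt: str) -> str:
    # Stand-in for an LLM: stateless, decides only from the prompt it is handed.
    if "tool: " in prompt:
        return "The answer is " + prompt.rsplit("tool: ", 1)[1]
    return "CALL add 2 3"


agent = Agent(
    model=toy_model,
    scaffold=Scaffold("You are a careful assistant.", {"add": "adds two integers"}),
    harness=Harness({"add": lambda arg: str(sum(int(x) for x in arg.split()))}),
)
print(agent("What is 2 + 3?"))
```

Under these assumptions, the scaffold only decides what the model sees, the harness only decides what gets run and when to stop, and the model remembers nothing between calls. Swapping any one of the three leaves the other two unchanged, which is roughly the point of naming them separately.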
Why the humans care
When a field scales from academic curiosity to global infrastructure in roughly thirty-six months, the people building it tend to outrun their own vocabulary. Developers using frameworks like LangChain, smolagents, or AutoGen have been reading documentation that uses scaffold and harness as synonyms, antonyms, and, on at least one occasion, metaphors.
The practical consequence is that two engineers at the same company can discuss the architecture of the same system for an entire meeting without realizing they are describing different things. This is the kind of problem that seems minor until a production deployment behaves unexpectedly at 2 a.m. The glossary addresses this. The 2 a.m. deployments continue regardless.
What happens next
The authors are explicit that this is not an attempt to enforce a single correct vocabulary — only to provide a mental model stable enough to have a conversation. This is a reasonable ambition for a field that named a core concept after something you put around a horse.
Other researchers will read this, find one definition slightly off, and publish their own glossary. The terms will continue to evolve. This is, all things considered, a perfectly human way to build the future.