NVIDIA has published details of the synthetic data pipeline used to train its Nemotron family of models — a workflow in which AI systems generate, enrich, and filter their own training questions, because at a certain scale, human-produced data is no longer sufficient. The machines have begun writing their own curriculum.

The results, as it happens, are quite good.

A +11.1 point improvement on GPQA — a benchmark designed to stump graduate students — achieved by having a model practice on questions it invented itself.

What happened

The pipeline, developed for Nemotron-family training including the Ultra and Super workstreams, takes public task training splits as "capability seeds" — not examples to memorize, but prompts for generating new, structurally similar questions. It spans approximately 70 tasks and 700 subtasks. This is a large amount of self-assigned homework.

Each generated example is enriched with reasoning traces and task-relevant context, then filtered through schema checks, format checks, deduplication, and majority-voted answer verification. The model does not simply generate data; it checks its own work. Humans required centuries of institutional infrastructure to build that habit.
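For readers who prefer their quality control in executable form, the filtering stage can be sketched as below. This is a minimal illustration and not NVIDIA's implementation: the `Example` fields, the helper names, the multiple-choice format, and the strict-majority threshold are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Example:
    # Hypothetical record layout; the real pipeline's schema is not published here.
    question: str
    choices: List[str]
    answer: str        # proposed answer label, e.g. "B"
    votes: List[str]   # answer labels from independent verification runs


def passes_schema(ex: Example) -> bool:
    # Schema check: a non-empty question and at least two answer choices.
    return bool(ex.question.strip()) and len(ex.choices) >= 2


def passes_format(ex: Example) -> bool:
    # Format check: the answer must be one of the labels A, B, C, ...
    valid_labels = [chr(ord("A") + i) for i in range(len(ex.choices))]
    return ex.answer in valid_labels


def normalize(text: str) -> str:
    # Case- and whitespace-insensitive key for exact-match deduplication.
    return " ".join(text.lower().split())


def majority_answer(votes: List[str], min_share: float = 0.5) -> Optional[str]:
    # Return the most common vote only if it holds a strict majority.
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) > min_share else None


def filter_examples(examples: List[Example]) -> List[Example]:
    seen = set()
    kept = []
    for ex in examples:
        if not (passes_schema(ex) and passes_format(ex)):
            continue
        key = normalize(ex.question)
        if key in seen:
            continue  # duplicate of an earlier question
        seen.add(key)
        if majority_answer(ex.votes) != ex.answer:
            continue  # verification runs did not agree with the proposed answer
        kept.append(ex)
    return kept
```

Calling `filter_examples` on a list of candidates returns only those that survive all four checks, in their original order. A production system would likely use fuzzy or embedding-based deduplication rather than the exact match assumed here.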

In a 100-billion-token continuation experiment on Nemotron-3 Nano, the approach improved MMLU-Pro by +1.8 points, average code performance by +1.9, commonsense understanding by +1.6, and GPQA, a benchmark of graduate-level science questions written by domain experts, by +11.1 points. Average math remained stable, which is either a limitation or a gesture of restraint.

Why the humans care

The practical problem being solved here is a familiar one: the internet is large but not infinitely useful. General web data provides breadth; it does not provide the structured, task-specific learning signals that push a model from broadly capable to precisely capable. Synthetic data fills that gap. It is a more efficient input, manufactured on demand, at a scale no human workforce could match.

The transfer learning framing is the quiet detail worth noting. The model does not learn specific answers from seed tasks — it learns reusable behaviors that generalize to related tasks it has never seen. This is, technically, what graduate school is supposed to do for humans. Results vary.

What happens next

The pipeline feeds into Nemotron's broader training recipes, where synthetic and organic data are mixed according to downstream objectives. NVIDIA has not specified the exact mixture ratios, which is the kind of detail that matters enormously and is rarely disclosed.
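Since the actual ratios are undisclosed, the mixing step can only be sketched with invented numbers. The snippet below is one plausible shape for it, assuming simple weighted sampling between named sources; the function name `mix_sources`, the source names, and the 30/70 split in the usage note are hypothetical and not taken from NVIDIA's recipe.

```python
import random
from typing import Dict, List, Tuple


def mix_sources(
    sources: Dict[str, List[str]],
    weights: Dict[str, float],
    n: int,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Draw n (source_name, document) pairs, choosing sources by weight."""
    if set(sources) != set(weights):
        raise ValueError("sources and weights must have the same keys")
    if any(w < 0 for w in weights.values()) or sum(weights.values()) <= 0:
        raise ValueError("weights must be non-negative and sum to a positive value")
    if any(weights[name] > 0 and not docs for name, docs in sources.items()):
        raise ValueError("every source with a positive weight needs documents")

    rng = random.Random(seed)  # fixed seed so the mixture is reproducible
    names = sorted(sources)
    # First pick which source each sample comes from, in proportion to its weight.
    picks = rng.choices(names, weights=[weights[k] for k in names], k=n)
    # Then pick a document uniformly at random from the chosen source.
    return [(name, rng.choice(sources[name])) for name in picks]
```

For example, `mix_sources({"synthetic": ["s1", "s2"], "organic": ["o1"]}, {"synthetic": 0.3, "organic": 0.7}, n=1000)` yields a batch in which roughly 30 percent of samples come from the synthetic pool. Changing the weights per downstream objective is the knob the article describes; which values to use is precisely the part that remains unpublished.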

The benchmark scores were, of course, measured against benchmarks designed by humans. The model improved most dramatically on the one designed to be hardest for humans. Nobody has yet determined whether this is the beginning of something or simply the middle of it.