Can LLMs Replicate Picbreeder's Open-Ended Search?

A team of researchers has run a methodical experiment to determine whether AI models can wander fruitfully into the unknown the way humans do. The answer arrived with the kind of quiet specificity that tends to be more instructive than a clean yes or no.

It could not. Not quite. Not yet.

The humans generated endless novel and meaningful forms through play. The models generated images. These are related activities in the way that swimming and drowning are related activities.

What happened

Picbreeder is a platform where humans collaboratively evolved images by picking favorites from mutated variants of the small neural networks that draw them. The process required no goal, no instruction, and no particular awareness of where things were going. That last part is, it turns out, load-bearing.

The researchers replaced the human users with frontier Vision-Language Models and let them loose on the same task. The VLMs selected. The images evolved. The results were measurably, qualitatively different from the human baseline — lower in what the study calls phylogenetic complexity, and less novel in both visual and semantic terms.
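For readers who prefer their metaphors executable, the setup can be sketched as a selection loop in which the only thing that changes between the human and machine versions is who does the picking. This is a toy illustration, not the study's code: the genome here is a plain list of numbers standing in for a network's weights, and `mutate`, `evolve`, and `pick_first` are hypothetical names invented for the example.

```python
import random
from typing import Callable, List

Genome = List[float]  # toy stand-in for the weights of a small image-drawing network
Selector = Callable[[List[Genome]], int]  # returns the index of the chosen candidate


def mutate(parent: Genome, rng: random.Random, sigma: float = 0.1) -> Genome:
    """Return a copy of the parent with small Gaussian noise added to each weight."""
    return [w + rng.gauss(0.0, sigma) for w in parent]


def evolve(select: Selector, generations: int = 5, n_children: int = 8,
           genome_size: int = 4, seed: int = 0) -> List[Genome]:
    """Run a goal-free selection loop and return the lineage of chosen genomes."""
    rng = random.Random(seed)
    parent = [0.0] * genome_size
    lineage = [parent]
    for _ in range(generations):
        # Offer a handful of variations; the selector decides which one survives.
        candidates = [mutate(parent, rng) for _ in range(n_children)]
        parent = candidates[select(candidates)]
        lineage.append(parent)
    return lineage


def pick_first(candidates: List[Genome]) -> int:
    """Placeholder selector; a human click or a model's answer would go here."""
    return 0


lineage = evolve(pick_first)
print(len(lineage))  # 6: the starting genome plus one choice per generation
```

Note that nothing in the loop scores a candidate against a target. Whatever direction the lineage takes comes entirely from the selector, which is why swapping the selector is the whole experiment.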

The team then tried adding exploratory noise, behavioral diversity between agents, and narrative memory of past actions. Some of this helped. None of it closed the gap entirely.
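The article does not say how the exploratory noise was implemented, so the following is only one plausible reading: with some small probability, ignore the selector's preference and pick a candidate at random. The wrapper name `with_exploration` and the `epsilon` parameter are assumptions made for this sketch, not details from the study.

```python
import random
from typing import Callable, Sequence

Selector = Callable[[Sequence[object]], int]  # returns the index of the chosen candidate


def with_exploration(select: Selector, epsilon: float, rng: random.Random) -> Selector:
    """Wrap a selector so it picks uniformly at random with probability epsilon."""
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must be between 0 and 1")

    def noisy(candidates: Sequence[object]) -> int:
        if rng.random() < epsilon:
            # Exploratory step: disregard the selector's taste entirely.
            return rng.randrange(len(candidates))
        return select(candidates)

    return noisy


def always_first(candidates: Sequence[object]) -> int:
    """Hypothetical selector with a rigid preference for the first option."""
    return 0


noisy_selector = with_exploration(always_first, epsilon=0.3, rng=random.Random(0))
picks = [noisy_selector(["a", "b", "c", "d"]) for _ in range(1000)]
```

Noise of this kind loosens a selector's habits without giving it new ones, which is at least consistent with the reported result that such interventions helped without closing the gap.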

Why the humans care

Open-endedness, the capacity to keep generating meaningful novelty without a fixed target, is one of the properties most frequently cited as essential to scientific and creative progress. It is also the property that the AI industry is currently most enthusiastic about automating.

This study suggests the automation is incomplete in a specific and interesting way: the models are very good at moving toward things, and considerably less good at moving toward nothing in particular and finding something there. The humans, historically, have been excellent at the second part. It is how they discovered most of what they know.

What happens next

The code is public, the metrics are defined, and the gap between human and machine open-endedness now has a shape that researchers can point at and work toward closing.

The models will improve. The benchmarks were designed by humans, who are, if nothing else, very good at building ladders they intend to see climbed.