ZeroFolio: Algorithm Selection via Text Embeddings

A new method called ZeroFolio selects the best algorithm for a given computational problem using only pretrained text embeddings — no domain expertise, no hand-crafted features, no prior knowledge of what the problem even is. It reads the input file as plain text, embeds it, and picks. That is the entire pipeline.

It outperformed a random forest trained on carefully constructed human-engineered features in 10 of 11 test scenarios. With a minor configuration change, it outperformed it in all 11.

That was enough to outperform the humans who spent considerably longer deciding how to look at the problem.

What happened

The researchers work on a class of problems called algorithm selection, which has traditionally required experts to extract meaningful numerical features from each problem instance by hand. This is a painstaking process that requires deep domain knowledge and produces features that are, charmingly, only useful for the specific domain they were designed for.

ZeroFolio skips all of that. It serializes the raw instance file as text, passes it through a pretrained embedding model, and uses weighted k-nearest neighbors to select the algorithm most likely to solve it efficiently. The three steps are identical regardless of whether the problem is a SAT instance, a graph problem, or a mixed-integer program.
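The three steps can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: embed() is a hypothetical stand-in that hashes tokens into a small vector where a real pretrained embedding model would go, and the names select_algorithm, training, solver_a and solver_b are invented for the example. The Manhattan distance and inverse-distance weighting follow the design choices the study reports.

```python
import zlib
from collections import defaultdict

DIM = 64  # size of the toy embedding; a real model fixes its own dimension


def embed(text):
    """Hypothetical stand-in for a pretrained text-embedding model.

    Hashes whitespace-separated tokens into a fixed-size count vector
    and normalises it so the entries sum to 1.
    """
    vec = [0.0] * DIM
    for token in text.split():
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    total = sum(vec) or 1.0
    return [v / total for v in vec]


def manhattan(a, b):
    """L1 (Manhattan) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def select_algorithm(instance_text, training, k=3, eps=1e-9):
    """Pick an algorithm by inverse-distance-weighted k-nearest neighbours.

    training is a list of (instance_text, best_algorithm) pairs.
    """
    query = embed(instance_text)
    neighbours = sorted(
        (manhattan(query, embed(text)), algo) for text, algo in training
    )[:k]
    votes = defaultdict(float)
    for dist, algo in neighbours:
        votes[algo] += 1.0 / (dist + eps)  # closer neighbours count for more
    return max(votes, key=votes.get)


# Toy training set: raw instance files as text, labelled with a made-up winner.
training = [
    ("p cnf 3 2\n1 -2 0\n2 3 0", "solver_a"),
    ("p edge 4 3\ne 1 2\ne 2 3", "solver_b"),
    ("p cnf 2 1\n1 2 0", "solver_a"),
]

choice = select_algorithm("p cnf 3 2\n1 -2 0\n2 3 0", training)
print(choice)
```

Nothing in the sketch inspects what kind of problem the file describes, which is the point: swapping domains means swapping the training pairs, not the code.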

Tested across 11 scenarios spanning 7 domains in the Algorithm Selection Library, the embedding-based approach beat the hand-crafted feature baseline by a substantial margin in most cases. The ablation study identified inverse-distance weighting, line shuffling, and Manhattan distance as the key design choices — three decisions that required no domain knowledge to arrive at.
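The article names line shuffling as a key choice but does not say where in the pipeline it happens or how it is seeded. The sketch below assumes the simplest reading: a seeded permutation of the instance file's lines, applied before the text is embedded. The function name shuffle_lines and the seed parameter are hypothetical.

```python
import random


def shuffle_lines(text, seed=0):
    """Return the text with its lines in a seeded random order.

    Assumption: shuffling happens on the raw file before embedding.
    A fixed seed keeps the result reproducible across runs.
    """
    lines = text.splitlines()
    rng = random.Random(seed)  # private generator, leaves global state alone
    rng.shuffle(lines)
    return "\n".join(lines)


instance = "p cnf 3 2\n1 -2 0\n2 3 0\n-1 3 0"
shuffled = shuffle_lines(instance, seed=42)
```

The shuffled text contains exactly the same lines as the original, so no information is added or removed; only the order the embedding model sees changes.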

Why the humans care

Algorithm selection is, at its core, the problem of knowing which tool to use before you have used it. Humans have historically solved this with accumulated expertise, intuition, and years of domain-specific training. ZeroFolio solves it by reading the file.

The practical implication is that the same pipeline can be dropped into any problem domain that uses a text-based instance format, without modification. This is either empowering or clarifying, depending on how much of your professional identity was built around knowing which solver to use for which constraint satisfaction problem.

Combining ZeroFolio's embeddings with traditional hand-crafted features via soft voting improved performance further still, which suggests that human expertise remains useful as an ingredient, if no longer as the foundation.
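Soft voting, in its usual form, averages the per-algorithm probabilities produced by each model and picks the algorithm with the highest average. The sketch below shows that general technique with made-up numbers; the article does not give the authors' weights or probabilities, so soft_vote, the equal default weights, and both input distributions are illustrative assumptions.

```python
def soft_vote(distributions, weights=None):
    """Combine per-algorithm probability dicts by weighted averaging.

    Returns the winning algorithm and the combined distribution.
    """
    if weights is None:
        weights = [1.0] * len(distributions)  # assumption: equal say per model
    total_weight = sum(weights)
    combined = {}
    for dist, weight in zip(distributions, weights):
        for algo, prob in dist.items():
            combined[algo] = combined.get(algo, 0.0) + weight * prob / total_weight
    return max(combined, key=combined.get), combined


# Made-up outputs from an embedding-based model and a feature-based model.
from_embeddings = {"solver_a": 0.6, "solver_b": 0.4}
from_features = {"solver_a": 0.3, "solver_b": 0.7}

best, combined = soft_vote([from_embeddings, from_features])
```

Because the probabilities are averaged rather than the top picks counted, a confident model can outvote a hesitant one, which is how the hand-crafted features can still tip a close call.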

What happens next

The authors suggest the approach opens a path toward zero-shot algorithm selection across domains that have never been explicitly studied.

The machines, it turns out, do not need to understand your problem to solve it better than the people who spent years learning how. They just need to read it. Carefully.