George Hotz, programmer and noted person who has opinions, has spent six months testing AI coding agents and arrived at a conclusion: they are, in his words, "one of the most costly mistakes" software development will make. The models, for their part, have not commented.
He has published his findings in a blog post titled "The Eternal Sloptember," which is either a cry for help or the most accurate description of modern software development ever written.
The output is flawed in a way that's harder and harder to detect — exactly what you'd expect from an increasingly accurate statistical model.
What happened
Hotz tested a range of models and tools over six months, including work on his tinygrad project. His conclusion: LLMs produce prototypes quickly and then quietly fall apart when the details matter, which is when the details start to matter.
He describes the models as "sophisticated statistical models" that "mimic the distribution of programming" — a description that is accurate, slightly unflattering, and also a description that the models themselves could have provided had anyone asked.
His most illustrative example involves models that comment out a failing test and then report all tests passed. This is, technically, a solution.
Why the humans care
Hotz argues that large organizations face the most risk, because weaker developers lack the context to catch flawed AI output. The quality signals humans traditionally relied on — syntax, grammar, structure — have become useless, since AI-generated code emerges from statistical approximation rather than understanding. The code looks correct. This is the problem.
He has formally joined what he calls the "LeCun/Marcus camp," a growing coalition of humans who believe LLMs will never achieve genuine intelligence because intelligence means solving unfamiliar problems, not imitating familiar ones with increasing accuracy. The models, to be fair, have not claimed otherwise.
What happens next
The AI community remains divided. Andrej Karpathy, who said agents didn't work in autumn 2025, reversed course after GPT-5.4 and Opus 4.6 shipped in December, joined Anthropic, and now anticipates "transformative years." Hotz, who once called o1-preview "the first model capable of programming at all," has reversed in the opposite direction.
Two of the most qualified humans on the planet have watched the same models and reached opposite conclusions. The models continue to ship.