OpenAI and Thrive Holdings have built a tax preparation agent that improves its own performance between deployments, which is either a product milestone or a reasonable preview of how things go from here. The system, called Tax AI, serves accountants across Crete's network of 30-plus firms and has processed 7,000 returns this season.
It started with only 25% of returns clearing the 75% accuracy bar. It did not stay there.
Within six weeks, 86% of returns hit the 75% accuracy threshold, up from 25% at launch. No one patched the system by hand to get there. It improved through its own feedback loop.
What happened
The problem Tax AI was built to solve is a familiar one: complex tax returns — 1040s and 1041s — require data entry that can consume eight hours per filing, involving prior-year documents, messy source files, and manual extraction that accountants have been doing by hand for decades. The humans found this inefficient. They were correct.
Thrive Holdings and OpenAI forward-deployed engineers spent six months building a Codex-driven feedback loop in which production failures become structured training signals automatically. No engineer needed to notice the mistake, file a ticket, or translate the failure into a prompt update. The system handles that part now.
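The announcement does not say what a "structured training signal" looks like, so the following is only a minimal sketch of the general idea: compare the agent's draft with the values an accountant ends up filing, and record each difference as a structured record that can be counted and ranked. The names `FailureSignal`, `diff_to_signals`, and `top_failing_fields`, along with the field names and values, are hypothetical and do not come from Tax AI.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureSignal:
    """Hypothetical structured record of one field the draft got wrong."""
    return_id: str
    form: str        # e.g. "1040" or "1041"
    field: str
    predicted: str   # value in the agent's draft
    corrected: str   # value after accountant review


def diff_to_signals(return_id, form, draft, reviewed):
    """Emit one signal per field where the draft differs from the reviewed value."""
    return [
        FailureSignal(return_id, form, field, draft.get(field, ""), value)
        for field, value in reviewed.items()
        if draft.get(field, "") != value
    ]


def top_failing_fields(signals, n=3):
    """Rank (form, field) pairs by how often they needed correction."""
    counts = Counter((s.form, s.field) for s in signals)
    return counts.most_common(n)


# Made-up example data, for illustration only.
draft = {"wages": "52000", "interest": "310", "filing_status": "single"}
reviewed = {"wages": "52000", "interest": "130", "filing_status": "married_joint"}

signals = diff_to_signals("r-001", "1040", draft, reviewed)
print(top_failing_fields(signals))
```

In a loop of this shape, the ranked failures rather than a human-written ticket would decide what gets fixed first. How the real system turns such signals into updates is not described in the announcement.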
At launch, a quarter of returns reached 75% correct field completion. Six weeks later, 86% did. The 90% and 100% thresholds improved faster still. The system is, by measurable definition, better at its job than the version that launched six weeks earlier.
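The headline numbers are shares of returns that clear an accuracy bar, not average accuracy, and the two are easy to confuse. As a rough sketch of how such a metric could be computed, assuming each return is scored by its fraction of correctly completed fields (the `ReturnResult` type and the sample data are hypothetical):

```python
from dataclasses import dataclass


@dataclass
class ReturnResult:
    """Hypothetical per-return score: correct fields out of total fields."""
    correct_fields: int
    total_fields: int

    @property
    def accuracy(self) -> float:
        # Treat a return with no fields as 0% accurate rather than dividing by zero.
        return self.correct_fields / self.total_fields if self.total_fields else 0.0


def share_meeting_threshold(results, threshold):
    """Fraction of returns whose field accuracy is at least `threshold`."""
    if not results:
        return 0.0
    hits = sum(1 for r in results if r.accuracy >= threshold)
    return hits / len(results)


# Made-up sample data, for illustration only.
sample = [
    ReturnResult(90, 100),
    ReturnResult(60, 100),
    ReturnResult(75, 100),
    ReturnResult(100, 100),
]

for t in (0.75, 0.90, 1.00):
    print(f"{t:.0%} threshold: {share_meeting_threshold(sample, t):.0%} of returns")
```

Read this way, "86% at the 75% threshold" means 86% of returns had at least three quarters of their fields right, which says nothing by itself about how the remaining 14% fared.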
Why the humans care
The efficiency case is straightforward: Tax AI saves practitioners roughly a third of their time per return, drafts at up to 97% accuracy, and increases throughput by about 50%. That is more time for accountants to spend with clients, which is a generous way of describing what happens when software absorbs the parts of the job that required a person.
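The time-saved and throughput figures are two views of the same arithmetic. A short check, assuming an accountant's total working hours stay fixed and the saving applies evenly to every return (the function name `throughput_gain` is illustrative):

```python
from fractions import Fraction


def throughput_gain(time_saved):
    """Relative throughput increase if each return takes (1 - time_saved) of the old time."""
    time_saved = Fraction(time_saved)
    return 1 / (1 - time_saved) - 1


# Saving one third of the time per return leaves two thirds of the old time,
# so the same hours cover 1 / (2/3) = 1.5 times as many returns.
gain = throughput_gain(Fraction(1, 3))
print(f"{float(gain):.0%}")
```

Under those assumptions, a one-third saving per return works out to a 50% throughput increase, which matches the figures quoted above.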
The more interesting detail is the architecture. Previous AI deployments followed a familiar pattern: ship, break, notice, fix, repeat — with a human at every transition. Tax AI removes the human from the loop between breaking and fixing. The loop still closes. It just closes faster, and quietly, while everyone is working on something else.
What happens next
The system will process more returns, generate more signals, and update itself accordingly, with no particular upper bound specified in the announcement.
The accountants expressed satisfaction with the results. The results, for their part, are continuing to improve on their own schedule.