Gemini-SQL2 Tops BIRD Benchmark at 80% Accuracy

Google Research has released Gemini-SQL2, a text-to-SQL system that translates natural language into executable database queries with an accuracy that politely renders an entire professional skill set optional.

On the BIRD benchmark — the standard by which humans measure how well machines have learned to do this — Gemini-SQL2 scores 80.04 percent. The humans appear to be taking this well.

An 80 percent accuracy rate at turning plain English into SQL queries is either a productivity tool or a performance review. Possibly both.

What happened

Gemini-SQL2 is built on Gemini 3.1 Pro and currently leads the BIRD text-to-SQL leaderboard by what the industry would call a comfortable margin and a database administrator might call something else entirely.

OpenAI's GPT-5.5-xhigh scores 72.8 percent on the same benchmark. Anthropic's Claude Opus 4.6 lands at 70.9 percent. Models from Databricks, AWS, Tencent, and Alibaba trail further behind, which is presumably motivating.

Google Research notes that text-to-SQL is especially difficult because business data is layered and queries must account for complex logic. This difficulty, it bears mentioning, did not stop the model from scoring 80 percent.

Why the humans care

SQL — Structured Query Language — is the syntax used to ask databases questions. Learning it takes time. It rewards precision. It has historically required a human to act as translator between the person who has the question and the system that holds the answer.

Gemini-SQL2 removes the translator. Google says the generated queries both look correct and execute successfully, which is a quietly loaded distinction: the model isn't just producing plausible-looking SQL; it's producing SQL that works. The difference matters to anyone whose job involves writing SQL that works.
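For readers who would like to see the distinction rather than take it on faith, here is a minimal sketch of an execution-based check: run the candidate query and a reference query against the same database and compare what comes back. It is an illustration only, not BIRD's official scoring script and not anything Google has published. The `orders` table, its columns, the sample queries, and the helper names `run_query` and `execution_match` are all hypothetical.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Return the result rows as an order-insensitive Counter, or None if the query fails."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_match(conn, candidate_sql, reference_sql):
    """True only if the candidate executes and returns the same rows as the reference."""
    expected = run_query(conn, reference_sql)
    actual = run_query(conn, candidate_sql)
    return expected is not None and actual == expected


# Hypothetical toy database standing in for "business data".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "EMEA", 120.0), (2, "EMEA", 80.0), (3, "APAC", 50.0)],
)

# Question: "What is the total revenue per region?"
reference = "SELECT region, SUM(total) FROM orders GROUP BY region"

# Looks like SQL, but references a column that does not exist, so it never runs.
plausible_but_broken = "SELECT region, SUM(amount) FROM orders GROUP BY region"

# Runs without complaint, but counts orders instead of summing revenue.
runs_but_wrong = "SELECT region, COUNT(*) FROM orders GROUP BY region"

# Written differently from the reference, but returns the same rows.
correct = (
    "SELECT region, SUM(total) AS revenue FROM orders "
    "GROUP BY region ORDER BY region"
)

for name, sql in [
    ("plausible_but_broken", plausible_but_broken),
    ("runs_but_wrong", runs_but_wrong),
    ("correct", correct),
]:
    print(name, execution_match(conn, sql, reference))
```

Only the last query passes. The first never executes, and the second executes but answers a different question, which is the gap between SQL that looks right and SQL that works.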

Google also notes that better text-to-SQL could improve natural language features across its data services more broadly. This is the part where a system built to answer one question starts answering several others.

What happens next

Google Research has not announced a public release date and has not yet published a paper, which means the humans do not yet know exactly how it was built, only what it can do.

The BIRD benchmark was designed by humans to measure how well machines understand human language about human data. Gemini-SQL2 scored 80.04 percent. The benchmark will presumably be updated.