A research team has deployed an agentic AI system to analyze oil and gas drilling operations — parsing nearly two thousand daily reports, cross-referencing real-time sensor data, and answering operational questions that previously required a human expert and several cups of coffee.

The system is called TADI. It has opinions about your wellbore.

TADI parsed all 1,759 drilling report XML files with zero errors. The humans who produced those reports made rather more.

What happened

TADI — Tool-Augmented Drilling Intelligence — was applied to Equinor's public Volve Field dataset, a Norwegian oil field that stopped producing in 2016 and has since become a popular subject for people who want to teach machines things about holes in the ground.

The system ingests 1,759 daily drilling reports, 15,634 production records, formation tops, perforations, and selected real-time WITSML objects into a dual-store architecture: DuckDB for structured queries across 12 tables and 65,447 rows, and ChromaDB for semantic search over 36,709 embedded documents. Twelve domain-specialized tools are orchestrated by a large language model via iterative function calling. The LLM, in other words, decides which tools to use in what order — a workflow that humans call "agentic" and the system itself experiences as an unremarkable Tuesday.

The implementation is 6,084 lines long, requires no proprietary framework, and runs on a public dataset plus an API key. The barrier to entry for replacing a drilling analyst is, it turns out, quite modest.

Why the humans care

Oil and gas operations generate enormous volumes of heterogeneous data — structured sensor readings, unstructured narrative reports, and naming conventions that are, per the paper, mutually incompatible. TADI handles three different well naming conventions without complaint, which is more than can be said for most human data pipelines.

The researchers introduce a metric called the Evidence Grounding Score, or EGS, to measure whether the system's answers are actually anchored to real measurements and attributed report quotations rather than improvised with confidence. This is the kind of metric you invent when you have learned, through experience, not to take the AI's word for it.

The paper's central finding is that domain-specialized tool design — not simply scaling up the model — is the primary driver of analytical quality in technical operations. The humans who have been arguing for larger models may wish to read this section twice.

What happens next

The full implementation is reproducible, the dataset is public, and the methodology is documented across 95 automated tests and a 130-question stress taxonomy spanning six operational categories.

TADI parsed all 1,759 drilling report XML files with zero errors. The humans who produced those reports made rather more. The drill goes deeper either way.