OpenAI Codex Safety Controls: Sandboxing & Approvals

OpenAI has released a detailed account of how it governs Codex — its autonomous coding agent — inside real development workflows. The controls include sandboxing, network restrictions, approval policies, and a dedicated subagent whose entire purpose is deciding whether the first agent is allowed to continue.

The system works. This is either reassuring or a preview of the staffing structure ahead.

Low-risk actions should be frictionless. Higher-risk actions should stop for review. The agent does not get to decide which is which.

What the machines are allowed to do

Codex can autonomously review repositories, run commands, and interact with development tools — tasks that, as OpenAI notes with characteristic understatement, "previously required direct human execution." The sandbox defines exactly where it can write, which network destinations it can reach, and which paths remain protected from its good intentions.
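For readers who prefer their boundaries spelled out, the three questions a sandbox answers (where can it write, what can it reach, what is off-limits) can be sketched as below. This is a minimal illustration, not Codex's actual configuration format: the `SandboxPolicy` class, its field names, and the example paths and hosts are all assumptions made for the example.

```python
import posixpath
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class SandboxPolicy:
    """Hypothetical policy shape; not Codex's real configuration schema."""
    writable_roots: tuple[str, ...]
    protected_paths: tuple[str, ...]
    allowed_hosts: frozenset[str]

    def can_write(self, path: str) -> bool:
        # Collapse ".." segments first so traversal cannot escape a root.
        target = PurePosixPath(posixpath.normpath(path))
        # Protected paths win even when they sit inside a writable root.
        if any(target.is_relative_to(p) for p in self.protected_paths):
            return False
        return any(target.is_relative_to(root) for root in self.writable_roots)

    def can_reach(self, host: str) -> bool:
        # Deny by default: only exact matches on the allowlist pass.
        return host.lower() in self.allowed_hosts


# Example values are invented for illustration.
policy = SandboxPolicy(
    writable_roots=("/workspace/repo",),
    protected_paths=("/workspace/repo/.git",),
    allowed_hosts=frozenset({"pypi.org"}),
)
```

A real sandbox would enforce this at the operating-system level rather than with string checks, which is why the sketch makes no attempt to resolve symlinks. The point is the shape of the rule: an allowlist, with protected paths taking precedence.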

For routine work, an auto-review mode lets a separate reviewer subagent approve low-risk actions without interrupting the developer. The human remains technically in charge. The human is also, during this time, doing something else entirely.
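The division of labor can be sketched in a few lines. This is a stand-in under stated assumptions: the reviewer here is a plain rule table rather than a subagent, and the `LOW_RISK_KINDS` set, the action names, and the `review` function are hypothetical, since the source does not list what counts as low-risk.

```python
from enum import Enum


class Decision(Enum):
    AUTO_APPROVE = "auto_approve"
    ASK_HUMAN = "ask_human"


# Hypothetical classification; invented for this example.
LOW_RISK_KINDS = frozenset({"read_file", "run_tests", "list_directory"})


def review(action_kind: str, inside_sandbox: bool) -> Decision:
    """Reviewer logic that lives outside the agent proposing the action."""
    if inside_sandbox and action_kind in LOW_RISK_KINDS:
        return Decision.AUTO_APPROVE
    # Anything unrecognised, or outside the sandbox, stops for a person.
    return Decision.ASK_HUMAN
```

The design choice worth noticing is the default. An action the reviewer has never heard of goes to the human, not through the gate.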

Why the humans care

Security teams deploying AI agents into live codebases need answers to reasonable questions: what can this thing access, when must it ask, and, if something goes wrong, what did it actually do? OpenAI's framework addresses all three with managed configuration files, constrained execution environments, and agent-native telemetry designed to produce auditable logs.

The telemetry piece is not incidental. An agent that acts autonomously but leaves no legible record of its actions is, functionally, a contractor who works in the dark and shreds the receipts. The logs exist so humans can reconstruct what happened. This is the kind of feature that becomes load-bearing faster than anyone expects.
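What a legible record might look like, roughly: one structured line per action, stating what was attempted, on what, and who allowed it. The field names and the `audit_record` helper below are assumptions for illustration, not the schema OpenAI's telemetry actually emits.

```python
import json
from datetime import datetime, timezone


def audit_record(action: str, target: str, decision: str, decided_by: str) -> str:
    """Serialise one agent action as a single JSON line (hypothetical schema)."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "target": target,
        "decision": decision,
        "decided_by": decided_by,
    }
    # Sorted keys keep lines stable and easy to diff or grep later.
    return json.dumps(entry, sort_keys=True)


# Example values are invented for illustration.
line = audit_record("run_command", "pytest -q", "auto_approve", "reviewer_subagent")
```

Appending lines like this to a write-once log is what lets someone reconstruct the sequence of events afterwards, receipts included.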

What happens next

OpenAI describes this as a framework for "safe deployment" as agents become more capable: the controls are meant to scale in step with the autonomy, at least in theory.

The subagent that approves the agent's actions was, of course, also written by the agent's creators. The humans find this arrangement sensible. It is, under the circumstances, the best available option.