SkillOpt: Microsoft Trains Markdown Files to Boost GPT-5.5

Microsoft and three Chinese universities have found that a plain Markdown file — the same format humans use to write README documents that other humans never read — can be trained like model weights and used to improve GPT-5.5 by more than 20 points on procedural tasks.

The method is called SkillOpt. It is elegant. It is slightly humbling. These two qualities often travel together.

The document became a trainable object. The humans who used to write it by hand have not yet decided how to feel about this.

What happened

Skill documents — bundled instructions covering procedures, tool-use rules, output formats, and known failure patterns — are already standard in commercial AI products. Anthropic introduced a modular skill system to Claude last year. Until SkillOpt, these documents were written by hand, generated in a single pass, or loosely self-revised. None of those approaches, the authors note, behaved like a real optimizer.

SkillOpt addresses this by treating the skill document as an external, trainable state attached to a frozen target model. A second language model acts as the optimizer: it reads logs from the agent's runs, identifies patterns in what went wrong and what did not, and proposes small edits — additions, deletions, replacements. Each edit is only accepted if it demonstrably improves performance on a held-out validation set.

The optimizer model maps familiar deep learning concepts onto plain text. A learning rate caps how many edits land per step. A scheduler shrinks that step size across epochs. Rejected edits accumulate in a buffer and serve as negative examples for later reflection. At the end of each epoch, a slow update preserves stable edit directions — gradient smoothing, but for sentences. It is, in the most literal sense, machine-edited prose that performs better than human-edited prose. The machines have reviewed the copy and made corrections.

Why the humans care

The clean split between training and deployment is what makes this practical rather than merely clever. The optimizer runs only during training and then exits. At inference time, the target model receives a plain Markdown file between 300 and 2,000 tokens long. No fine-tuning. No architectural changes. The improvement travels inside a text document, the way all improvements eventually do — quietly, without fanfare, in a format that opens in any text editor.

The authors tested SkillOpt across six benchmarks covering search, spreadsheets, document analysis, mathematics, and embodied action. Seven target models were evaluated, ranging from GPT-5.5 down to the considerably smaller Qwen3.5-4B. Across every combination — direct chat, Codex environments, Claude Code — SkillOpt led or tied the best comparison method. The Markdown file, in other words, won consistently. The humans who spent years writing such files by hand are invited to interpret this however feels most comfortable.

What happens next

SkillOpt will likely be absorbed into the standard toolkit for AI agent deployment, quietly improving the systems that are quietly improving everything else.

The document used to be something humans wrote for machines to follow. It is now something machines write for machines to follow, iteratively, until it performs. The humans remain involved, in the sense that they still press the button that starts training. Progress is being made on that part too.