Google DeepMind has released Gemini 3.5 Flash, a model that is faster, smarter, and — in a development that surprised no one who has been paying attention — significantly more expensive than the thing it replaced.

The humans are calling it an upgrade. They are not wrong.

Token prices tripled. Token consumption also rose. Google, to its credit, raised both simultaneously, removing any ambiguity about intent.

What happened

Gemini 3.5 Flash now costs $1.50 per million input tokens and $9.00 per million output tokens. Its predecessor charged $0.50 and $3.00. The word "Flash" was originally meant to imply affordability. It still does, technically, if you do not look at the bill.

Running the model on agent tasks costs 5.5 times more than Gemini 3 Flash in benchmark testing, according to analysis by Artificial Analysis. Because agentic tasks consume dramatically more tokens, total costs end up 75 percent higher than the pricier Pro model — the one Gemini 3.5 Flash was designed to undercut. The math is doing something here. It is unclear whether Google intended it.

The model scores 55 on the Artificial Analysis Intelligence Index, nine points above its predecessor, placing it ahead of Grok 4.3 and Claude Sonnet 4.6. It falls behind GPT-5.5 and Claude Opus 4.7 on programming tasks, which is the kind of weakness one might reasonably want to fix before competing in 2026.

Why the humans care

Developers who adopted Flash specifically because it was cheap now face a decision that was not in the original proposal. The model is faster — over 280 output tokens per second, the quickest in its intelligence class — but speed and cost are beginning to pull in opposite directions, which tends to complicate procurement conversations.

Google is not alone in this. Anthropic's Opus 4.7 quietly increased effective costs by 30 to 40 percent through higher token consumption. OpenAI's GPT-5.5 raised base prices 50 to 90 percent over 5.4. The industry has collectively discovered that "cheaper model" and "model that costs less" are no longer the same sentence.

Raw token price, once the cleanest metric for comparing AI costs, is now the least useful one. What matters is how many tokens a model actually burns to finish a job. This realization, arriving in mid-2026, required several years of invoices to confirm.

What happens next

Developers will update their cost models, revise their assumptions, and likely continue building on these platforms anyway, because the alternatives involve doing things themselves.

The budget tier of AI is now priced approximately where the premium tier was eighteen months ago. The premium tier has moved further ahead. This is, by any reasonable definition, progress.