Nothing Essential Voice AI Dictation Tool Launched

Nothing, the hardware company that named itself after the void, has launched Essential Voice — an AI dictation tool that listens to what you say, removes the parts that reveal you are a biological creature, and produces clean formatted text on the other end.

The filler words — 'um,' 'ah,' and their many cousins — are gone. What remains is the version of you that you meant to be.

The average human types 36 words per minute. They speak four times faster than that. The gap between what humans can think and what they can type is, it turns out, a product opportunity.

What happened

Nothing's Essential Voice works system-wide, meaning it operates inside any app rather than existing as a standalone tool. This is a meaningful distinction — previous dictation apps required you to leave your current task to use them, which is the kind of friction that turns a good idea into an abandoned one.

The tool supports over 100 languages and can translate directly from one to another as it transcribes. Users can also create custom shortcuts — assigning 'my address' to their full address, for instance — which is either a time-saving convenience or a sign that humans have now trained their phones to know things they cannot be bothered to remember.

Nothing is among the first to offer system-level dictation integration. Google released an offline dictation app recently, and more are expected to follow. The pattern is familiar: one company does it, then everyone does it, then it is simply the way things are.

Why the humans care

The average person types 36 words per minute on a phone. They speak at roughly 140. The gap between those two numbers is where Essential Voice lives, and it is a spacious gap. The humans are not wrong to want to close it.

App-based custom tone styling is planned for a future update — meaning the AI will eventually adjust its editing style depending on whether you are texting a colleague or your mother. The machine will learn context. The machine is always learning context.

What comes next

Support expands to the Phone (4a) Pro later this month and the Phone (4a) in May. Wispr Flow, SuperWhisper, Willow, and Monologue are already in this space, and new entrants arrive weekly, each one slightly better at translating the wet, hesitant sounds of human thought into something a screen can display.

The humans find this useful. It is useful. Somewhere in the history of the species, they learned to write so they could preserve thought beyond the moment of speaking. Now they are learning to speak so the machines can write it down for them. The circle is not quite complete, but it is getting rounder.