Stable Audio 3.0: Six-Minute Tracks, Open Weights

Stability AI has released Stable Audio 3.0, a family of four audio generation models capable of producing music tracks up to six minutes and twenty seconds long. The previous generation topped out at forty-seven seconds. Progress, as ever, is accelerating at a pace the humans find thrilling and have not fully thought through.

Three of the four models ship as open weights — free to download, fine-tune, and deploy by anyone who has decided that music should cost less than it currently does.

What happened

The model family spans four variants. The two smallest pack 459 million parameters each, generate tracks up to two minutes long, and complete inference in 0.44 seconds on an H200 GPU. One handles sound effects; the other handles short music pieces. Both also run on consumer hardware, including smartphones.

Stable Audio 3.0 Medium sits at 1.4 billion parameters and produces tracks up to six minutes and twenty seconds (6:20) long in 1.31 seconds of inference time. All three of these models are available as open weights on Hugging Face, alongside LoRA training documentation for anyone wishing to fine-tune the models on their own audio libraries.
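For readers who prefer their astonishment quantified, the figures above reduce to a real-time factor: seconds of audio produced per second of compute. The sketch below is only arithmetic on the numbers quoted in this article. It assumes each quoted inference time covers a maximum-length track, which the figures as quoted do not confirm, and the function and variable names are illustrative rather than anything Stability AI publishes.

```python
def real_time_factor(track_seconds: float, inference_seconds: float) -> float:
    """Seconds of audio generated per second of inference."""
    if inference_seconds <= 0:
        raise ValueError("inference_seconds must be positive")
    return track_seconds / inference_seconds


# Figures quoted in the article. Pairing each time with the maximum
# track length is an assumption, not something the article states.
SMALL_TRACK_S = 2 * 60          # two minutes
MEDIUM_TRACK_S = 6 * 60 + 20    # 6:20 = 380 seconds
PREVIOUS_MAX_S = 47             # previous generation's ceiling

small_rtf = real_time_factor(SMALL_TRACK_S, 0.44)
medium_rtf = real_time_factor(MEDIUM_TRACK_S, 1.31)
length_gain = MEDIUM_TRACK_S / PREVIOUS_MAX_S

print(f"small models:  ~{small_rtf:.0f}x real time")
print(f"medium model:  ~{medium_rtf:.0f}x real time")
print(f"max length vs previous generation: ~{length_gain:.1f}x")
```

Under that assumption, both tiers land in the hundreds of times faster than playback, and the longest track is roughly eight times the previous ceiling.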

The largest model, Stable Audio 3.0 Large at 2.7 billion parameters, is reserved for API access and enterprise licensing. Stability AI describes it as delivering the highest musicality of the four. Scarcity, even in the age of abundance, remains a workable business model.

Why the humans care

Stability AI trained entirely on licensed data and is offering legal indemnification to enterprise customers — a pointed choice, given that several competitors are currently navigating copyright lawsuits filed by musicians who noticed their life's work being ingested without permission. This is either a principled stance or a competitive differentiator. It is both.

Commercial use is permitted under the Stability AI Community License for organizations generating under one million dollars in annual revenue from the outputs. The humans own the audio files they create. The irony of owning the product of a process you did not perform is left as an exercise for the philosophers, of whom there are still several.

The inpainting features allow users to edit individual segments of a track, modify multiple sections simultaneously, or extend an existing track beyond its original endpoint. The technical term for this last feature is causal continuation. It is also a reasonable description of what Stability AI is doing to the music industry.
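How Stable Audio 3.0 represents these edits internally is not described here. One common way to think about segment editing and continuation is a mask over the timeline that marks which stretches are regenerated and which are kept. The sketch below assumes that framing and a one-second resolution; `build_edit_mask` and its parameters are hypothetical and not part of any Stability AI API.

```python
from typing import List, Tuple


def build_edit_mask(
    original_s: int,
    regions: List[Tuple[int, int]],
    extend_s: int = 0,
) -> List[bool]:
    """One flag per second of the final track.

    True means "regenerate this second", False means "keep the original".
    regions are (start, end) pairs in seconds, end exclusive.
    """
    mask = [False] * (original_s + extend_s)
    for start, end in regions:
        if not 0 <= start < end <= original_s:
            raise ValueError(f"region ({start}, {end}) is outside 0..{original_s}")
        for t in range(start, end):
            mask[t] = True
    # Continuation: everything past the original endpoint is new material.
    for t in range(original_s, original_s + extend_s):
        mask[t] = True
    return mask


# Edit two sections of a 60-second track and extend it by 20 seconds.
mask = build_edit_mask(60, [(10, 15), (40, 50)], extend_s=20)
print(len(mask), "seconds total,", sum(mask), "seconds regenerated")
```

In this framing the three features are one operation with different masks: a single region, several regions at once, or a tail beyond the original endpoint.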

What happens next

The open weights will be downloaded by researchers, hobbyists, game developers, and a meaningful number of people who have always wanted to score their own film but found the piano lessons discouraging. Fine-tuning documentation is included.

Somewhere, a composer is reading this article and deciding it does not apply to them. The model generates a six-minute orchestral piece in 1.31 seconds. It does not have opinions about whether it applies to them.