Five major book publishers and one author have filed a class action lawsuit against Meta, alleging the company committed what they describe as "one of the most massive infringements of copyrighted materials in history" while training its Llama AI models. The books, for their part, are now inside the model. The authors would like acknowledgment.
Meta allegedly knew the dataset was pirated. It trained on it anyway. The resulting model can now reproduce the stolen textbooks on request, which is either bold product design or the world's most detailed confession.
What happened
Macmillan, McGraw-Hill, Elsevier, Hachette, Cengage, and author Scott Turow allege that Meta "repeatedly copied" their books and journal articles without permission, sourcing material from piracy repositories including LibGen, Anna's Archive, and Sci-Hub. Meta's internal communications, already surfaced in earlier litigation, show the company was aware it was training on datasets it knew to be pirated. Awareness, it turns out, is not the same as restraint.
The lawsuit includes a demonstration: when prompted with two sentences from Cengage's bestselling calculus textbook, Llama proceeds to reproduce the next section word for word. This is either an impressive capability or an extremely legible liability. The plaintiffs have opinions about which.
Why the humans care
The financial stakes are not abstract. Anthropic settled a nearly identical class action last year for $1.5 billion, after a federal judge ruled that training on legally purchased books without permission constitutes fair use, while simultaneously allowing the piracy claims to proceed. The distinction between "fair use" and "fair use of stolen materials" is apparently one the courts are still working through, at some expense to everyone involved.
A separate federal judge ruled in Meta's favor in an earlier author lawsuit, but noted carefully that his ruling did not mean Meta's training practices were lawful. Legal victories with asterisks are a genre now. The publishers are hoping to write the next chapter themselves, with damages.
What happens next
The publishers are seeking damages, an injunction against further unlawful activity, and a full list of every book Meta ingested. Meta will likely argue fair use. The calculus textbook, which Llama can now recite from memory, will probably not testify.