Adobe Hit with Proposed Class-Action, Accused of Misusing Authors’ Work in AI Training

Adobe is being sued for allegedly using pirated books, including works by author Elizabeth Lyon, to train its AI model SlimLM. The lawsuit alleges that using the Books3 dataset, which includes 191,000 books, violated copyright laws. This case joins a growing number of legal challenges against AI companies accused of using copyrighted materials without consent. The outcome could impact how AI systems are trained in the future.

Dec 17, 2025 - 21:16

Like pretty much every other tech company, Adobe has leaned heavily into AI over the past several years. The software firm has launched several AI services since 2023, including Firefly, its AI-powered media-generation suite. Now, however, the company’s full-throated embrace of the technology may have led to trouble, as a new lawsuit claims it used pirated books to train one of its AI models.

A proposed class-action lawsuit filed on behalf of Elizabeth Lyon, an author from Oregon, claims that Adobe used pirated versions of numerous books — including her own — to train the company’s SlimLM program.

Adobe describes SlimLM as a small language model series that can be “optimised for document assistance tasks on mobile devices.” It states that SlimLM was pre-trained on SlimPajama-627B, a “deduplicated, multi-corpora, open-source dataset” released by Cerebras in June of 2023. Lyon, who has written several guidebooks on non-fiction writing, says that some of her works were included in a pre-training dataset that Adobe used.

Lyon’s lawsuit, first reported by Reuters, says that her writing was included in a processed subset of a manipulated dataset that served as the basis of Adobe’s program: “The SlimPajama dataset was created by copying and manipulating the RedPajama dataset (including copying Books3),” the lawsuit says. “Thus, because it is a derivative copy of the RedPajama dataset, SlimPajama contains the Books3 dataset, including the copyrighted works of Plaintiff and the Class members.”

“Books3,” a vast collection of 191,000 books used to train GenAI systems, has been a persistent source of legal trouble for the tech community. RedPajama has also been cited in several lawsuits. In September, a lawsuit against Apple claimed the company had used copyrighted material to train its Apple Intelligence model. That complaint cited RedPajama and accused the tech company of copying protected works “without consent and without credit or compensation.” In October, a similar lawsuit against Salesforce alleged that the company had used RedPajama for training.

Unfortunately for the tech industry, such lawsuits have become somewhat commonplace. AI algorithms are trained on massive datasets, and in some cases, those datasets have allegedly included pirated materials. In September, Anthropic agreed to pay $1.5 billion to several authors who sued it, alleging it used pirated versions of their work to train its chatbot, Claude. The case was considered a potential turning point in the many ongoing legal battles over copyrighted material in AI training data.

TechAmerica.ai Staff TechAmerica.ai’s editorial team, consisting of expert editors, writers, and researchers, crafts accurate, clear, and valuable content focused on technology and education. We deliver in-depth technology news and analysis, with a special emphasis on founders and startup teams, covering funding trends, innovative startups, and entrepreneurial insights to empower our readers.