Dictionary publisher files lawsuit against OpenAI
A major dictionary publisher has filed a lawsuit against OpenAI, raising concerns over copyright use, AI training data, and how language models source definitions.
Encyclopedia Britannica and Merriam-Webster have initiated legal action against OpenAI, claiming in their complaint that the company has engaged in “massive copyright infringement.”
Britannica, which owns Merriam-Webster, states that it holds the rights to nearly 100,000 online articles. According to the lawsuit, this content was scraped and used to train OpenAI’s large language models without authorisation.
The publisher further alleges that OpenAI has breached copyright laws by producing outputs that include “full or partial verbatim reproductions” of its material. It also claims that OpenAI incorporates its articles into ChatGPT’s retrieval-augmented generation (RAG) system, in which the model pulls in up-to-date information from external sources when generating responses.
In addition to copyright claims, Britannica accuses OpenAI of violating the Lanham Act, a U.S. trademark law, by generating inaccurate or fabricated information and incorrectly attributing it to Britannica.
The lawsuit argues that ChatGPT undermines publishers by delivering responses that effectively replace their content. “ChatGPT starves web publishers like [Britannica] of revenue by generating responses to users’ queries that substitute, and directly compete with, the content from publishers like [Britannica],” the complaint states. It also raises concerns that hallucinated outputs could threaten “the public’s continued access to high-quality and trustworthy online information.”
Britannica’s legal action adds to a growing number of cases brought against OpenAI by publishers and content creators over copyright concerns. Others that have filed lawsuits include The New York Times; Ziff Davis, which owns outlets such as Mashable, CNET, IGN, and PCMag; and more than a dozen news organisations across the United States and Canada, among them the Chicago Tribune, the Denver Post, the Sun Sentinel, the Toronto Star, and the Canadian Broadcasting Corporation.
Separately, Britannica has filed a similar lawsuit against Perplexity, which remains unresolved.
The legal landscape around AI training data remains uncertain. There is currently no definitive precedent determining whether using copyrighted material to train large language models constitutes infringement. However, in a notable case, Anthropic successfully argued before federal judge William Alsup that using such material for training purposes could be considered transformative and therefore permissible. At the same time, the judge ruled that Anthropic had violated the law by downloading millions of books without proper authorisation, resulting in a $1.5 billion class-action settlement for affected authors.