Yesterday, popular authors including John Grisham, Jonathan Franzen, George R.R. Martin, Jodi Picoult, and George Saunders joined the Authors Guild in suing OpenAI, alleging that training the company's large language models (LLMs) used to power AI tools like ChatGPT on pirated versions of their books violates copyright laws and is "systematic theft on a mass scale."
"Generative AI is a vast new field for Silicon Valley's longstanding exploitation of content providers," Franzen said in a statement provided to Ars. "Authors should have the right to decide when their works are used to 'train' AI. If they choose to opt in, they should be appropriately compensated."
In response to two lawsuits filed earlier this year by authors making similar claims, OpenAI has previously argued that the authors suing "misconceive the scope of copyright, failing to take into account the limitations and exceptions (including fair use) that properly leave room for innovations like the large language models now at the forefront of artificial intelligence."
This latest complaint argued that OpenAI's "LLMs endanger fiction writers' ability to make a living, in that the LLMs allow anyone to generate, automatically and freely (or very cheaply), texts that they would otherwise pay writers to create."
Authors are also concerned that the LLMs fuel AI tools that "can spit out derivative works: material that is based on, mimics, summarizes, or paraphrases" their works, allegedly turning their works into "engines of" authors' "own destruction" by harming the book market for them. Even worse, the complaint alleged, businesses are being built around opportunities to create allegedly derivative works:
Businesses are sprouting up to sell prompts that allow users to enter the world of an author's books and create derivative stories within that world. For example, a business called Socialdraft offers long prompts that lead ChatGPT to engage in "conversations" with popular fiction authors like Plaintiff Grisham, Plaintiff Martin, Margaret Atwood, Dan Brown, and others about their works, as well as prompts that promise to help customers "Craft Bestselling Books with AI."
They claimed that OpenAI could have trained its LLMs exclusively on works in the public domain or paid authors "a reasonable licensing fee" but chose not to. Authors feel that without their copyrighted works, OpenAI "would have no commercial product with which to damage, if not usurp, the market for these professional authors' works."
"There is nothing fair about this," the authors' complaint said.
Their complaint noted that OpenAI chief executive Sam Altman claims that he shares their concerns, telling Congress that "creators deserve control over how their creations are used" and deserve to "benefit from this technology." But, the complaint adds, so far Altman and OpenAI (which, the plaintiffs allege, "intend to earn billions of dollars" from their LLMs) have "proved unwilling to turn these words into actions."
Saunders said that the lawsuit was an "effort to nudge the tech world to make good on its frequent declarations that it is on the side of creativity." The suit is a proposed class action estimated to include tens of thousands of authors, some with multiple works, and OpenAI could owe $150,000 per infringed work. He also said that the stakes went beyond protecting authors' works.
"Writers should be fairly compensated for their work," Saunders said. "Fair compensation means that a person's work is valued, plain and simple. This, in turn, tells the culture what to think of that work and the people who do it. And the work of the writer (the human imagination, struggling with reality, trying to discern virtue and responsibility within it) is essential to a functioning democracy."
The authors' complaint said that as more writers have reported being replaced by AI content-writing tools, more authors feel entitled to compensation from OpenAI. The Authors Guild told the court that 90 percent of authors responding to an internal survey from March 2023 "believe that writers should be compensated for the use of their work in 'training' AI." On top of this, there are other threats, their complaint said, including that "ChatGPT is being used to generate low-quality ebooks, impersonating authors, and displacing human-authored books."
Authors claimed that despite Altman's public support for creators, OpenAI is intentionally harming creators, noting that OpenAI has admitted to training LLMs on copyrighted works and claiming that there's evidence that OpenAI's LLMs "ingested" their books "in their entireties."
"Until very recently, ChatGPT could be prompted to return quotations of text from copyrighted books with a good degree of accuracy," the complaint said. "Now, however, ChatGPT generally responds to such prompts with the statement, 'I can't provide verbatim excerpts from copyrighted texts.'"
To authors, this suggests that OpenAI is exercising more caution in the face of authors' growing complaints, perhaps because authors have alleged that the LLMs were trained on pirated copies of their books. They've accused OpenAI of being "opaque" and refusing to discuss the sources of its LLMs' data sets.
Authors have demanded a jury trial and asked a US district court in New York for a permanent injunction to prevent OpenAI's alleged copyright infringement, claiming that if OpenAI's LLMs continue to illegally leverage their works, they will lose licensing opportunities and risk being usurped in the book market.
Ars could not immediately reach OpenAI for comment. [Update: OpenAI's spokesperson told Ars that "creative professionals around the world use ChatGPT as a part of their creative process. We respect the rights of writers and authors, and believe they should benefit from AI technology. We're having productive conversations with many creators around the world, including the Authors Guild, and have been working cooperatively to understand and discuss their concerns about AI. We're optimistic we will continue to find mutually beneficial ways to work together to help people utilize new technology in a rich content ecosystem."]
Rachel Geman, a partner with Lieff Cabraser and co-counsel for the authors, said that OpenAI's "decision to copy authors' works, done without offering any choices or providing any compensation, threatens the role and livelihood of writers as a whole." She told Ars that "this is in no way a case against technology. This is a case against a corporation to vindicate the important rights of writers."
Y'all are missing the point. What you said is about AI output, which is not the main issue in the lawsuit. The lawsuit is about the input to AI: authors want to choose whether their content may be used to train AI (and, if so, be compensated for it).
There is an analogy elsewhere in this thread that is pretty apt: this scenario is akin to a university using pirated textbooks to educate its students. Whether or not a student ends up pursuing a field that uses the knowledge does not matter; the issue is that the university should not have done so in the first place (and remember, the university is not only profiting off of this but also saving money by shafting the authors).
I imagine that the easiest way to acquire specific training data for an LLM is to download ebooks from Amazon. If a university professor pirates a textbook and then uses extracts from various pages in their lecture slides, the cost of the crime would be the cost of a single textbook. In the case of a novel, GRRM should be entitled to the cost of a set of the Ice & Fire books if he could prove that the original training material was illegally pirated instead of legally purchased.
Once a copy of a book is sold, an author typically has no say in how it gets used outside of reproduction.
I'm not sure your assessment of the "cost of damages" is really accurate, but again, that's not the point.
The point of the lawsuit is control. If the authors succeed in setting a precedent that they should control the use of their work in AI training, then they can easily negotiate terms with AI tech companies for much more money.
The point of the lawsuit is not one-time compensation; it's about control in the future.