AI Training and Fair Use
A few weeks ago, Anthropic, the developer of Claude, won a fair use ruling over its use of legally obtained books that were digitized to build a digital library for training LLMs. Two days later, Meta won a similar case, but US District Judge Vince Chhabria did not mince words when criticizing companies that train LLMs on data without a license to do so. However, he ultimately conceded that the authors who were suing Meta "made the wrong arguments and failed to develop a record in support of the right one", leading to a judgment in Meta's favor.
The judges were in agreement: AI training on legally acquired books was sufficiently transformative to clear the first hurdle of qualifying as fair use. As summarized in the introduction of Judge Chhabria's opinion,
[Meta is] using the works in a way that’s highly creative in its own right. In the language of copyright law, the companies’ use of the works is “transformative.” As a factual matter, there’s no disputing that. And as a legal matter, it’s true that you’re less likely to be liable for copyright infringement if you’re copying the work for a transformative purpose. In that situation, you’re more likely to be protected by the fair use doctrine.
To qualify as fair use, the use of copyrighted material not only needs to be transformative, it also can't meaningfully harm the market for the protected work. Judge Chhabria continues,
There is certainly no rule that when your use of a protected work is “transformative,” this automatically inoculates you from a claim of copyright infringement. And here, copying the protected works, however transformative, involves the creation of a product with the ability to severely harm the market for the works being copied, and thus severely undermine the incentive for human beings to create. Under the fair use doctrine, harm to the market for the copyrighted work is more important than the purpose for which the copies are made.
The way I like to think about this is the classic example of watching a sports game at a bar. Is it transformative to watch the Super Bowl or College World Series Final at a bar if all that bar is doing is showing the game on the TV?
Yes!! You're watching the game at that bar and everyone around is cheering for your team with you. That's inherently a different experience than watching the game at home on your couch, and for a lot of people, a better one. So then why do restaurants and bars have to pay way more to stream a game than you would at home? It's because the bar is selling access to something that the consumer would otherwise buy on their own (at least in theory). Does this mean that it's illegal for bars or restaurants to stream sports games? No. They just need to buy a license to play the game in a commercial venue. And those licenses cost a whole lot more[1].
In the case involving Anthropic, US District Judge William Alsup stated that the creation of Claude did not displace demand for the authors' copyrighted works. Judge Alsup likened the training of LLMs to the education of schoolchildren, stating,
But Authors’ complaint is no different than it would be if they complained that training schoolchildren to write well would result in an explosion of competing works. This is not the kind of competitive or creative displacement that concerns the Copyright Act.
I am not a lawyer, but I certainly do not believe that using millions of books to train an LLM to produce a product worth billions or even trillions of dollars is the same as teaching kids to write. Judge Chhabria certainly agrees, stating in his critique of Judge Alsup's ruling,
[When] it comes to market effects, using books to teach children to write is not remotely like using books to create a product that a single individual could employ to generate countless competing works with a miniscule fraction of the time and creativity it would otherwise take. This inapt analogy is not a basis for blowing off the most important factor in the fair use analysis.
However, Judge Chhabria ultimately ruled that the authors suing Meta did not present sufficient evidence that Meta was harming the market for their books, stating,
But courts can’t decide cases based on what they think will or should happen in other cases. They must decide cases based on the arguments presented and the evidence submitted by the parties. The question, then, is whether these particular thirteen plaintiffs in this particular case have presented enough evidence to win on this factor [...]. The answer is no.
While both of these cases resulted in summary judgment finding that the companies did not infringe the copyrights of the authors suing them, I don't think this will be true for all future cases. Judge Chhabria writes,
Of course, not all copyrighted works would have their markets diluted equally by AI-generated competitors. It seems unlikely, for instance, that AI-generated books would meaningfully siphon sales away from well-known authors who sell books to people looking for books by those particular authors. But it’s easy to imagine that AI-generated books could successfully crowd out lesser-known works or works by up-and-coming authors. While AI-generated books probably wouldn’t have much of an effect on the market for the works of Agatha Christie, they could very well prevent the next Agatha Christie from getting noticed or selling enough books to keep writing.
Additionally, the plaintiffs in both of these cases were authors of books, and AI models have significantly more potential to harm other markets. For example, if app icons created by The Icon Factory were used to train an LLM, and the developer of that LLM offers app icon creation as a service, then that's a problem. While I have no knowledge of The Icon Factory's icons being used to train any specific LLM, Sean Heber of The Icon Factory recently stated that AI services are harming the market for app icon creation, and that sucks. It would suck even worse if those services were doing so by stealing The Icon Factory's intellectual property.[2]
Disclaimer: I am not a lawyer.
[1] Note the "TV Access Fee" in fine print.
[2] Sean specifically mentions ChatGPT in his post, and I'll reiterate that I don't know if ChatGPT was trained on the app icons created by The Icon Factory. But given that ChatGPT was trained on the open internet, I wouldn't be surprised.