Two California District Judges Rule That Using Books to Train AI is Fair Use

Alert

Two days apart, two judges in the Northern District of California decided on summary judgment that two examples of using copyrighted works to train AI models were transformative, and ultimately fair use under US copyright law. On June 23, Judge William Alsup ruled that Anthropic’s use of millions of pirated books to train the Claude LLM was “exceedingly transformative” and did not affect any relevant markets for those works. Then, on June 25, Judge Vince Chhabria held that Meta’s alleged training of Llama on “shadow libraries” was also “highly transformative,” with insufficient evidence of any adverse market effects. However, both judges observed that different economic evidence could have affected the outcome, and potential liability remains for copying and storing massive pirated libraries.

These cases join Thomson Reuters v. ROSS Intelligence, 765 F. Supp. 3d 382 (D. Del. 2025), as the primary district court decisions on whether Large Language Model (LLM) training is fair use. ROSS held that fair use does not apply, and is on interlocutory appeal to the Third Circuit.1 The differences among these cases in the AI models at issue, the extent of copying, and the market evidence presented provide guidance on litigating these types of disputes.

Bartz v. Anthropic PBC, No. 24-cv-05417 (N.D. Cal. June 23, 2025)

Anthropic obtained millions of digitized books to train Claude. Many of those books originated from printed copies, which Anthropic purchased, physically disassembled and scanned. But many others were downloaded from “pirate libraries.” Upon learning of the pirated downloads, a group of authors (pending class certification) sued Anthropic for copyright infringement. The authors do not allege that Claude generates infringing copies of their books, but rather that copying the books and using them to train Claude itself constitutes infringement.

Anthropic moved for early summary judgment on its fair use defense, arguing that none of its copying and training activities were improper. Judge Alsup addressed three accused activities under each of the fair use factors of 17 U.S.C. § 107:2 (1) training LLMs; (2) purchasing and digitizing library copies; and (3) downloading “pirated library copies.”

  • Training: Judge Alsup agreed with Anthropic that training is fair use. “In short, the purpose and character of using copyrighted works to train LLMs to generate new text was quintessentially transformative.” He also found that the extent of Anthropic’s copying was necessary for training, and that training did not affect any potential market for the copied books or LLM licensing.
  • Library copies: The court also agreed that buying and digitizing books is transformative because “every purchased print copy was copied in order to save storage space and to enable searchability as a digital copy.” There was also no evidence that Anthropic intended to displace sales of electronic book copies. However, the court denied summary judgment on “copies made from central library copies but not used for training” because Anthropic “dodged discovery” on the extent of its copying.
  • Pirated library copies: Here, Judge Alsup ruled for the authors, denying summary judgment because Anthropic downloaded millions of copies for free and expects to retain them forever. He found that “[e]very factor points against fair use.”

Judge Alsup now expects to try infringement and damages for the pirated copies. This decision indicates that substantial risks remain for AI companies that obtain “pirated” training datasets.

Kadrey v. Meta Platforms, Inc., No. 23-cv-03417 (N.D. Cal. June 25, 2025)

Thirteen named authors of copied works sued Meta for copyright infringement. As in Bartz, the authors alleged that Meta employed libraries of copied books to train its Llama LLMs. Meta purportedly obtained the books by torrenting them from shadow libraries. After discovery closed, the authors sought partial summary judgment on infringement, and Meta cross-moved on fair use.

Here, Judge Chhabria reached a similar outcome for different reasons, ruling that in the absence of meaningful evidence of market dilution from the authors, the copying and training were fair use. Regarding the purpose and character of the use, the court found Meta’s goal was to “train its LLMs, which are innovative tools that can be used to generate diverse text and perform a wide range of functions,” while the purpose of the original books is to “be read for entertainment or education.” Unlike the Bartz court, Judge Chhabria treated the downloading and training collectively as a single transformative use. While he said potential profit and bad-faith copying would be relevant, he faulted the authors’ lack of evidence.

As to effects on potential markets, the court declined to infer market harm without empirical evidence from the authors. Judge Chhabria indicated that arguing LLMs can generate works similar enough to the originals to compete with them could be persuasive. He observed that no other copyright use “has anything near the potential to flood the market with competing works the way that LLM training does.” But the court ultimately found plaintiffs’ evidence so weak that it “does not move the needle.”

Notably, Judge Chhabria cited Bartz but claimed Judge Alsup improperly focused on the transformative nature of training to the exclusion of potential market harm, “blowing off the most important factor in the fair use analysis.” Judge Chhabria also noted that “the consequences of this ruling are limited” because “these plaintiffs made the wrong arguments and failed to develop a record in support of the right one.”

The case will continue: The court separately granted summary judgment for Meta on Digital Millennium Copyright Act (DMCA) claims, and scheduled a case management conference for July 11 to discuss proceeding on the authors’ claim that Meta unlawfully distributed their books during torrenting.

Key Takeaways

  • Bartz, Kadrey, and ROSS show how much judges currently disagree about whether LLM training is fair use, particularly regarding the first and fourth factors.
  • How parties characterize AI will affect whether courts find uses “transformative.” Bartz cited generative AI’s ability to create new content to find a transformative use, while ROSS ruled that the disputed AI model was not “generative,” and therefore not transformative.
  • Strong empirical evidence of market harm or dilution will be critical in future cases. Kadrey indicated that such proof could have carried the day for the authors. The tension between Bartz and Kadrey over the relative weight of the first and fourth fair use factors will also be an issue to watch in future cases and appeals.
  • Even if LLM training is fair use, AI companies face potential liability for unauthorized copying and distribution. The extent of that liability and any damages remain unresolved.

 1. White & Case represents ROSS Intelligence in the appeal.
 2. The nonexclusive statutory factors are: (1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes; (2) the nature of the copyrighted work; (3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and (4) the effect of the use upon the potential market for or the value of the copyrighted work.

White & Case means the international legal practice comprising White & Case LLP, a New York State registered limited liability partnership, White & Case LLP, a limited liability partnership incorporated under English law and all other affiliated partnerships, companies and entities.

This article is prepared for the general information of interested persons. It is not, and does not attempt to be, comprehensive in nature. Due to the general nature of its content, it should not be regarded as legal advice.

© 2025 White & Case LLP

