In a groundbreaking legal decision that could redefine the boundaries of copyright law in the age of artificial intelligence, a U.S. federal court has ruled that Anthropic, the AI startup behind the Claude language model, did not infringe on copyright when it used books to train its AI as long as those books were legally acquired. The court deemed the use transformative under the doctrine of fair use.
But while Anthropic scored a major win, it is far from off the hook. The company still faces serious legal trouble over millions of pirated books allegedly used in the early stages of model training.
The lawsuit, Bartz v. Anthropic, filed by authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, accused Anthropic of unlawfully using their copyrighted books to train its Claude models. Their complaint mirrored a growing wave of legal action by writers, artists, and publishers pushing back against the unlicensed use of their creative work by generative AI firms. At the heart of the case was the question: Can AI training on copyrighted text be considered fair use?
U.S. District Judge William Alsup answered with a qualified yes. He ruled that Anthropic’s use of purchased books to train Claude was “exceedingly transformative,” likening the model’s learning process to how a human might absorb and learn from literature to write something new.
This ruling sets a major precedent. It marks the first time a U.S. court has explicitly endorsed the idea that AI model training – when done with lawfully obtained materials – can qualify as fair use. The decision could offer a protective shield to companies like OpenAI, Meta, and Google, which face similar lawsuits over the data they used to train large language models.
For Anthropic and its peers, it’s a legal win that affirms what many in tech have argued for years: that ingesting massive datasets to teach AI isn’t the same as republishing or plagiarizing content. Rather, it’s akin to how a person might read thousands of books to understand storytelling techniques, then write something original.
Yet, the court’s decision also came with a sharp rebuke. Judge Alsup made clear that Anthropic’s alleged use of over 7 million pirated books, including titles from shadow libraries like Library Genesis, did not fall under the umbrella of fair use. He ruled that building a centralized library of stolen books was “unquestionably illegal” and ordered the case to proceed to trial on that front. If found liable, Anthropic could face damages amounting to billions of dollars.
This nuanced judgment reflects the complexity of AI copyright issues. On one hand, the court recognized that AI training is fundamentally different from copying and distributing. On the other, it held firm that fair use doesn’t give tech companies a free pass to use stolen content.
This balance is likely to influence future court decisions and how AI companies approach model training going forward. The case arrives at a time when AI development is racing ahead, often faster than the legal frameworks meant to govern it. Most copyright laws, including the U.S. fair use doctrine, were written before the internet, let alone generative AI, became part of daily life.
This ruling may accelerate a push toward clearer guidelines for AI training. For now, it gives AI companies firmer legal ground if they can prove their data was lawfully sourced and their models don’t reproduce copyrighted works verbatim.
In practical terms, AI developers are likely to audit their training data sources more thoroughly, avoid unlicensed or pirated material, and pursue formal licensing deals with publishers and authors.
At the same time, creators and rights holders are likely to continue pressing for compensation and more robust protection. Several authors’ groups and publishers have already called for legislative updates to copyright law in light of AI’s transformative nature.
The court’s decision in favor of Anthropic has sparked both celebration and concern. Tech advocates hail it as a win for innovation, creativity, and the open sharing of knowledge. Critics, however, worry that it opens the door to corporations profiting from creative work without compensation.
With a trial on Anthropic’s alleged use of pirated books scheduled for later this year, the story is far from over. But for now, the decision stands as a pivotal moment in the legal history of AI, one that sets a new precedent and forces everyone, from engineers to lawmakers, to rethink the rules of creativity in the algorithmic age.