February 23, 2025
Intangible Assets

Use of Copyrighted Works in AI Training Is Not Fair Use: Thomson Reuters Enterprise Centre GmbH v. Ross Intelligence Inc. | Carlton Fields


With the Trump administration’s push to establish America’s global dominance in artificial intelligence (AI), thorny questions of intellectual property rights and fair use are likely to be litigated with greater frequency. In Thomson Reuters Enterprise Centre GmbH v. Ross Intelligence Inc., the U.S. District Court for the District of Delaware recently ruled that Ross’ use of copyrighted works to train its AI model did not fall under the fair use exception. The ruling serves as a cautionary note that the use of data to train AI must be balanced against third-party IP rights.

Background

Thomson Reuters owns Westlaw, one of the biggest legal research platforms in the world. Users pay to access Westlaw content such as case law, statutes, state and federal regulations, and law journals and treatises. This type of content is not copyrightable. However, Thomson Reuters owns copyrights to Westlaw’s editorial content and annotations, such as “headnotes,” which summarize key points of law and case holdings, and its proprietary “Key Number System,” a numerical taxonomy for organizing Westlaw content.

Ross, a competitor, developed an AI-powered search engine to search legal content. Ross first sought to license Westlaw’s content but was denied. Ross then hired LegalEase to develop training data in the form of “Bulk Memos,” which were lawyers’ compilations of legal questions and answers. Lawyers compiling the Bulk Memos created the questions using Westlaw headnotes but received specific instructions not to copy and paste Westlaw’s headnotes into the Bulk Memos. Ross used roughly 25,000 of these Bulk Memos to train its AI. Thomson Reuters sued Ross when it discovered the Bulk Memos’ reliance on Westlaw headnotes, especially where the memos contained language similar to Westlaw headnotes.

Infringement and Fair Use Ruling

In 2023, the court largely denied motions for summary judgment by both parties, but subsequently invited the parties to renew their summary judgment briefing. Based on the new briefing, the court granted Thomson Reuters’ motion for summary judgment of direct infringement with respect to 2,243 headnotes, while leaving for trial disputes as to thousands more headnotes, additional editorial content, and the Key Number System. The court rejected Ross’ asserted defenses, including its fair use defense.

On infringement, the court found that Thomson Reuters owned valid copyrights, as the threshold for “originality” was “extremely low,” requiring only “some minimal degree of creativity,” that could be met by distilling judicial opinions into headnotes. Further, the court found that the Key Number System was sufficiently original to be protectable by copyright. The court also found that Thomson Reuters had shown actual copying of Westlaw headnotes and substantial similarity between the headnotes and the Bulk Memos used as training data. The court independently compared 2,830 Bulk Memo questions with the corresponding headnotes and judicial opinions and found strong circumstantial evidence of copying of 2,243 headnotes, noting specifically that the language of the Bulk Memos closely tracked the headnotes rather than the language of the case opinions.

The court rejected Ross’ defenses of innocent infringement, copyright misuse, merger, and scènes à faire. The court considered the four factors outlined in 17 U.S.C. § 107(1)–(4) in assessing Ross’ fair use defense and ultimately rejected the defense, holding that factors 1 and 4 favored Thomson Reuters:

  1. Purpose and Character of the Use: The court found for Thomson Reuters. Ross’ use was commercial, and not transformative. Ross used the headnotes “as AI data to create a legal research tool to compete with Westlaw.” Ross’ AI “is not generative AI (AI that writes new content itself).” “Rather, when a user enters a legal question, Ross spits back relevant judicial opinions that have already been written.” It did not make a difference that Ross’ alleged copying occurred at an intermediate step where Ross “turned the headnotes into numerical data about the relationships among legal words to feed into its AI.” While courts have found that such intermediate copying could be fair use when copying computer code, that fair use allowance did not apply in this instance because “there is no computer code whose underlying ideas can be reached only by copying their expression.” Rather, Ross “took the headnotes to make it easier to develop a competing legal research tool.”
  2. Nature of the Copyrighted Works: The court found for Ross on this factor as the headnotes and Key Number System reflected limited creativity.
  3. Amount and Substantiality of the Portion Used: The court found for Ross on this factor because “Ross did not make West headnotes available to the public.”
  4. Effect on the Market/Value of Copyrighted Works: The court found for Thomson Reuters on this factor because Ross “meant to compete with Westlaw by developing a market substitute” for Thomson Reuters’ legal research platform, and because its use also affected the derivative market for data to train legal AIs.

Key Takeaways

AI case law is still in its infancy, and multiple cases are winding their way through courts in various jurisdictions. Further, it is likely that appellate courts, and the U.S. Supreme Court, will ultimately weigh in on the boundaries of copyright law as applied to training AI models. However, this lower court decision has important implications for companies using data to train their AI models.

  • Fair Use Defenses: Case law on fair use defenses in the use of computer code is fairly well developed, but as the court in Thomson Reuters pointed out, the data used in training AI is not computer code and its use fulfills a different purpose. Thus, to the extent that AI companies are relying on the historical case law on fair use of computer code, they should evaluate the applicability of such defenses where copyrighted material is being used to train their AI models in a paradigm that is distinct from the use of computer code.
  • Creativity of Underlying Copyrighted Materials: The court found that the standard for “originality” was extremely low. And while factor 2 was in favor of Ross, even the finding of de minimis originality was sufficient to support Thomson Reuters’ claim. AI developers may find this factor to weigh against them when the creativity of the content is less in dispute (e.g., where the asserted works are photographs, novels, moving images, etc.).
  • Source of the Training Data: Ross sought to avoid incorporating the copyrightable aspects of Westlaw’s headnotes and Key Number System by hiring LegalEase to prepare Bulk Memos, where lawyers were expressly instructed not to copy the language of the headnotes. Nonetheless, the court found substantial evidence of copying based on an examination of the actual contents. Because the underlying judicial opinions were not copyrightable, the question remains as to what level of difference between the headnotes and Bulk Memo language could have precluded a finding of infringement, where both summarized the same judicial opinion.
  • Liability for Infringement: The decision is silent on any allocation of liability between LegalEase and Ross for copyright infringement. However, both data vendors and AI developers should review their agreements to evaluate allocation of liability for copyright infringement, the level of control that the AI developer should exercise over the development of the training data set, and the representations and warranties that should be provided by data vendors and AI developers respectively.
  • Applicability to Generative AI: The court took pains to note that Ross’ AI tool did not involve generative AI, and the question of the use of copyrighted materials to train generative AI is pending in other courts, including in The New York Times Co. v. Microsoft Corp. The U.S. District Court for the Southern District of New York recently heard the defendants’ motion to dismiss, and its decision is likely to shed additional light on the use of copyrighted material in training AI models.


