Is training AI on copyrighted works automatically theft?

Not automatically. Whether AI training constitutes copyright infringement depends on the legal framework that applies. In the US, defendants may invoke the fair use defence under 17 U.S.C. § 107. In the EU, training may be lawful if it falls within the text and data mining exceptions of the DSM Directive, provided the rights-holder has not opted out. The Elsevier v. Meta complaint filed on 5 May 2026 presents the publishers’ theory that large-scale, commercial training without a licence is infringement; courts have not yet ruled on this specific case.

What is the Elsevier v. Meta lawsuit about?

Filed on 5 May 2026 in the Southern District of New York (docket no. 1:26-cv-03689), the case alleges that Meta scraped and ingested millions of copyrighted scientific articles, textbooks and literary works to train its generative AI models, without authorisation or payment. The complaint asserts claims for direct copyright infringement and for the creation of unlicensed derivative works, and alleges that the infringement was wilful.

Does the EU AI Act change copyright law?

The EU AI Act does not replace or amend existing copyright legislation. However, it creates new transparency obligations for providers of general-purpose AI models, including a requirement to publish a summary of training content and to implement policies that comply with EU copyright law, including the TDM opt-out mechanism. These obligations create additional compliance touchpoints that interact directly with the DSM Directive’s copyright framework.

Can companies rely on Text and Data Mining exceptions to train AI models?

Companies can rely on the TDM exception in Article 4 of the DSM Directive (implemented as § 44b UrhG in Germany) only if two conditions are met: the works were lawfully accessed, and the rights-holder has not reserved their rights via a machine-readable opt-out notice. Where an opt-out exists, a licence is required. The narrower research exception in Article 3 (§ 60d UrhG) is limited to research organisations and cultural heritage institutions.

What steps should a company take immediately after these lawsuits?

Immediate priorities include: pausing training runs that use datasets of uncertain provenance; auditing all existing training data for licensing status and TDM opt-out compliance; obtaining or strengthening vendor warranties and indemnities; documenting TDM reliance with timestamps and evidence; and engaging specialist IP counsel for a formal risk assessment.

How do US and EU remedies differ for copyright claims related to AI?

In the US, rights-holders can seek injunctions and either their actual damages plus the infringer’s attributable profits or, as an alternative, statutory damages of up to US $150,000 per work for wilful infringement. In Germany, damages under § 97(2) UrhG are typically calculated by reference to a hypothetical licence fee, the rights-holder’s lost profits or the infringer’s profits. Injunctive relief is available in both jurisdictions, but EU disputes are more likely to be resolved through collective licensing solutions. Cross-border enforcement adds complexity, particularly where models are trained in one jurisdiction and deployed in another.

Will redaction or anonymisation of training data avoid liability?

Not necessarily. Copyright infringement typically occurs at the point of reproduction, when the work is copied into the training pipeline, regardless of whether the final model output can reproduce the original work verbatim. Redacting identifying information from training data does not retroactively cure an unlicensed reproduction. Moreover, if a model can generate outputs that are substantially similar to copyrighted works, additional infringement claims may arise at the output stage.

Where can I find primary documents for the Elsevier v. Meta case?

The full complaint is available as a PDF from the Association of American Publishers at publishers.org. The docket record, including filing dates and procedural updates, can be accessed via CourtListener (record ID 73294740) under Elsevier Inc. v. Meta Platforms, Inc., No. 1:26-cv-03689 (S.D.N.Y.).

AI and Copyright: The Line Between Training and Theft

Last updated: 25 May 2026

On 5 May 2026, a coalition of major academic and trade publishers, led by Elsevier, filed a landmark copyright complaint against Meta Platforms in the Southern District of New York, alleging that the tech giant systematically scraped and ingested millions of copyrighted works to train its large language models. The case, docketed as Elsevier Inc. v. Meta Platforms, Inc., No. 1:26-cv-03689 (S.D.N.Y.), sharpens a question that has been building across courtrooms and legislatures worldwide: where exactly does the line fall between lawful AI training on copyrighted works and outright theft?

For German businesses developing, deploying or procuring AI systems, the answer depends on a complex interplay between US fair use doctrine, the EU AI Act’s transparency obligations and the text and data mining (TDM) exceptions embedded in Germany’s own Urheberrechtsgesetz (UrhG). This article analyses the latest litigation, maps the diverging legal tests on both sides of the Atlantic and provides a practical compliance playbook for companies that need to act now.

Latest Litigation Snapshot: Elsevier & Others v. Meta

The complaint filed on 5 May 2026 represents one of the largest coordinated actions by the publishing industry against an AI developer. According to the filing published by the Association of American Publishers, a group of publishers including Elsevier, Wiley and other prominent rights-holders allege that Meta copied vast quantities of copyrighted scientific articles, textbooks and literary works, without licence or authorisation, to build and refine its family of generative AI models.

Press coverage from Nature and The Guardian has underscored the scale of the allegations: the complaint describes an industrial data-ingestion pipeline that, plaintiffs contend, treated copyrighted catalogues as freely available raw material. The docket record on CourtListener confirms the filing date and the Southern District of New York as the venue, placing the case squarely in one of the busiest federal courts for intellectual property disputes.

The significance for German and European businesses is immediate. Any organisation that licenses Meta’s models, fine-tunes them on proprietary corpora, or incorporates their outputs into commercial products may inherit downstream exposure if courts ultimately find that the underlying training infringed copyright.

What the Complaint Alleges: Key Legal Theories

The publishers’ complaint advances several interlocking theories of liability. First, it asserts direct copyright infringement through the mass reproduction of protected works during the ingestion and pre-processing stages of model training. Second, it alleges that Meta’s conduct amounts to the creation of unlicensed derivative works, a particularly significant claim because it targets the model weights themselves, not just the training process. Third, the complaint contends that Meta’s actions are wilful, pointing to internal awareness of copyright restrictions and the deliberate circumvention of access controls. Finally, the publishers argue that Meta’s AI products compete with and displace the established licensing markets for scientific and educational content, causing direct market harm.

US Position: Fair Use, Ongoing Cases and What Courts Are Testing

The legal defence that underpins virtually every AI-training dispute in the United States is the fair use doctrine codified in 17 U.S.C. § 107. Courts apply a four-factor balancing test that considers: (1) the purpose and character of the use, including whether it is transformative; (2) the nature of the copyrighted work; (3) the amount and substantiality of the portion used; and (4) the effect of the use on the potential market for the original work.

Fair use defences have been central to earlier AI-training cases brought by authors, visual artists and news organisations. Industry observers expect the Elsevier complaint to test fair use at a new scale: the sheer volume of reproduced works and the commercial nature of Meta’s AI products may weaken the “transformative purpose” argument that prevailed in earlier search-engine and data-analytics rulings. Courts have historically treated wholesale copying more sceptically than selective quotation, and the publishers are positioning their claims to exploit that distinction.

For multinational businesses operating from Germany, US litigation outcomes matter even where the allegedly infringing conduct occurs on American servers. Because models are distributed globally, an injunction issued by a US court can in practice restrict a model’s distribution worldwide, and contractual indemnities in AI vendor agreements may not extend to acts found to be wilful infringement. The likely practical effect will be heightened due-diligence obligations for any entity in the AI supply chain, regardless of where it is incorporated.

Furthermore, the Elsevier case does not exist in isolation. It joins a growing roster of US training-data disputes, including actions brought by visual-content licensing companies, authors’ guilds and news publishers, each of which is probing a different facet of the fair use boundary. Taken together, these cases are building a body of judicial reasoning that will shape global norms for years to come.

EU Framework: EU AI Act, Copyright Law and TDM Exceptions

The European approach to AI copyright sits on a different legal foundation from the US fair use model. Rather than an open-ended balancing test, the EU relies on a system of specific statutory exceptions, the most important of which, for AI training purposes, are the text and data mining provisions introduced by Directive (EU) 2019/790 on Copyright in the Digital Single Market (the DSM Directive).

Article 3 of the DSM Directive permits text and data mining of lawfully accessed works for the purposes of scientific research by research organisations and cultural heritage institutions. Article 4 creates a broader TDM exception available to any person or entity, but subject to a critical limitation: rights-holders may expressly reserve their rights and opt out of text and data mining by means of machine-readable notices. Where an opt-out has been communicated, the exception does not apply and a licence is required.
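The two exceptions can be read as a short decision sequence. The Python sketch below encodes only the conditions described above: lawful access, who is mining and for what purpose, and whether a machine-readable reservation exists. The `Corpus` fields, the `Basis` values and the `tdm_basis` function are hypothetical names chosen for illustration, and a real assessment involves further questions that this code deliberately ignores. The research exception is checked first because the reservation mechanism described above belongs to Article 4.

```python
from dataclasses import dataclass
from enum import Enum


class Basis(Enum):
    """Possible outcomes of this simplified TDM analysis."""
    ARTICLE_3_RESEARCH = "Article 3 DSM Directive (§ 60d UrhG)"
    ARTICLE_4_GENERAL = "Article 4 DSM Directive (§ 44b UrhG)"
    LICENCE_REQUIRED = "licence required"


@dataclass(frozen=True)
class Corpus:
    # Hypothetical fields, chosen only to mirror the conditions in the text.
    lawfully_accessed: bool
    opt_out_reserved: bool                 # machine-readable reservation found
    miner_is_research_org: bool            # research org or cultural heritage institution
    purpose_is_scientific_research: bool


def tdm_basis(c: Corpus) -> Basis:
    # Both exceptions presuppose lawful access to the works.
    if not c.lawfully_accessed:
        return Basis.LICENCE_REQUIRED
    # Article 3: research organisations and cultural heritage institutions
    # mining for scientific research; the Article 4 reservation is not consulted.
    if c.miner_is_research_org and c.purpose_is_scientific_research:
        return Basis.ARTICLE_3_RESEARCH
    # Article 4: available to anyone, unless the rights-holder has opted out.
    if not c.opt_out_reserved:
        return Basis.ARTICLE_4_GENERAL
    return Basis.LICENCE_REQUIRED


# Example: a commercial developer mining an opted-out, lawfully accessed corpus.
print(tdm_basis(Corpus(True, True, False, False)).value)
```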

The EU AI Act, which entered into force on 1 August 2024 and applies in stages, adds a further layer; its provisions on general-purpose AI models became applicable on 2 August 2025. The Act does not replace copyright law, but it imposes specific transparency obligations on providers of general-purpose AI (GPAI) models. Providers must draw up and make publicly available a sufficiently detailed summary of the content used for training, and they must put in place a policy to comply with EU copyright law, including the TDM opt-out mechanism. These obligations are designed to give rights-holders the information they need to monitor and enforce their rights.

The interaction between the EU AI Act’s copyright provisions and the DSM Directive’s TDM exceptions creates a dual compliance requirement: AI developers must not only ensure they qualify for a TDM exception (or hold a licence), but also demonstrate that they have robust systems to detect and honour opt-out notices. Failure on either front creates independent exposure: copyright liability where no exception or licence covers the use, and regulatory enforcement where the AI Act’s obligations are not met.

Germany-Specific Points: Implementation and Likely Interpretation

Germany transposed the DSM Directive’s TDM provisions into its national law through amendments to the Urheberrechtsgesetz (UrhG). Sections 44b and 60d UrhG implement, respectively, the general TDM exception and the research-specific TDM exception. Section 44b UrhG permits reproductions of lawfully accessible works for the purposes of text and data mining, but, mirroring Article 4 of the DSM Directive, only where the rights-holder has not reserved their rights in a machine-readable format.

German courts have historically taken a strong view of authors’ moral and economic rights, and early indications suggest that they will interpret the TDM opt-out mechanism rigorously. Industry observers expect German regulators and courts to require clear, contemporaneous documentation that an AI developer checked for opt-out notices before ingesting any corpus. For businesses building or deploying AI under German copyright law, the practical message is unambiguous: reliance on TDM exceptions demands proactive, auditable record-keeping at every stage of the data pipeline.
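One way to make that documentation contemporaneous is to generate a provenance record at the moment a corpus is ingested. The Python sketch below assumes a simple append-only log of JSON lines; the `ProvenanceRecord` schema, its field names and the helper functions are hypothetical, not a legal or industry standard. The SHA-256 fingerprint is included only so that a record can later be matched to the exact bytes that entered the pipeline.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceRecord:
    # Hypothetical schema; field names are illustrative only.
    source: str              # where the corpus came from (URL, vendor, API)
    accessed_at: str         # ISO 8601 timestamp in UTC
    licence: str             # licence terms, or the exception relied on
    opt_out_checked: bool    # was a machine-readable reservation looked for?
    opt_out_found: bool      # result of that check
    content_sha256: str      # fingerprint of the exact bytes ingested


def make_record(source: str, licence: str, content: bytes,
                opt_out_checked: bool, opt_out_found: bool) -> ProvenanceRecord:
    """Build a record at ingestion time so the timestamp is contemporaneous."""
    return ProvenanceRecord(
        source=source,
        accessed_at=datetime.now(timezone.utc).isoformat(),
        licence=licence,
        opt_out_checked=opt_out_checked,
        opt_out_found=opt_out_found,
        content_sha256=hashlib.sha256(content).hexdigest(),
    )


def to_log_line(record: ProvenanceRecord) -> str:
    """Serialise the record as one JSON line for an append-only audit log."""
    return json.dumps(asdict(record), sort_keys=True, ensure_ascii=False)


record = make_record("https://publisher.example/corpus", "TDM exception (§ 44b UrhG)",
                     b"corpus bytes", opt_out_checked=True, opt_out_found=False)
print(to_log_line(record))
```

Writing one line per ingestion event, and never rewriting earlier lines, keeps the log usable as evidence of what was checked and when.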

Comparing US and EU Legal Tests: The AI Copyright Line Between Training and Theft

The divergence between the US and EU frameworks means that conduct which may be defensible on one side of the Atlantic can create serious liability on the other. The following comparison table summarises the key differences that German businesses and their cross-border legal teams need to understand.

Topic	US (Fair Use Approach)	EU / Germany (TDM & Copyright Approach)
Legal test for training on copyrighted works	Four-factor fair use analysis (purpose, nature, amount, market effect); courts assess whether the use is transformative.	Exception-based: TDM exceptions (Articles 3–4 DSM Directive; §§ 44b, 60d UrhG) allow reproductions for text and data mining subject to conditions; general copyright protection remains intact.
Common defences used by AI developers	Fair use; implied licence (rare); de minimis / technical reproduction defences.	Reliance on TDM exceptions; contractual licences; compliance with EU AI Act transparency obligations.
Opt-out mechanism	No statutory opt-out; rights-holders must litigate to enforce.	Machine-readable opt-out (Article 4 DSM Directive / § 44b(3) UrhG); once reserved, no TDM exception applies.
Remedies & likely outcomes	Injunctions; actual damages and the infringer’s profits or, alternatively, statutory damages of up to US $150,000 per work (wilful); settlements; high uncertainty, decided case by case.	Injunctions, damages under national law (§ 97 UrhG), criminal liability for commercial infringement; licensing and collective bargaining solutions more likely in practice.

The table illustrates a fundamental structural difference: the US system places the burden on defendants to prove fair use after litigation has begun, while the EU system provides a defined safe harbour that companies can plan around in advance, but only if they actively comply with opt-out requirements and maintain proper documentation.

Practical Exposure and Risk for Companies Building or Deploying AI

Understanding the theoretical legal tests is necessary, but what German businesses need most urgently is a clear assessment of practical exposure. The risks of training AI on copyrighted data without permission are substantial, multi-dimensional and not limited to direct model developers.

Injunctive relief. Courts in both the US and Germany can order the cessation of infringing activity. In extreme cases, this could mean a court ordering a model to be retrained or withdrawn from the market, a potentially catastrophic business disruption.
Financial damages. In the US, statutory damages for wilful infringement can reach US $150,000 per work. Under the German UrhG, compensatory damages are calculated by reference to a hypothetical licence fee, lost profits or the infringer’s profits, any of which can be substantial when multiplied across thousands of works.
Supply-chain liability. Companies that deploy or fine-tune models built on unlicensed data may face claims even if they did not perform the initial scraping. Contractual indemnities from AI vendors are only as strong as the vendor’s solvency and the enforceability of the clause in the relevant jurisdiction.
Reputational damage. High-profile litigation, such as the Elsevier v. Meta complaint, generates public attention that can erode customer trust and invite regulatory scrutiny, particularly in sectors like scientific publishing, education and media.
Licensing-market disruption. Rights-holders are increasingly arguing that AI training displaces established licensing markets. If courts agree, the availability of voluntary licences may narrow and the cost of retrospective licensing could rise sharply.
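Because US statutory damages are assessed per work, the theoretical ceiling grows linearly with the number of works in suit. The worked figure below assumes a hypothetical claim covering 10,000 works and applies the wilful-infringement maximum to every one of them; actual awards are fixed by the court and need not approach this ceiling.

```latex
D_{\max} = N \times \text{US}\$150{,}000
\qquad
N = 10{,}000 \;\Rightarrow\; D_{\max} = 10{,}000 \times \text{US}\$150{,}000 = \text{US}\$1.5\ \text{billion}
```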

Real-World Examples and Hypotheticals

Consider a German ed-tech company that fine-tunes a large language model on a corpus of scientific journal articles obtained via an API. If the publisher has placed a machine-readable TDM opt-out on its platform, the company cannot rely on § 44b UrhG and, absent a licence, has no statutory defence: the fine-tuning constitutes unlicensed reproduction. Alternatively, a design firm that uses a generative image model to produce marketing materials may discover that the underlying model was trained on copyrighted photographs. Even though the firm did not train the model, its commercial use of infringing outputs could expose it to claims under both US and German law.

Early indications suggest that cross-border copyright enforcement in cases involving AI-generated outputs will become an increasingly active area of litigation.

Compliance Playbook: Sourcing, Documenting and Licensing Training Data

The most effective response to the current legal uncertainty is to build compliance into the data pipeline from the outset. The following seven-point checklist provides a practical framework for avoiding copyright infringement in AI training and documenting lawful data use.

Map your data provenance. For every dataset used in training, fine-tuning or evaluation, maintain a written record of: the source, the date of access, the licence terms, and whether the rights-holder has posted a TDM opt-out notice.
Check for machine-readable opt-outs. Before ingesting any corpus, run automated checks for robots.txt directives, TDM reservation headers and any other machine-readable rights reservations. Log the results with timestamps.
Obtain vendor warranties. If you procure training data from third parties, require contractual warranties that the data has been lawfully obtained, that no TDM opt-outs have been overridden, and that the vendor will indemnify you against infringement claims.
Secure explicit licences where required. Where a rights-holder has opted out of TDM or where data falls outside any statutory exception, negotiate a training data licensing agreement. Document the scope of permitted use, including model deployment, fine-tuning and output generation.
Implement access controls. Restrict who within your organisation can add data to training pipelines. Require sign-off from legal or compliance before any new corpus is ingested.
Prepare an EU AI Act compliance summary. If you are a provider of a general-purpose AI model, draft the required public summary of training content in accordance with the EU AI Act’s transparency obligations.
Establish a retention and deletion policy. Define how long training data and intermediate copies are retained, and ensure you can demonstrate deletion of data that turns out to be unlicensed or subject to an opt-out.
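The opt-out check in the second step lends itself to automation. The Python sketch below takes an already-fetched robots.txt file and the HTTP response headers for a URL, and returns a timestamped record suitable for the log described in the checklist. Two assumptions should be flagged: the crawler name `ExampleTrainingBot` is hypothetical, and the header check follows the W3C Community Group TDM Reservation Protocol (TDMRep), which signals a reservation with `tdm-reservation: 1`. Rights-holders may use other machine-readable mechanisms that this sketch does not detect.

```python
from datetime import datetime, timezone
from urllib.robotparser import RobotFileParser

# Hypothetical crawler name; use the user agent your pipeline actually sends.
USER_AGENT = "ExampleTrainingBot"


def check_opt_out(url: str, robots_txt: str, headers: dict[str, str]) -> dict:
    """Return a timestamped record of machine-readable reservations for url.

    robots_txt and headers are passed in already fetched, so the exact inputs
    relied on can be stored next to the result and the check can be re-run.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    robots_blocked = not parser.can_fetch(USER_AGENT, url)

    # Assumption: TDMRep convention, where "tdm-reservation: 1" reserves TDM
    # rights. HTTP header names are case-insensitive, so normalise them.
    lowered = {k.lower(): v.strip() for k, v in headers.items()}
    tdm_reserved = lowered.get("tdm-reservation") == "1"

    return {
        "url": url,
        "checked_at": datetime.now(timezone.utc).isoformat(),
        "robots_blocked": robots_blocked,
        "tdm_reservation": tdm_reserved,
        "tdm_policy": lowered.get("tdm-policy"),   # optional policy URL, if any
        "opt_out": robots_blocked or tdm_reserved,  # flag for legal review
    }


robots = "User-agent: ExampleTrainingBot\nDisallow: /journals/\n"
print(check_opt_out("https://publisher.example/journals/a1", robots, {}))
```

Fetching is left to the caller on purpose: storing the raw robots.txt and headers alongside the result lets the organisation show later exactly what the check saw.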

In addition, consider including the following sample clause in your AI vendor agreements:

“The Supplier warrants that all training data used in the development of the Model has been obtained in compliance with applicable copyright laws, including Directive (EU) 2019/790 and the German Urheberrechtsgesetz (UrhG), and that no data subject to a machine-readable TDM opt-out has been used without an express licence from the rights-holder.”

A robust copyright risk audit for AI should be conducted at least annually, and immediately following any significant change to the training pipeline, the regulatory environment, or the case law landscape.

Monitoring, Detection and Response

Compliance does not end once a model is trained. Ongoing monitoring is essential to identify potential infringement in model outputs and to respond swiftly when issues arise.

Output monitoring tools. Deploy automated systems that scan model outputs for near-verbatim reproduction of known copyrighted works. Several commercial and open-source tools now offer similarity-matching against large reference databases.
Takedown and correction protocols. Establish an internal protocol for responding to infringement claims: designate a responsible officer, define escalation paths, and prepare template responses for rights-holder complaints.
Insurance. Investigate whether your existing professional indemnity or media liability insurance covers AI-related copyright claims. Where gaps exist, consider specialist AI liability coverage, a market that industry observers expect to expand rapidly in 2026 and beyond.
Periodic re-audit. As case law develops and new opt-out notices are published, revisit your training data records and refresh your compliance assessment.
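Similarity matching of the kind described above can be approximated with word n-gram overlap. The Python sketch below measures what share of a model output’s eight-word sequences also appear verbatim in a reference text, and flags works that meet a threshold. The function names, the window size of 8 and the threshold of 0.5 are hypothetical illustrative choices; a deployed system would likely need text normalisation, indexing over large reference collections and calibrated thresholds that this minimal containment score omits.

```python
import re


def word_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Lower-cased word n-grams; punctuation is ignored."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(output: str, reference: str, n: int = 8) -> float:
    """Share of the output's n-grams that also occur in the reference text.

    1.0 means every n-word window of the output appears verbatim in the reference;
    outputs shorter than n words score 0.0.
    """
    out = word_ngrams(output, n)
    if not out:
        return 0.0
    return len(out & word_ngrams(reference, n)) / len(out)


def flag_output(output: str, references: dict[str, str],
                n: int = 8, threshold: float = 0.5) -> list[str]:
    """Return ids of reference works whose overlap with the output meets the threshold."""
    return [work_id for work_id, text in references.items()
            if containment(output, text, n) >= threshold]


reference = "the quick brown fox jumps over the lazy dog near the river bank today"
print(flag_output("quick brown fox jumps over the lazy dog", {"work-1": reference}, n=4))
```

Containment rather than symmetric similarity is used here because a short output copied from a long work should still score highly.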

What to Do Now: Checklist and Next Steps for German Businesses

The filing of the Elsevier v. Meta complaint on 5 May 2026 should serve as an immediate catalyst for action. The following executive checklist summarises the priority steps for board-level decision-makers and in-house legal teams.

Pause high-risk training runs that rely on datasets of uncertain provenance until a formal copyright risk audit for AI has been completed.
Audit existing datasets against the seven-point compliance checklist above and remediate any gaps in documentation or licensing.
Review AI vendor contracts for indemnification scope, warranty language and termination rights in the event of upstream infringement findings.
Brief your board and senior leadership on the cross-border copyright enforcement risks arising from US litigation and the parallel EU AI Act obligations.
Engage specialist IP counsel to conduct a bespoke assessment of your organisation’s exposure under both German and US law.
Monitor docket developments in Elsevier Inc. v. Meta Platforms, Inc. (1:26-cv-03689, S.D.N.Y.) and related cases for procedural milestones that could affect your risk profile.

Conclusion

The line between lawful AI training and theft is drawn not by technology but by law, and the law is moving fast on both sides of the Atlantic. German businesses that act now to map their data provenance, honour TDM opt-outs, secure proper licences and build auditable compliance processes will be best positioned to navigate the uncertainty ahead. Those that delay risk finding themselves on the wrong side of the line.

Need Legal Advice?

This article was produced by Global Law Experts. For specialist advice on this topic, contact Markus Koerner at Bird & Bird, a member of the Global Law Experts network.
