Last updated: 25 May 2026
On 5 May 2026, a coalition of major academic and trade publishers, led by Elsevier, filed a landmark copyright complaint against Meta Platforms in the Southern District of New York, alleging that the tech giant systematically scraped and ingested millions of copyrighted works to train its large language models. The case, docketed as Elsevier Inc. v. Meta Platforms, Inc., No. 1:26-cv-03689 (S.D.N.Y.), sharpens a question that has been building across courtrooms and legislatures worldwide: where exactly does the line between lawful AI training on copyrighted works and outright theft fall?
For German businesses developing, deploying or procuring AI systems, the answer depends on a complex interplay between US fair use doctrine, the EU AI Act’s transparency obligations and the text and data mining (TDM) exceptions embedded in Germany’s own Urheberrechtsgesetz (UrhG). This article analyses the latest litigation, maps the diverging legal tests on both sides of the Atlantic and provides a practical compliance playbook for companies that need to act now.
The complaint filed on 5 May 2026 represents one of the largest coordinated actions by the publishing industry against an AI developer. According to the filing published by the Association of American Publishers, a group of publishers including Elsevier, Wiley and other prominent rights-holders alleges that Meta copied vast quantities of copyrighted scientific articles, textbooks and literary works, without licence or authorisation, to build and refine its family of generative AI models.
Press coverage from Nature and The Guardian has underscored the scale of the allegations: the complaint describes an industrial data-ingestion pipeline that, plaintiffs contend, treated copyrighted catalogues as freely available raw material. The docket record on CourtListener confirms the filing date and the Southern District of New York as the venue, placing the case squarely in one of the busiest federal courts for intellectual property disputes.
The significance for German and European businesses is immediate. Any organisation that licenses Meta’s models, fine-tunes them on proprietary corpora, or incorporates their outputs into commercial products may inherit downstream exposure if courts ultimately find that the underlying training constituted copyright infringement.
The publishers’ complaint advances several interlocking theories of liability. First, it asserts direct copyright infringement through the mass reproduction of protected works during the ingestion and pre-processing stages of model training. Second, it alleges that Meta’s conduct amounts to the creation of unlicensed derivative works, a particularly significant claim because it targets the model weights themselves, not just the training process. Third, the complaint contends that Meta’s actions are wilful, pointing to internal awareness of copyright restrictions and the deliberate circumvention of access controls. Finally, the publishers argue that Meta’s AI products compete with and displace the established licensing markets for scientific and educational content, causing direct market harm.
The legal defence that underpins virtually every AI-training dispute in the United States is the fair use doctrine codified in 17 U.S.C. § 107. Courts apply a four-factor balancing test that considers: (1) the purpose and character of the use, including whether it is transformative; (2) the nature of the copyrighted work; (3) the amount and substantiality of the portion used; and (4) the effect of the use on the potential market for the original work.
Fair use defences have been central to earlier AI-training cases brought by authors, visual artists and news organisations. Industry observers expect the Elsevier complaint to test fair use at a new scale, because the sheer volume of reproduced works and the commercial nature of Meta’s AI products may weaken the “transformative purpose” argument that prevailed in prior search-engine and data-analytics rulings. Courts have historically treated wholesale copying more sceptically than selective quotation, and the publishers are positioning their claims to exploit that distinction.
For multinational businesses operating from Germany, US litigation outcomes matter even when the infringing conduct occurs on American servers. Cross-border copyright enforcement mechanisms mean that injunctions issued by US courts can restrict global model distribution, and contractual indemnities in AI vendor agreements may not extend to acts found to be wilful infringement. The likely practical effect will be heightened due-diligence obligations for any entity in the AI supply chain, regardless of where it is incorporated.
Furthermore, the Elsevier case does not exist in isolation. It joins a growing roster of US training-data disputes, including actions brought by visual-content licensing companies, authors’ guilds and news publishers, each of which is probing a different facet of the fair use boundary. Taken together, these cases are building a body of judicial reasoning that will shape global norms for years to come.
The European approach to AI copyright sits on a different legal foundation from the US fair use model. Rather than an open-ended balancing test, the EU relies on a system of specific statutory exceptions, the most important of which, for AI training purposes, are the text and data mining provisions introduced by Directive (EU) 2019/790 on Copyright in the Digital Single Market (the DSM Directive).
Article 3 of the DSM Directive permits text and data mining of lawfully accessed works for the purposes of scientific research by research organisations and cultural heritage institutions. Article 4 creates a broader TDM exception available to any person or entity, but subject to a critical limitation: rights-holders may expressly reserve their rights and opt out of text and data mining by means of machine-readable notices. Where an opt-out has been communicated, the exception does not apply and a licence is required.
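As an illustration of what detecting a machine-readable reservation might look like in a crawler, the sketch below checks two signals for a single URL. It is a minimal example under stated assumptions, not a statement of what Article 4 requires: the article does not name any particular opt-out format, so the `tdm-reservation` HTTP header (taken from the TDM Reservation Protocol community specification) and a robots.txt disallow rule are assumed here purely as examples, and the crawler name `ExampleTrainingBot` is hypothetical. Whether a given signal amounts to a valid reservation is a legal question that code cannot answer.

```python
from urllib.robotparser import RobotFileParser


def detect_opt_out_signals(headers, robots_txt, url, user_agent="ExampleTrainingBot"):
    """Return the opt-out signals found for one URL (empty list = none detected).

    headers    -- HTTP response headers already fetched for the URL (dict)
    robots_txt -- raw text of the site's robots.txt ("" if the site has none)
    user_agent -- hypothetical crawler name; an assumption of this sketch
    """
    signals = []

    # HTTP header names are case-insensitive, so normalise before the lookup.
    normalised = {name.lower(): value.strip() for name, value in headers.items()}
    if normalised.get("tdm-reservation") == "1":
        signals.append("tdm-reservation header")

    # Treat a robots.txt rule that blocks this crawler as a second signal.
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch(user_agent, url):
        signals.append("robots.txt disallow")

    return signals
```

A pipeline built along these lines would skip the source, or route it to a licensing workflow, whenever the returned list is non-empty. The function takes already-fetched headers and robots.txt text rather than making network calls itself, which keeps it easy to test and to replay against archived responses.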
The EU AI Act, which entered into force on 1 August 2024 and applies in stages, with its general-purpose AI model provisions applicable since August 2025, adds a further layer. The Act does not replace copyright law, but it imposes specific transparency obligations on providers of general-purpose AI (GPAI) models. Providers must draw up and make publicly available a sufficiently detailed summary of the content used for training, and they must put in place a policy to comply with EU copyright law, including the TDM opt-out mechanism. These obligations are designed to give rights-holders the information they need to monitor and enforce their rights.
The interaction between EU AI Act copyright provisions and the DSM Directive’s TDM exceptions creates a dual compliance requirement: AI developers must not only ensure they qualify for a TDM exception (or hold a licence), but also demonstrate that they have robust systems to detect and honour opt-out notices. Failure on either front creates independent grounds for liability.
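The dual requirement can be pictured as a gate that a data source must pass before ingestion. The sketch below is a deliberately simplified illustration, not legal advice: the `SourceAssessment` fields and the `legal_basis` function are hypothetical names introduced here, and real assessments involve judgment that four booleans cannot capture.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceAssessment:
    """Hypothetical per-source compliance record (all fields are assumptions)."""
    has_licence: bool        # a licence covering training use is in place
    lawfully_accessed: bool  # the works were lawfully accessible
    opt_out_checked: bool    # a check for reservations was actually run
    opt_out_found: bool      # that check found a machine-readable reservation


def legal_basis(assessment: SourceAssessment) -> Optional[str]:
    """Return the basis relied on for ingestion, or None if there is none."""
    if assessment.has_licence:
        return "licence"
    # The TDM route needs lawful access AND a completed check that found
    # no reservation; an unchecked source is treated the same as a reserved one.
    if (
        assessment.lawfully_accessed
        and assessment.opt_out_checked
        and not assessment.opt_out_found
    ):
        return "tdm-exception"
    return None
```

The design choice worth noting is that a source whose opt-out status was never checked fails the gate, reflecting the point that the absence of a detection system is itself a compliance failure.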
Germany transposed the DSM Directive’s TDM provisions into its national law through amendments to the Urheberrechtsgesetz (UrhG). Sections 44b and 60d UrhG implement, respectively, the general TDM exception and the research-specific TDM exception. Section 44b UrhG permits reproductions of lawfully accessible works for the purposes of text and data mining, but, mirroring Article 4 of the DSM Directive, only where the rights-holder has not reserved their rights in a machine-readable format.
German courts have historically taken a strong view of authors’ moral and economic rights, and early indications suggest that judicial interpretation of the TDM opt-out mechanism will be rigorous. Industry observers expect German regulators and courts to require clear, contemporaneous documentation that an AI developer checked for opt-out notices before ingesting any corpus. For businesses operating under German copyright law and building or deploying AI, the practical message is unambiguous: reliance on TDM exceptions demands proactive, auditable record-keeping at every stage of the data pipeline.
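One way to make such record-keeping auditable is an append-only log in which every entry includes the hash of the entry before it, so that later alteration of any record becomes detectable. The sketch below shows the idea under stated assumptions: the record fields, the `make_record` and `verify_chain` helpers and the all-zero genesis value are illustrative choices made here, not a format prescribed by the UrhG or by any regulator.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # assumed placeholder hash that precedes the first record


def _digest(body):
    # Canonical JSON (sorted keys, fixed separators) gives a stable hash input.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_record(source_url, content, opt_out_signals, prev_hash, checked_at=None):
    """Build one provenance record linked to the previous record's hash."""
    body = {
        "source_url": source_url,
        "content_sha256": hashlib.sha256(content).hexdigest(),
        "opt_out_signals": sorted(opt_out_signals),
        "checked_at": checked_at or datetime.now(timezone.utc).isoformat(),
        "prev_hash": prev_hash,
    }
    return {**body, "record_hash": _digest(body)}


def verify_chain(records):
    """Return True only if every record is unaltered and correctly linked."""
    prev = GENESIS
    for record in records:
        body = {k: v for k, v in record.items() if k != "record_hash"}
        if record["prev_hash"] != prev or _digest(body) != record["record_hash"]:
            return False
        prev = record["record_hash"]
    return True
```

In use, each ingested document would get one record written at the moment the opt-out check runs, with `prev_hash` set to the `record_hash` of the preceding entry. A hash chain only proves internal consistency; evidencing when the log was written would additionally require something external, such as a trusted timestamp.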
The divergence between the US and EU frameworks means that conduct which may be defensible on one side of the Atlantic can create serious liability on the other. The following comparison table summarises the key differences that German businesses and their cross-border legal teams need to understand.
| Topic | US (Fair Use Approach) | EU / Germany (TDM & Copyright Approach) |
|---|---|---|
| Legal test for training on copyrighted works | Four-factor fair use analysis (purpose, nature, amount, market effect); courts assess whether the use is transformative. | Exception-based: TDM exceptions (Articles 3–4 DSM Directive; §§ 44b, 60d UrhG) allow reproductions for text and data mining subject to conditions; general copyright protection remains intact. |
| Common defences used by AI developers | Fair use; implied licence (rare); de minimis / technical reproduction defences. | Reliance on TDM exceptions; contractual licences; compliance with EU AI Act transparency obligations. |
| Opt-out mechanism | No statutory opt-out; rights-holders must litigate to enforce. | Machine-readable opt-out (Article 4 DSM Directive / § 44b(3) UrhG); once reserved, no TDM exception applies. |
| Remedies & likely outcomes | Injunctions, statutory damages up to US $150,000 per work (wilful), disgorgement, settlements, high uncertainty; case-by-case. | Injunctions, compensatory damages under national law (§§ 97–97a UrhG), criminal liability for commercial infringement; licensing and collective bargaining solutions more likely in practice. |
The table illustrates a fundamental structural difference: the US system places the burden on defendants to prove fair use after litigation has begun, while the EU system provides a defined safe harbour that companies can plan around in advance, but only if they actively comply with opt-out requirements and maintain proper documentation.
Understanding the theoretical legal tests is necessary, but what German businesses need most urgently is a clear assessment of practical exposure. The risks of training AI on copyrighted data without permission are substantial, multi-dimensional and not limited to direct model developers.
Consider a German ed-tech company that fine-tunes a large language model on a corpus of scientific journal articles obtained via an API. If the publisher has placed a machine-readable TDM opt-out on its platform, the company cannot rely on § 44b UrhG and has no statutory defence: the fine-tuning constitutes unlicensed reproduction. Equally, a design firm that uses a generative image model to produce marketing materials may discover that the underlying model was trained on copyrighted photographs. Even though the firm did not train the model, its commercial use of infringing outputs could expose it to claims under both US and German law.
Early indications suggest that cross-border copyright enforcement in cases involving AI-generated outputs will become an increasingly active area of litigation.
The most effective response to the current legal uncertainty is to build compliance into the data pipeline from the outset. The following seven-point checklist provides a practical framework for avoiding AI training copyright infringement and documenting lawful data use.
In addition, consider including the following sample clause in your AI vendor agreements:
A robust copyright risk audit for AI should be conducted at least annually, and immediately following any significant change to the training pipeline, the regulatory environment, or the case law landscape.
Compliance does not end once a model is trained. Ongoing monitoring is essential to identify potential infringement in model outputs and to respond swiftly when issues arise.
The filing of the Elsevier v. Meta complaint on 5 May 2026 should serve as an immediate catalyst for action. The following executive checklist summarises the priority steps for board-level decision-makers and in-house legal teams.
The line between lawful AI training and copyright theft is not drawn by technology; it is drawn by law, and the law is moving fast on both sides of the Atlantic. German businesses that act now to map their data provenance, honour TDM opt-outs, secure proper licences and build auditable compliance processes will be best positioned to navigate the uncertainty ahead. Those that delay risk finding themselves on the wrong side of the line.
This article was produced by Global Law Experts. For specialist advice on this topic, contact Markus Koerner at Bird & Bird, a member of the Global Law Experts network.