Computer Coders Appeal $9 Billion Copyright Lawsuit Against OpenAI and Microsoft

In a pivotal moment for the generative AI industry, a group of computer programmers has urged the U.S. Court of Appeals for the Ninth Circuit to revive their class-action lawsuit against Microsoft, GitHub, and OpenAI. The appeal, argued on February 11, 2026, seeks to reinstate claims under the Digital Millennium Copyright Act (DMCA) that could expose the tech giants to estimated statutory damages exceeding $9 billion.

The case, Doe v. GitHub, Inc., centers on GitHub Copilot, an AI-powered coding assistant trained on billions of lines of public code. At the heart of the dispute is whether AI companies violate copyright law when they strip "Copyright Management Information" (CMI)—such as author names and license headers—during the training process, even if the AI's output is not an identical copy of the original work.

The $9 Billion Stakes: Reviving the DMCA Claims

The plaintiffs, a group of anonymous software developers, argue that the lower court erred in dismissing their DMCA Section 1202 claims. Section 1202 prohibits the intentional removal or alteration of CMI with the intent to conceal infringement.

In mid-2024, U.S. District Judge Jon S. Tigar dismissed these specific claims, establishing a controversial "identicality" requirement. Judge Tigar ruled that for a Section 1202 violation to occur, the plaintiffs must show that the AI generated an exact copy of their code with the CMI missing. Because AI models like Copilot typically synthesize new code rather than regurgitating exact blocks, the District Court found no violation.

On appeal, the plaintiffs contended that this interpretation effectively nullifies the DMCA in the age of artificial intelligence. Their legal team argued before the Ninth Circuit that the statute was designed to protect the integrity of copyright attribution, regardless of whether the subsequent distribution is a verbatim copy or a derivative work.

If the Ninth Circuit reverses the lower court's decision, the financial implications are staggering. The DMCA allows statutory damages of $2,500 to $25,000 per violation. With Copilot serving millions of users and generating countless lines of code daily, the plaintiffs estimate that potential liability could exceed $9 billion, a figure that would fundamentally alter the economics of AI development.
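The scale of those statutory bounds can be made concrete with some back-of-the-envelope arithmetic (illustrative only; the actual number of claimed violations and the class size are contested in the litigation):

```python
# How many Section 1202 violations would it take to reach the plaintiffs'
# $9 billion estimate at each end of the statutory range?
# Illustrative arithmetic only, not a figure from the court record.

MIN_DAMAGES = 2_500    # statutory minimum per violation
MAX_DAMAGES = 25_000   # statutory maximum per violation
TARGET = 9_000_000_000

violations_at_min = TARGET // MIN_DAMAGES   # 3,600,000 violations
violations_at_max = TARGET // MAX_DAMAGES   # 360,000 violations

print(f"{violations_at_min:,} violations at the ${MIN_DAMAGES:,} minimum")
print(f"{violations_at_max:,} violations at the ${MAX_DAMAGES:,} maximum")
```

Even at the statutory maximum, the estimate implies hundreds of thousands of distinct violations, which is plausible at Copilot's scale and explains why the damages theory alarms the defendants.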

The "Identicality" Battleground

The oral arguments highlighted a sharp divide over how copyright law should apply to machine learning. The defendants (GitHub, Microsoft, and OpenAI) maintain that the lower court's ruling is consistent with the purpose of the DMCA. They argue that without an identicality requirement, any output that arguably "resembles" training data but lacks attribution could trigger liability, chilling innovation and subjecting AI tools to limitless lawsuits.

The table below outlines the core legal arguments presented by both sides regarding the interpretation of DMCA Section 1202.

Legal Arguments on DMCA Section 1202
| Argument Aspect | Plaintiffs' Position (Coders) | Defendants' Position (Microsoft/OpenAI) |
|---|---|---|
| Statutory Interpretation | Section 1202 protects the integrity of CMI on the original work. Removing it during "ingestion" violates the law regardless of the output. | Liability only attaches if CMI is removed from an identical copy of the work that is then distributed. |
| The "Identicality" Test | The District Court invented an "identicality" requirement that does not exist in the statute's text. | Requiring identicality prevents overreach; otherwise, fair use and transformative works would be stifled. |
| Harm Definition | Harm occurs when attribution is stripped, severing the link between the creator and their work and facilitating future infringement. | No harm is proven unless the specific plaintiff's code is reproduced exactly without their CMI. |
| Industry Impact | Allowing CMI removal incentivizes "laundering" open-source code to bypass license terms (e.g., GPL, MIT). | Imposing strict CMI liability on AI training would make generative AI impossible to develop legally. |

Understanding CMI in the Age of AI

To understand the gravity of this appeal, one must look at how open-source software functions. Open-source licenses, such as the MIT License or the GNU General Public License (GPL), allow for the free use of code on the condition that the original author is credited and the license terms are preserved. This attribution data—the CMI—is crucial for the ecosystem's compliance and trust.
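In practice, the CMI at issue is the license header at the top of a source file. A minimal sketch of what that metadata looks like, and how its presence can be checked, appears below (the author, project, and `has_cmi` helper are hypothetical, not drawn from the litigation record):

```python
import re

# A typical MIT-licensed source file opens with CMI: a copyright notice
# naming the author, plus a machine-readable license identifier.
# Author and code are hypothetical examples.
MIT_SOURCE = """\
# Copyright (c) 2021 Jane Developer
# SPDX-License-Identifier: MIT
def greet():
    return "hello"
"""

def has_cmi(source: str) -> bool:
    """Return True if the source retains both pieces of typical CMI."""
    has_copyright = re.search(r"Copyright \(c\) \d{4}", source) is not None
    has_license = "SPDX-License-Identifier" in source
    return has_copyright and has_license

print(has_cmi(MIT_SOURCE))                           # True
print(has_cmi("def greet():\n    return 'hello'\n")) # False
```

The second call models the plaintiffs' complaint in miniature: the functional code survives, but the attribution that the license conditions its use on does not.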

When OpenAI's Codex model (which powers Copilot) ingests this code, it tokenizes the text, effectively breaking it down into statistical relationships. In this process, the specific license headers and author comments are often treated as just another pattern to be learned or ignored, rather than legally binding metadata to be preserved.
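A toy example illustrates the point. Real models use subword (BPE) tokenizers rather than whitespace splitting, but the principle is the same: once tokenized, a license header is just another run of tokens with no special legal status.

```python
# Illustrative only: naive whitespace tokenization of a source file.
# Codex's actual BPE tokenizer differs in detail, but likewise treats
# the header as ordinary text rather than protected metadata.

source = (
    "# Copyright (c) 2021 Jane Developer\n"
    "# SPDX-License-Identifier: MIT\n"
    "def add(a, b):\n"
    "    return a + b\n"
)

tokens = source.split()

# The attribution line is now indistinguishable from the code around it.
print("Copyright" in tokens)   # True, but only as one token among many
print(tokens[:5])
```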

The plaintiffs argue that this process creates a tool that allows users to unwittingly infringe on copyrights by using code without the required attribution. They assert that Microsoft and OpenAI are not merely "reading" the code but actively stripping the mechanisms designed to protect it.

Implications for the Broader AI Industry

A ruling in favor of the plaintiffs by the Ninth Circuit would send shockwaves through the AI sector. It would likely force companies to:

  1. Retrain Models: AI developers might need to scrub their training datasets of any code or text where CMI cannot be perfectly preserved in the output.
  2. Implement Attribution Mechanisms: Future AI models might be required to "cite their sources," a technical challenge that is currently unsolved for large language models (LLMs).
  3. Face Retroactive Liability: Other generative AI models, including text generators like ChatGPT and image generators like Midjourney, could face similar lawsuits if they are found to have stripped CMI from training data.

Legal experts suggest that the Ninth Circuit's decision could set the standard for how all "ingestion" of copyrighted data is treated under U.S. law. While the defendants rely heavily on the "fair use" doctrine to justify their use of the content, the DMCA claims sidestep fair use by focusing on the removal of metadata, a separate statutory violation.

What Comes Next?

The Ninth Circuit panel is expected to issue its ruling later this year. Given the novelty of the legal questions (applying a 1998 statute to 2026 technology), the decision will likely be appealed to the Supreme Court regardless of the outcome.

For now, the developer community watches closely. The case represents more than just a financial dispute; it is a fundamental disagreement about the value of human authorship in an increasingly automated world. If the coders succeed, it could affirm that the rules of open source cannot be rewritten by algorithms. If they fail, it may cement the current industry practice where data is fuel, and attribution is optional.