
Center for Investigative Reporting vs. OpenAI: The Publisher Lawsuits Keep Coming

This week the Center for Investigative Reporting joins the growing list of publishers taking OpenAI to court. Here's what the lawsuit alleges and why it matters for the future of AI training data.
Table of Contents #
- The Lawsuit Filed Today — CIR vs. OpenAI and Microsoft: the basics
- Who Is the Center for Investigative Reporting — The nonprofit newsroom behind Mother Jones and Reveal
- What CIR Is Claiming — Copyright infringement allegations and DMCA violations
- The Pattern of Publisher Lawsuits — NYT, Tribune, Denver Post: a comparison
- OpenAI's Position — Fair use defense and licensing deals
- The Investigative Journalism Angle — Why nonprofit news matters in this fight
- Legal Landscape Update — Where things stand in June 2024
- What This Means for AI — Broader implications for training data
The Lawsuit Filed Today #
The Center for Investigative Reporting sued OpenAI and Microsoft for copyright infringement, alleging the companies trained ChatGPT and Copilot on CIR's journalism without permission or compensation.
The Center for Investigative Reporting filed its complaint in the U.S. District Court for the Southern District of New York this week, naming both OpenAI, Inc. and Microsoft Corporation as defendants. The lawsuit represents the latest escalation in the ongoing conflict between news publishers and AI companies over the use of copyrighted journalism to train large language models.
According to the complaint, OpenAI and Microsoft "copied, used, abridged, and displayed CIR's valuable content without CIR's permission or authorization, and without any compensation to CIR." The lawsuit specifically targets how ChatGPT and Microsoft's Copilot were trained on content from Mother Jones and Reveal, CIR's flagship publications.
Monika Bauerlein, CEO of the Center for Investigative Reporting, did not mince words in her statement announcing the lawsuit: "OpenAI and Microsoft started vacuuming up our stories to make their product more powerful, but they never asked for permission or offered compensation, unlike other organizations that license our material. This free rider behavior is not only unfair, it is a violation of copyright."
The timing is notable. This filing comes six months after The New York Times launched its own lawsuit against the same defendants in the same jurisdiction, and just two months after eight Alden Global Capital newspapers joined the fray. The Southern District of New York is quickly becoming the central battleground for determining whether AI companies' scraping and training practices constitute fair use or mass copyright infringement.
| Lawsuit Filing Details | |
|---|---|
| Plaintiff | Center for Investigative Reporting, Inc. |
| Defendants | OpenAI, Inc.; Microsoft Corporation |
| Court | U.S. District Court, Southern District of New York |
| Case Number | 1:24-cv-04872 |
| Filed | This week (June 2024) |
| Legal Claims | Copyright infringement, DMCA violations |
| Publications | Mother Jones, Reveal |
| CEO Statement | "Free rider behavior is not only unfair, it is a violation of copyright" |
Who Is the Center for Investigative Reporting #
CIR is the oldest nonprofit investigative newsroom in the United States, producing Mother Jones magazine and the Reveal radio show and podcast.
Founded in 1977, the Center for Investigative Reporting has spent nearly five decades establishing itself as the premier nonprofit investigative journalism organization in America. Unlike commercial news outlets driven by advertising and shareholder returns, CIR operates on a mission-driven model: exposing corruption, holding power accountable, and producing journalism that serves the public interest regardless of commercial viability.
Mother Jones, CIR's flagship print and digital publication, reaches millions monthly with deep-dive investigations into politics, climate, criminal justice, and corporate malfeasance. Reveal, the organization's radio show and podcast, syndicates across hundreds of public radio stations nationwide and has won multiple Peabody Awards for its investigative work. Together, these platforms represent some of the most rigorous, fact-intensive journalism being produced in America today.
What makes CIR's content particularly valuable for AI training is precisely what makes it expensive to produce. Investigative journalism requires months or years of research, FOIA requests, source cultivation, fact-checking, and legal review. A single Mother Jones investigation might cost hundreds of thousands of dollars to produce. This is exactly the kind of high-quality, authoritative content that makes language models more capable, more factual, and more credible-sounding.
CIR has historically licensed its content to organizations that want to use it legitimately. Other companies have paid for access to their journalism. OpenAI and Microsoft, according to CIR, simply took it without asking.
| CIR at a Glance | |
|---|---|
| Founded | 1977 |
| Type | 501(c)(3) nonprofit newsroom |
| Flagship Publications | Mother Jones (print/digital), Reveal (radio/podcast) |
| Awards | Multiple Peabody Awards, Pulitzer finalists |
| Distribution | Millions monthly (digital), 600+ radio stations |
| Content Type | Investigative journalism, long-form reporting |
| Mission | Expose corruption, hold power accountable |
| Prior Licensing | Yes — other organizations license CIR content legitimately |
What CIR Is Claiming #
CIR alleges OpenAI and Microsoft copied, used, abridged, and displayed its content without authorization, violating copyright law and the Digital Millennium Copyright Act.
The complaint lays out a straightforward but devastating argument: OpenAI and Microsoft used CIR's copyrighted journalism to train their AI models, and those models now reproduce CIR's work, summarize it, and compete with it, all without permission, attribution, or compensation.
CIR's lawyers argue that the defendants "trained ChatGPT not to acknowledge or respect copyright. And they did this all without permission." The lawsuit documents instances where ChatGPT has reproduced CIR content verbatim or generated closely derivative summaries that maintain the original expressive elements without proper attribution.
The "free rider" argument is central to CIR's case. Monika Bauerlein's statement makes this explicit: "The work of journalists, at CIR and everywhere, is valuable, and OpenAI and Microsoft know it." The companies, CIR argues, built billion-dollar products on the backs of journalists' labor while contributing nothing back to sustain that journalism.
The lawsuit brings two major legal claims:
- Copyright Infringement: Direct and vicarious infringement for copying and using CIR's works in training data and model outputs
- DMCA Violations: Removing or altering copyright management information, a separate claim with its own statutory damages
CIR is seeking either actual damages plus defendants' profits, or statutory damages of at least $750 per infringed work and $2,500 per DMCA violation. Given the volume of CIR content likely ingested during training, these damages could reach significant figures.
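To see how quickly those per-work minimums add up, here is a minimal arithmetic sketch. The two dollar figures are the floors quoted above; the work and violation counts are purely hypothetical placeholders, since the complaint's actual counts are not given here, and the function name `statutory_floor` is invented for illustration.

```python
# Statutory minimums as quoted in the article (USD).
MIN_PER_WORK = 750               # per infringed work
MIN_PER_DMCA_VIOLATION = 2_500   # per DMCA violation


def statutory_floor(infringed_works: int, dmca_violations: int) -> int:
    """Return the minimum statutory damages (USD) for the given counts."""
    if infringed_works < 0 or dmca_violations < 0:
        raise ValueError("counts must be non-negative")
    return (infringed_works * MIN_PER_WORK
            + dmca_violations * MIN_PER_DMCA_VIOLATION)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only: assume one DMCA
    # violation per infringed work.
    for works in (1_000, 10_000, 100_000):
        total = statutory_floor(works, works)
        print(f"{works:>7,} works: at least ${total:,}")
```

Even at the statutory floor, the total scales linearly with the number of works, which is why the size of the ingested archive matters more than the per-work figure.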
| Core Allegations | Details |
|---|---|
| Copyright Violations | Copying, using, abridging CIR content without permission |
| DMCA Claims | Removal/alteration of copyright management information |
| Training Data Use | CIR journalism used to train ChatGPT and Copilot |
| Output Infringement | Models reproduce CIR content verbatim or closely derivative |
| Competition Harm | AI outputs substitute for original CIR content, undermining revenue |
| "Free Rider" Theory | Billion-dollar products built on uncompensated journalist labor |
| Statutory Damages | $750+ per infringed work, $2,500 per DMCA violation |
| Requested Remedy | Damages, injunction against continued use of CIR works |
The Pattern of Publisher Lawsuits #
CIR joins at least a dozen other publishers and news organizations already suing OpenAI over training data copyright violations.
The CIR lawsuit is not an isolated case. It represents a coordinated industry response to what publishers view as systematic intellectual property theft by AI companies. By June 2024, a clear pattern has emerged: major news organizations are done waiting for voluntary agreements and are taking OpenAI and Microsoft to court.
The New York Times fired the opening salvo in December 2023 with a lawsuit that has already cost the paper over $1 million in legal fees. The Times complaint was particularly damaging — it included over 100 pages of exhibits showing ChatGPT reproducing Times articles almost verbatim. The Times is seeking billions in damages and demanding the destruction of any GPT models trained on its content.
In February 2024, The Intercept filed one lawsuit, and Raw Story and AlterNet jointly filed another, both focusing specifically on ChatGPT's tendency to reproduce their journalism without proper attribution. These progressive outlets argued that the lack of attribution undermined their relationships with readers and partners.
April 2024 saw another major filing: eight newspapers owned by Alden Global Capital (including the New York Daily News, Chicago Tribune, Orlando Sentinel, and The Denver Post) sued as a group, arguing that their content was used in training data without compensation.
The pattern is undeniable. Publishers across the political spectrum and business model spectrum — from nonprofit investigative outfits to legacy daily newspapers — are concluding that OpenAI's scraping practices crossed legal lines.
| Publisher Lawsuit Tracker | Filed | Key Claims | Status |
|---|---|---|---|
| The New York Times | Dec 2023 | Billions in damages, model destruction demand | Active, discovery phase |
| The Intercept; Raw Story and AlterNet (two suits) | Feb 2024 | Attribution violations, reader relationship harm | Active |
| Alden Newspapers (8 papers, including The Denver Post) | Apr 2024 | Mass copyright infringement on millions of articles | Active |
| Center for Investigative Reporting | Jun 2024 | DMCA violations, free rider conduct | Just filed |
| Authors (Silverman et al.) | 2023 | Book content used without permission | Partially dismissed |
The geographic clustering is also notable. Most cases are filed in the Southern District of New York, home to the Times' headquarters and much of the publishing industry. This concentration may accelerate precedent-setting rulings that will shape the entire AI training data landscape.
OpenAI's Position #
OpenAI maintains that its use of publicly available data for training is protected under fair use doctrine, while simultaneously pursuing licensing deals with willing publishers.
OpenAI has not remained silent as the lawsuits pile up. The company's public position rests on two seemingly contradictory pillars: a strong fair use defense in court, and an aggressive licensing strategy in the boardroom.
On the legal front, OpenAI's defense centers on the argument that training AI models on publicly available internet data constitutes fair use under copyright law. The company has stated it "build[s] our AI models using publicly available data, in a manner grounded in fair use, and supportive of innovation." This position treats web scraping for training data as analogous to how humans learn from reading publicly available material.
In response to CIR's specific lawsuit, an OpenAI spokesperson offered a softer line, emphasizing partnership: "We are working collaboratively with the news industry and partnering with global news publishers to display their content in our products like ChatGPT, including summaries, quotes, and attribution, to drive traffic back to the original articles."
This statement highlights the tension in OpenAI's strategy. While fighting lawsuits from publishers who refuse their terms, OpenAI is simultaneously signing lucrative licensing agreements with others. As of June 2024, OpenAI has announced deals with:
- The Associated Press (July 2023)
- Axel Springer (December 2023)
- Financial Times (April 2024)
- Dotdash Meredith (May 2024)
- News Corp (May 2024)
- Vox Media and The Atlantic (May 2024)
- Time (June 2024)
These deals reportedly run into the tens of millions of dollars annually. The strategy appears to be: sign willing partners, fight holdouts in court, and hope that favorable precedent or exhausting litigation costs bring the remaining publishers to the table.
Microsoft, co-defendant in most publisher lawsuits, has been quieter publicly but is deeply implicated as both an investor in OpenAI and the provider of Azure infrastructure for model training.
| OpenAI's Two-Track Strategy | Approach | Outcome |
|---|---|---|
| Fair Use Defense | Training on public data is transformative use | Being tested in court |
| Licensing Deals | Pay publishers for content access and attribution | 7+ major deals signed |
| Partnerships | Integrate publisher content with attribution | Product feature differentiation |
| Litigation Response | Fight claims, seek dismissal, argue that plaintiffs' examples came from manipulated prompts | Cases proceeding to discovery |
The question is whether this dual strategy holds. If courts reject the fair use argument, OpenAI's licensing costs could balloon, potentially changing the unit economics of foundation model training for the entire industry.
The Investigative Journalism Angle #
This lawsuit matters particularly because investigative journalism represents some of the most expensive, labor-intensive, and valuable content AI companies have been training on.
Not all news is created equal, and the CIR lawsuit puts a spotlight on why that distinction matters for AI training ethics and copyright law. Investigative journalism is fundamentally different from commodity news aggregation — and AI companies have been benefiting from that difference without bearing any of the costs.
A typical breaking news story might take hours to report and write. An investigative feature from Mother Jones or Reveal often takes months or years. The process involves deep document review, public records requests, source development, travel, legal vetting, and fact-checking that can cost hundreds of thousands of dollars per story. This is the kind of content that makes AI models appear authoritative and well-informed.
The nonprofit dimension adds a moral layer to the legal arguments. CIR isn't a commercial entity trying to maximize shareholder returns. It's a public-interest organization operating on donations, grants, and the belief that accountability journalism serves democracy. When OpenAI and Microsoft use CIR's work to make their products more valuable without contributing back, they're not just harming a business — they're undermining a nonprofit mission.
Monika Bauerlein's statement drives this home: "The work of journalists, at CIR and everywhere, is valuable." The "everywhere" matters. If nonprofit investigative outlets cannot sustain themselves because AI companies can freely appropriate their output, the entire ecosystem of accountability journalism becomes threatened.
There's also a quality question. Investigative journalism represents some of the most fact-dense, rigorously vetted content on the internet. When ChatGPT sounds credible discussing complex policy issues, it's partly because it was trained on the exacting work of investigative reporters. The AI benefits from the credibility CIR has built over decades, then uses it to generate outputs that may compete with CIR's own reach and impact.
| Investigative vs. Commodity News | Investigative (CIR) | Commodity News |
|---|---|---|
| Production Time | Months to years | Hours to days |
| Cost per Story | $50K–$500K+ | Hundreds to thousands |
| Business Model | Nonprofit/donations | Advertising/subscriptions |
| Content Density | Deep, fact-intensive | Surface, rapidly produced |
| Training Value to AI | High — authoritative, factual | Variable |
| Harm from Appropriation | Threatens mission sustainability | Revenue impact |
| Attribution Complexity | Complex multi-source synthesis | Often single-source |
The CIR lawsuit isn't just about copyright. It's about whether the most resource-intensive form of journalism can survive in a world where AI companies can freely consume its output without contributing to its production.
Legal Landscape Update #
By June 2024, multiple parallel lawsuits are moving through federal courts, with early procedural rulings allowing cases to proceed toward discovery.
The legal landscape around AI training data is rapidly evolving. As CIR files its lawsuit this week, several parallel cases are already advancing through the system, giving us early signals about how courts might treat these novel copyright questions.
The New York Times case, now six months old, has moved past initial pleadings and into the discovery phase. This is significant — discovery means document requests, depositions, and the potential exposure of OpenAI's internal communications about training data sourcing. The Times has already spent over $1 million on this litigation, signaling serious commitment to seeing the case through trial if necessary.
The Alden newspaper group lawsuit, filed in April 2024, is still at the pleading stage, so no court has yet ruled on OpenAI's arguments for dismissal in that case. Whether the fair use questions it raises will be resolved at the threshold or after discovery remains open.
The Sarah Silverman-led author lawsuit was partially dismissed in February 2024, with the court finding that the plaintiffs hadn't adequately shown their works were actually used in training. However, other claims survived, and the ruling turned on specific evidentiary failures rather than broad legal principles.
Discovery in these cases will likely reveal what has so far remained secret: exactly what content was in OpenAI's training datasets. OpenAI has become increasingly opaque about training data sources as newer models have launched. Internal documents, emails between executives about content licensing, and technical specifications of training corpora could all become public through litigation.
The timeline for resolution is measured in years, not months. Copyright cases involving novel technology typically take 2–4 years to reach final judgment, with appeals adding additional time. For now, the industry operates in legal limbo: training continues, lawsuits multiply, and the ultimate boundaries of fair use for AI remain undefined.
| Case Status as of June 2024 | Filed | Current Stage | Key Development |
|---|---|---|---|
| NYT v. OpenAI/Microsoft | Dec 2023 | Discovery | $1M+ spent, exhibits showing verbatim reproduction |
| Alden Newspapers (8 papers, including The Denver Post) | Apr 2024 | Active | Early pleading stage; no dismissal ruling yet |
| The Intercept; Raw Story and AlterNet (two suits) | Feb 2024 | Active | Attribution-focused claims |
| CIR v. OpenAI/Microsoft | Jun 2024 | Just filed | Copyright and DMCA claims from a nonprofit newsroom |
| Authors (Silverman) | 2023 | Partially dismissed | Some claims survived dismissal |
What happens in the Southern District of New York over the next 18–24 months will likely determine the legal framework for AI training data across the entire United States.
What This Means for AI #
The outcome of these lawsuits could reshape how AI companies source training data, potentially forcing licensing for all copyrighted content or establishing clearer fair use boundaries.
The CIR lawsuit, taken together with the wave of publisher litigation already in motion, represents an existential question for the AI industry. The outcome will determine whether the current model of broad web scraping for training data remains viable, or whether the industry must shift to a licensing-based approach that fundamentally changes cost structures and competitive dynamics.
There are three broad scenarios that could emerge from these cases:
Scenario 1: Fair Use Prevails
If courts accept OpenAI's fair use argument, the status quo largely continues. AI companies can train on publicly available web content without licensing. This would be the most permissive outcome for AI development, preserving the ability of startups to compete without massive content licensing budgets. Publishers would need to rely on technical measures (robots.txt, paywalls) rather than legal ones.
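As a rough illustration of the robots.txt measure mentioned above, the sketch below uses Python's standard-library `urllib.robotparser` to check a sample rule set that blocks `GPTBot` (the user agent OpenAI documents for its crawler) while allowing everyone else. The rules and the `example.org` URL are hypothetical, not CIR's actual configuration, and robots.txt is advisory: it only works when a crawler chooses to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a publisher that blocks OpenAI's crawler
# but leaves the site open to all other user agents.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.org/investigations/some-story"  # placeholder URL
    for agent in ("GPTBot", "SomeOtherBot"):
        verdict = "allowed" if can_fetch(agent, url) else "blocked"
        print(f"{agent}: {verdict}")
```

A rule like this does nothing about content already collected, which is part of why publishers would see technical measures as a weaker substitute for legal ones.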
Scenario 2: Licensing Required
If courts reject fair use and require licensing for training data, the industry bifurcates. Well-funded incumbents (OpenAI, Google, Microsoft, Anthropic) pay substantial licensing fees to major publishers, entrenching their advantage. Startups and open-source projects face higher barriers to entry. Model capabilities might improve on factual accuracy (licensed content is higher quality) but at dramatically higher training costs.
Scenario 3: Hybrid Framework
Courts might establish a nuanced framework where some uses are fair use and others require licensing. Factors could include: whether the content was paywalled, whether the AI output competes with the original, whether attribution is provided, and the nature of the content (news vs. creative works). This creates legal complexity but potentially balances innovation incentives with creator rights.
For builders and AI users, the immediate implications are limited — cases will take years to resolve. But the long-term architecture of AI training is being determined in courtrooms right now. Companies building products on top of language models should consider: what happens if training data legality changes? Are there advantages to models trained on licensed content? How might attribution requirements evolve?
| Scenario | Probability | Impact on AI Industry | Impact on Publishers |
|---|---|---|---|
| Fair Use Prevails | Moderate | Status quo continues | Publishers lose leverage, rely on technical protections |
| Licensing Required | Moderate | Higher costs, incumbents advantaged | Revenue stream, but deals vary by publisher size |
| Hybrid Framework | Significant | Complex compliance, legal uncertainty | Case-by-case enforcement, continued litigation |
The CIR lawsuit adds weight to the publisher coalition. Every additional credible plaintiff increases the pressure on courts to take these claims seriously — and on OpenAI to settle.
Frequently Asked Questions #
Why is the Center for Investigative Reporting suing OpenAI? #
CIR alleges OpenAI and Microsoft copied, used, and displayed its copyrighted journalism without permission or compensation to train ChatGPT and Copilot. The lawsuit claims this "free rider behavior" violates copyright law and the Digital Millennium Copyright Act. CIR notes that other organizations license its content legitimately, but OpenAI and Microsoft simply took it without asking.
How is this different from the New York Times lawsuit? #
Both cases share the same defendants and core legal theories, including DMCA claims, but CIR's suit is brought on behalf of nonprofit investigative journalism specifically. The Times case is further along procedurally (in discovery) and seeks billions in damages plus model destruction. CIR's complaint emphasizes the unique value of investigative reporting and the nonprofit dimension of the harm.
What damages is CIR seeking? #
CIR is seeking either actual damages plus defendants' profits, or statutory damages of at least $750 per infringed work and $2,500 per DMCA violation. Given the volume of content likely used in training, these damages could reach substantial figures. The complaint does not specify an exact total amount sought.
What is OpenAI's response to the lawsuit? #
OpenAI maintains that training on publicly available data constitutes fair use, while emphasizing its "collaborative" approach with news publishers through licensing deals. A spokesperson stated they are "partnering with global news publishers to display their content in our products like ChatGPT, including summaries, quotes, and attribution, to drive traffic back to the original articles."
How many publishers have sued OpenAI so far? #
At least five major lawsuits representing over a dozen publishers are currently active against OpenAI and Microsoft. This includes The New York Times (December 2023), The Intercept and, in a separate suit, Raw Story and AlterNet (February 2024), eight Alden Global Capital newspapers including the New York Daily News, Chicago Tribune, and The Denver Post (April 2024), and now the Center for Investigative Reporting (June 2024).
What is the fair use argument in AI training? #
OpenAI argues that training on publicly available web data is "transformative" and therefore fair use under copyright law. The defense treats AI training as analogous to human learning from reading — a new purpose that doesn't substitute for the original work. Publishers counter that AI outputs often do substitute for original content and that massive-scale copying for commercial purposes exceeds fair use boundaries.
Could these lawsuits force OpenAI to pay for all training data? #
If courts reject the fair use defense, OpenAI could be required to license copyrighted content or remove it from training data. This would dramatically increase training costs and potentially advantage well-funded incumbents who can afford licensing deals over smaller competitors. The outcome depends on how courts interpret fair use in the context of generative AI training.
What happens if OpenAI loses these cases? #
A loss could force OpenAI to pay substantial damages, enter into licensing agreements with publishers, or potentially even destroy models trained on infringing content. The Times lawsuit explicitly requests destruction of GPT models incorporating its articles. A broad loss establishing that training requires licensing would reshape the entire AI industry's cost structure and competitive dynamics.
How does this affect ChatGPT users? #
Direct impact on users is minimal for now, but long-term outcomes could affect model capabilities, pricing, or attribution features. If OpenAI must pay licensing fees, subscription prices could rise. If models are retrained on licensed-only data, they might become more factually accurate but potentially less broad in coverage. Enhanced attribution features may be added to demonstrate publisher partnership value.
What is the Digital Millennium Copyright Act (DMCA) and how does it apply? #
The DMCA prohibits removing or altering copyright management information (CMI), such as author attribution and copyright notices. CIR's complaint alleges that OpenAI's training process stripped CMI from its works and that ChatGPT outputs reproduce content without proper attribution. Under 17 U.S.C. § 1203, statutory damages for these violations range from $2,500 to $25,000 per violation, independent of copyright infringement claims.
Are any publishers making licensing deals with OpenAI instead? #
Yes. OpenAI has signed content licensing deals with at least eight major publishers, including The Associated Press, Axel Springer, Financial Times, Dotdash Meredith, News Corp, Vox Media, The Atlantic, and Time. These deals reportedly pay tens of millions annually and often include attribution features in ChatGPT. This two-track strategy, litigating with holdouts while partnering with willing publishers, is central to OpenAI's approach.
What does this mean for investigative journalism specifically? #
The lawsuit highlights a threat to the most expensive, labor-intensive form of journalism. Investigative reporting costs hundreds of thousands of dollars per story and relies on nonprofit funding models. If AI companies can freely appropriate this work without contributing back, the sustainability of accountability journalism is threatened. CIR's nonprofit status adds moral weight to the legal claims.