How OpenAI Has Misappropriated My Copyright: ChatGPT’s Land Grab

A screenshot of the July 13, 2023 New York Times article "F.T.C. Opens Investigation Into ChatGPT Maker Over Technology’s Potential Harms," with the subhead "The agency sent OpenAI, which makes ChatGPT, a letter this week over consumer harms and the company’s security practices." Also shown is the top Reader Picks comment by Adrian Segar: "The content of at least one of my books on meeting design, copyright registered in 2010 with the United States Copyright Office, has been added to ChatGPT's database without my permission. It was probably scraped from one of the illegal pirate internet libraries of scanned books. Though I'm weakly flattered that ChatGPT has also incorporated every single post I've written on my meeting design blog (over 750 posts in the last 13 years—around half a million words), OpenAI's flagrant misappropriation of copyrighted works from pirate databases for their own financial gain is beyond the pale."

I am resigned to the fact that OpenAI’s Large Language Model ChatGPT has scraped every blog post I’ve written here (over 750 posts in the last 13 years—around half a million words) so it can parrot my thoughts about meeting design, facilitation, and other topics. But I felt surprised, dismayed, and angry to discover that this $10 billion company, OpenAI, has misappropriated my copyright by digesting my copyrighted book Conferences That Work: Creating Events That People Love without any notification, discussion, or thought of compensation.

ChatGPT can be a useful tool. But does its utility justify OpenAI blatantly misappropriating copyrighted materials for its own benefit?

I don’t think so.

ChatGPT, owned by OpenAI, has misappropriated my copyright

We have no idea just how many copyrighted works besides my book have been incorporated into ChatGPT. OpenAI has not released any information about the datasets it has used. However, attorneys Shawn Helms and Jason Krieser, who specialize in information technology law, write that “The vast majority of the text used to train ChatGPT was certainly subject to copyright protection.” Jenna Burrell, Director of Research for Data & Society, adds:

“The bigger concern is how ChatGPT concentrates wealth for its owners off of copyrighted work. It’s not clear if the current state of copyright law is up to the challenge of tools like it, which treat the internet as a free source of training data. Among other challenges, ChatGPT is fundamentally opaque. It is essentially impossible to track down whose copyrighted material is being drawn from in the prose it produces, suggesting every result may comprise multiple violations.”
—Jenna Burrell, ChatGPT and Copyright: The Ultimate Appropriation

I’m not alone in my concerns. Sarah Silverman and some best-selling novelists with deeper pockets than me have sued OpenAI for “ingesting their books”.

The FTC investigates OpenAI

Last week, the FTC opened an investigation into OpenAI over whether ChatGPT has harmed consumers through its collection of data and its publication of false information about individuals. Though it seems that the investigation focuses on harm to consumers rather than the wholesale misappropriation of copyrighted information, I’m glad that the U.S. government is at least aware of ChatGPT’s impact on society in general.

This brings us to my own personal stake in OpenAI’s land grab. You may be wondering how I know that ChatGPT has ingested a copy of my first book (and, for all I know, my other books as well). I’m not going to provide specific evidence here, though it’s along the lines of the AP News story linked above, and I’m confident that my evidence is persuasive. What I will provide, however, is already public, via a comment I made to the New York Times story about the FTC investigation into OpenAI [guest link].

I share my thoughts with the New York Times

In my comment, I shared how OpenAI misappropriated my copyright, provoking a number of comments and questions to which I responded.

Because the comment thread illuminates and expands on my thoughts, I have reproduced it in full below with my comments in red. I’ve also rearranged the comments so they are in thread order.

To see the thread on the New York Times website:

  • Open the above link;
  • Click on the comments button below the subhead; and
  • Click Reader Picks, which will bring my comment to the top.

AJS commented July 13
USA
The content of at least one of my books on meeting design, copyright registered in 2010 with the United States Copyright Office, has been added to ChatGPT’s database without my permission. It was probably scraped from one of the illegal pirate internet libraries of scanned books.

Though I’m weakly flattered that ChatGPT has also incorporated every single post I’ve written on my meeting design blog (over 750 posts in the last 13 years—around half a million words), OpenAI’s flagrant misappropriation of copyrighted works from pirate databases for their own financial gain is beyond the pale.

191 Recommend 16 REPLIES

Robert commented July 13
St Paul
@AJS That’s an interesting argument, but how is ChatGPT’s use of that information substantively different than what data aggregators, including behemoths like Google, have been doing for years?

ChatGPT is just a shell overlayed onto a data set. It processes searches and responses in a natural language format, but that’s more of a superficial than substantive difference.

Are you opposed to all services that have scraped, categorized, and made your writings available, or is there something different about ChatGPT that you’re opposed to?

10 Recommend

AJS commented July 13
USA
@Robert, unlike my blog posts, which are freely available to anyone with an internet connection, I have never made my copyrighted book available for free public reading on the internet. People have to pay to buy a legal copy.
Do you really think it’s perfectly OK for ChatGPT to illegally add a pirated scanned copy of my book to their database?

42 Recommend

SteveRR commented July 13
CA
@AJS

Copyright refers to “copying” – so the first question is “did ChatGPT copy your work?”
It is more than likely that it did copy your work.

Second, Is ChatGPT Output a Derivative Work?
Most would probably argue that it is not a derivative work

Lastly – the infamous fair use:
If ChatGPT copied your work and such copying was not for a commercial purpose and had no economic impact on the copyright owner then it is probably fair use.

Your lawyers may disagree and that is what courts are for.

3 Recommend

Austin commented July 13
Austin TX
@SteveRR Fair use is specifically for “purposes such as criticism, comment, news reporting, teaching, scholarship, or research”. ChatGPT does neither. However, if it only uses snippets of sentences it would be ok. If it uses entire sentences or more, it could be a violation of copyright. BTW, registration is not necessary under US copyright. Copyright is automatic until the author releases it or waives it.

4 Recommended

SteveRR commented July 13
CA
@Austin

Not even the vaguest of clues where you get your “snippets of sentences” precedent.

Maybe look at fair use on youtube by way of example.

1 Recommend

AJS commented July 13
USA
@SteveRR,

First, OpenAI is not creating LLMs that slurp up everything they can get their CPUs on for the good of mankind. Rather, they are hoping to make a bazillion bucks ASAP. So I think you can make a good case that their use of my copyrighted book is for “a commercial purpose”.

Second, if anyone can get their questions about meeting design answered by ChatGPT—which is coughing up a version of everything in my copyrighted books on the topic—why would anyone buy a copy of my books? Under those circumstances, I think you can conclude that OpenAI’s appropriation of the contents of my copyrighted book has an “economic impact” on me.

I am not a lawyer. And I am not going to spend the rest of my life suing the giant corporation that is OpenAI—I have better things to do. But it’s pretty clear that OpenAI’s plundering of copyrighted works for their own gain “because they can” is reprehensible.

1 Recommend

Jacob commented July 13
Henderson
@AJS how do you know it was added to the system, from one of those libraries? Because if your book was widely published, so much so, that it ended up in what you call an online pirate library, is it just as likely that they used book summary sites and online posts describing the contents of your book and not the pirate library, you suspect they used?

1 Recommend

AJS commented July 13
USA
@Jacob, good question. I tested ChatGPT by asking it to summarize the most boring chapter in the book—one which has never been reviewed or mentioned. Search engines do not find any reference to the chapter; it has not been mentioned or extracted in any online review or post.

ChatGPT gave such an accurate summary of the chapter, it’s clear that the platform database includes it in its entirety.

I’ll probably never know how OpenAI got its hands on my book’s contents unless someone with deep pockets sues OpenAI and uses discovery to find out what is included in ChatGPT’s database and where they scraped it from.

5 Recommend

Jlaw commented July 13
California
@AJS on the one hand I see your point, on the other hand I can’t help wonder who really cares about a self published book but the author? I mean, no disrespect, but unless something is being said that isn’t true, I don’t see how an old book is worth depriving humanity from the latest and greatest in technology. This genie broke the bottle.


AJS commented July 13
USA
@Jlaw, I suspect the 3,000+ people who have purchased my self-published book cared. Are you seriously saying that a self-published book has no value except to its author?


John G commented July 13
Boston
@AJS i agree with you. It seems like a lot of people try to thread the needle for ChatGPT. However, if I upload something copyrighted to YouTube, I get a DMCA take down. That’s because YouTube and I would be making money off of the copyrighted content. The fact that the copyrighted content is obscured the way it is in ChatGPT should make no difference. ChatGPT makes no effort to even reference or cite the source material.

It could even be argued that chatGPT is a derivative work when it provides snippets “in the style of” an author.

If I make a performance from a book, I have to acquire rights to do so. ChatGPT is a performance assembled from “samples” of other peoples work.
chatGPT is blatant intellectual property theft and should be shuttered with cease and desist orders until this is resolved. There are plenty of LLM efforts that have a much cleaner pedigree than chatGPT so we would not lose much in terms of technological advancement.

2 Recommended

Observer commented July 13
NYC
@AJS This is a fascinating case, but you are blurring lines between three concepts: (1) stealing one copy of your book, (2) copyright, and (3) attribution.

On piracy: OpenAI clearly owes you the $25 (or whatever it costs) for access to your book. But that doesn’t really seem to be what is bothering you.

On copyright: OpenAI could be violating your copyright whether or not they bought your book. If they bought it legally and then reprinted exact passages, that would be a copyright violation. But the way OpenAI answers questions is arguably no different than a person who has learned the material. If I buy one of your books and answer questions someone asks me about it, that doesn’t necessarily make me a copyright violator.

It is a brand new technology that poses problems that aren’t addressed by copyright law. And, personally, I sincerely hope they are *not* found to be violating copyright law because the potential value of their service is so great. Transformational, really, in areas like medicine.

On credit: OpenAI should arguably still credit you as the source for their information. And I am certain they are working on this.

But so far, it seems like you are out $25. A bit piratical, but not a flagrant misappropriation.

2 Recommend

AJS commented July 13
USA
@Observer, but OpenAI _didn’t_ buy a copy of my book and then incorporate it into their database. And they have no intention of doing so.

Your argument is equivalent to saying someone can steal thousands of books from a bookstore, and if they get caught they can just pay for the books and everything is fine. I’m not sure our society would work so well if that was how copyright worked.

1 Recommend

John G commented July 13
Boston
@Observer if chatGPT is like a person, then you could say it is answering questions like a human. If it is like a program, then it is answering from the raw data.

It is most decidedly not like a person.

The “person” here is openai the corporation, which has used a vast array of copyrighted work to create a commercial product which makes money off of that copyrighted work. This would be no different than a company of hundreds of employees buying one copy of a book, copying it to all employees to enable them to answer questions, which violates the author’s rights.


JN commented July 13
NY
@AJS
For the sake of argument, are you ok if OpenAI actually paid for a copy of your book before using it as training data for ChatGPT in the pursuit of knowledge?


AJS commented July 14
USA
@JN,
As pointed out in earlier comments, OpenAI purchasing one copy of my book…
1) …didn’t and isn’t going to happen, and
2) …doesn’t give OpenAI the right to use it in ways that violate my copyright (see the argument about fair use).
Just as movie studios don’t get the right to make a movie of a book if they buy a copy—they typically pay a few percent of production costs to the copyright owner.
Just as libraries don’t have the right to buy and scan one physical book and lend it to as many patrons as they like. Libraries also negotiate payments that are far more than the retail cost of an ebook for the right to lend it to multiple patrons.
OpenAI has ignored these and other existing compensation models for copyright holders and simply taken everything they wanted for their database without discussion or a shred of conscience.


What should OpenAI do?

OpenAI has misappropriated my copyright. I’m not happy about this, and I’m pessimistic that this huge tech-bro-driven corporation will be brought to heel for its immoral behavior. Some authors and artists have responded by deciding to remove their content from the internet. I think this is the wrong approach. I want large corporations like OpenAI to stop misappropriating copyrighted work. OpenAI has several ethical options. The company could:

  • Stop including copyrighted work in their database; or
  • Ask creators for permission to include their content; or
  • Negotiate an agreement to use copyrighted work.

Any of these options would be a positive step, showing respect for the creators of copyrighted material, rather than misappropriating their work.
