News broke recently that an unnamed AI company was preparing to pay Reddit $60 million a year to be allowed to scrape the content off the popular website. 1 The company is rumored to be Google, and the deal, according to Bloomberg Law, is “…the first major public licensing deal between a US social media giant and an external AI company. But other agreements are expected to follow given the massive troves of data the platforms could provide to AI companies, and the critical importance of such diverse data for training their large language models, copyright and tech attorneys say.” 2
The first question that popped into my mind was “just how did Reddit do this?” They certainly don’t own the content posted by users, or my blog posts which have appeared there, as we shall discuss further on.
To own the copyright in the user posts, Reddit would have to have a written copyright assignment from each and every poster, for each and every post. This is a requirement of Section 204(a) of the Copyright Act, which provides:
“A transfer of copyright ownership, other than by operation of law, is not valid unless an instrument of conveyance, or a note or memorandum of the transfer, is in writing and signed by the owner of the rights conveyed or such owner’s duly authorized agent.”
As usual, the answer is buried in the Reddit Terms of Service, or TOS.
The Reddit TOS provides that the users “retain any ownership rights” over the content that they post (how generous of them). This is a foregone conclusion given the lack of a written copyright assignment.
BUT (here’s the kicker), you
“grant us a worldwide, royalty-free, perpetual, irrevocable, non-exclusive, transferable, and sublicensable license to use, copy, modify, adapt, prepare derivative works of, distribute, store, perform, and display Your Content and any name, username, voice, or likeness provided in connection with Your Content in all media formats and channels now known or later developed anywhere in the world.”
Since the license is “non-exclusive,” it is not a transfer of copyright ownership. So, unlike a copyright assignment, it does not have to be in writing and can indeed be oral.
And since the license is “royalty free,” Reddit does not have to share with you any part of the $60 million a year they’ll be raking in.
Don’t like it? Too bad. The license is also “irrevocable.”
But what about my blog posts that appear on Reddit?
I am not a user or member of Reddit. I have never posted any of my blog posts to Reddit. But other people do. Without my permission. One of the features of my blog platform is that it shows who links to my posts, be it Facebook, Twitter, Wikipedia, or, in this case, Reddit. Indeed, I knew I had “arrived” as an Internet blogger when somebody called me a “moron” on Reddit.
But I digress.
So, I have never agreed to the Reddit TOS, and therefore Reddit cannot use it as a basis for allowing Google to scrape my posts. Now, I am certain that there are plenty of other bloggers out there in the same boat as me. Indeed, Reddit has an entire subreddit called “TIL,” short for “today I learned,” which is full of other people’s blog posts.
So now that we have established that my work appears on Reddit without my consent and without my agreeing to the TOS, and that this content has value (indeed, $60 million a year), what might my rightful share of the license fee be?
I mean we are probably talking Spotify numbers here, but clearly I am owed something. And for every other AI company that strikes a deal with Reddit, I should be owed something as well.
And if the company is indeed Google, what exactly did they think they were buying for their $60 million a year? They have loads of very smart copyright lawyers working for them.
But in the end, the trenchant point is this:
We have now established that the data scraped off the internet to train AI has value. And the payments for that valuable data by all of the AI companies in business currently total $0.00.
I would say that we, the content creators, are being vastly underpaid.