Jake Browning

Copyright and Generative AI

The recent wave of cases concerning generative AI, such as Silverman v. OpenAI, has not gone well. From a legal perspective, this isn't surprising. The plaintiffs couldn't definitively show their books were part of OpenAI's training data in the first place, and they never bothered to show any direct copyright infringement. They certainly couldn't show they were harmed by the scraping and training. So it was a pretty weak case. The NY Times case, by contrast, does show a potential harm. Since ChatGPT can reproduce content behind a paywall, people could potentially read content for free that they would otherwise have had to pay the NY Times for. How much money is at stake is a real question, but it is a clear case of harm, which helps it from a legal perspective.

But the NY Times case also highlighted some of the broader social implications of generative AI: these programs might eat into the news-making business itself, and thus threaten the Times's journalism more broadly. The idea is that, if ChatGPT becomes good enough, the Times wouldn't be able to afford the journalists needed to report and investigate breaking news. This isn't a terribly compelling legal case; new businesses cut into old businesses all the time. And it is a premature worry, given how much current models hallucinate.

But it does raise the right kind of question: legality aside, how should we be evaluating generative AI on copyright matters? What are even the relevant metrics for thinking about this issue? These are important questions that haven't received sufficient thought, in part because people often divorce these discussions from the underlying morality of copyright law.

Moral and Legal Rights to Property

There is a conceptual tension inherent to copyrights. On one side, as Locke pointed out, people who put their time and effort into a project feel a right to it--that the improvements they make render it their property. Hegel went further, suggesting that investing time and effort means putting something of yourself into it--that the product is an expression of who you are. This latter idea feels especially strong in the case of intellectual property: the words express your thoughts, the statue embodies your vision. It really seems like a part of you, and it feels wrong for anyone else to claim it or benefit from it (see Marx on alienation).

But there is another dimension: the law need not recognize you as the property owner, regardless of how much effort you put into it. What legally counts as private property is a political issue, one determined by balancing trade-offs between encouraging creation and benefiting society. Think about a patent on life-saving medicine: is this a good policy? If you do protect patents and prevent other companies from making generic versions, you'll (supposedly) encourage companies to invest in research and development. But, on the other hand, that same policy means people will die because they can't afford it. Society needs to balance these.

The same thing applies to intellectual property and granting copyrights. We want to encourage people to paint, write, and make movies. And, if they're good, we want them to be able to make enough money off it to make more art. But we also want other people to access, use, and re-use the art. If someone makes a movie, we want people to meme it; if a music video, we want kids to be able to dance to the song on TikTok; if a song, we want Weird Al to parody it and for a garage band to cover it.

This is the moral idea behind "fair use": intellectual property should be generative, and that means people should be allowed to access it, engage with it, critique it, transform it, and even mock it. There is a balancing act at work between two moral interests: how do we ensure desert, such that artists get the rewards they deserve, as well as distributive justice, such that intellectual property is widely accessible in society and is not simply hidden away by the rich? Copyright law needs to protect both interests.

We don't have a good copyright law, however, because ours currently over-protects artists--or, at least, the people owning the art. It should morally offend us that Michael Jackson songs will still be under copyright protection in 2070 and every dime will be going to Sony. That is immoral; it is an injustice that more art isn't entering the public domain. Even the founding fathers would be offended; they only granted 14 years (and they mostly ignored copyrights and patents from other countries).

Worse, current copyright law doesn't particularly protect smaller artists; they mostly can't afford the cost of lawsuits and lack the resources for takedown requests. As such, current copyright law is mostly a method of defending big corporations capable of buying up old copyrights and mooching off the rents. It is a failure of social policy.

Generative AI and Copyright

Where does this leave generative AI? It makes it important for the court cases to be focused. The court isn't going to protect artists from people merely using their art. If someone decides to copy a library book and keep the copy on their shelf without lending it out, that's not a violation of copyright law. This is roughly akin to recording a pay-per-view event or copying a VHS tape from Blockbuster: distributing that recording is illegal, but watching it at home isn't.

So someone with a generative model at home who isn't distributing the contents isn't doing anything wrong from a copyright perspective. Nor is a company producing a generative AI that never reproduces copyrighted content, since that is a pretty clear "transformative" case of doing something new with the material. The problem is if someone puts a model, like Midjourney, online that can consistently recreate copyrighted content--especially if they charge money for the use of the model. In that case, there is a good (but not open-and-shut) case for arguing harm, since the artist isn't being compensated for their work.

This same issue matters a lot for writers in their fights against products like Google AI Search. If Microsoft Bing tries to summarize a NY Times recipe from the Cooking section and, in the process, exactly reproduces part of that content, then this is a clear case of infringement. Microsoft Bing and Google AI Search are designed to make money by keeping people on the search engine, rather than traveling to the original webpage (which is both behind a paywall and carries advertisements, allowing the NY Times to make money off the article in multiple ways). What Microsoft and Google are doing is not just copyright infringement but is designed to benefit from copyright infringement.

Does this mean we need to shut down generative AI? Well, not quite. Most generative content is original. I don't think it is very good. But, taken broadly, only a tiny fraction of a percent is strict plagiarism--and, on the whole, people don't seem to be using it to plagiarize. There is a danger of style theft, where an AI imitates the style of a Patreon artist or some fanfic writer. There isn't much legal recourse here, but from a societal perspective it would make sense to push organizations to prevent prompts requesting "a pic in the style of ____" for artists whose work is still under copyright. And some companies have pushed in that direction, recognizing how their models might harm artists.

But we should be careful not to overshoot. While we should protect small artists, we shouldn't overcompensate by making "protect artists" the mantra of copyright law. This is mostly corporate PR, as when critics of the estate tax trot out the "family farm" argument. This is garbage; the estate tax almost entirely affects millionaires and billionaires, not family farms. Similarly, copyright law barely protects small artists or novelists; if you're that small, most people aren't trying to steal your stuff. The focus on small artists is a bait-and-switch. "Strong protections for artists" isn't really about protecting them (nor will it, since they can't afford to sue). It's really just protecting Disney, Universal, and Sony.

We also need to be mindful that novel art forms are always contentious. Pop art often involved theft, and many artists rely on existing brands--fanfiction, anime art, cosplay, and so on. At the moment AI art isn't very impressive; people write a prompt and the AI does the work. But, eventually, there will be more direction and interaction: the artist may use a prompt to design a character, then upload it to a virtual world, position a stick figure through a couple of key movements, and then allow the AI to do the work of rendering it all.

For a lot of people long on ideas but short on the connections and funding to make them into something, the technology might offer them the ability to make something great but weird. Things like South Park and Clerks are a reminder of how a little money but a good script can make incredible, groundbreaking work that studios would probably not fund.

NY Times v OpenAI

Returning to the NY Times case, what does all this mean? This is a more interesting question. Copyright infringement should be stopped, but how should a judge classify infringement? Training probably won't qualify, because there is no material harm from merely recording or compressing data; it is the distribution that is illegal. It might strike many as immoral to train on copyrighted content, but most private property laws offend someone's morals, since there are competing moral concepts at stake.

The danger of current generative AI reproducing copyrighted material, however, is real and worth weighing. If the models get better at avoiding hallucination, they really could harm journalism. But it would also be a harm to cut off a transformative technology most people are using in novel ways, rather than merely to reproduce paywalled content. So the consequences of losing are pretty severe in both directions, and there are precedents in both directions--see Lee and Grimmelmann's great explainer on the issue.

As such, the most likely option is a settlement. Which sucks. Settlements will mostly allow rich firms to profit off of copyrighted material, while small users are largely left outside. They also allow big corporations to collaboratively agree on what the "new" copyright landscape looks like--as with YouTube's recent settlement with music distributors to prevent AI-made songs featuring the voices of certain artists. This allows corporations to circumvent the social interests of the public, effectively ensuring copyright is solely about rewarding copyright owners while ignoring public accessibility and the re-use of art.

The real goal should be legislation that balances different objectives: protect small artists while also unleashing everyone to create cool new stuff, like memes and new songs and art. This may also involve protections against creating "generic" versions of cool art, such as partial reproductions of something that already exists. There should also be protections against people simply reproducing content. Maybe licenses attached to specific datasets would be appropriate. Whatever it is, though, it should be decided publicly, rather than behind closed doors by corporate copyright holders. They are simply trying to maximize rents, not ensure wide availability of art and literature. That isn't in small artists' or society's interests.
