Jake Browning

Language Models and the "Inevitable" Flood of Misinformation

Updated: Dec 14, 2023

If there has been a constant trope since GPT-2, it is that large language models (LLMs) will soon flood the internet with misinformation. There are, of course, reasons to be concerned: Stack Overflow, for example, was inundated with false answers generated by ChatGPT. And we are beginning to see the poisoning of Google's search results, as increasingly mistaken content gets flagged. This is especially pernicious with images, since a model like DALL-E, asked to create the Mona Lisa, is designed to produce an image that would be captioned as "the Mona Lisa" by an AI image recognition system. There will likely be cases where something like a DALL-E Jasper Johns will be more Jasper-y than the original.

But the misinformation flood hasn't emerged. Might it soon? Will LLMs soon flood us with spam and bullshit misinformation? We should be skeptical. First, not all misinformation is created equal; simply because some generated nonsense shows up on some website's blog does not mean anyone will read it. For misinformation to be read, it needs distributors, and many distribution sources--such as Twitter--require enormous efforts on the part of a user to build a following. Plenty of writing on the internet--both good and bad--is never read.

Second, even if someone can distribute it, the success of a piece of misinformation depends not on the supply but on the demand. Someone needs to want that misinformation. Consider a blog post on the importance of blasting power ballads at 2am to get cicadas to emerge a year early in their cycle. It's a kind of funny piece of misinformation, one that would be annoying to neighbors, but there probably isn't much of an audience for it. Even fewer people would do anything with the information; few people are hoping to coax cicadas out, after all. Most would note the headline and then promptly forget it or treat it as parody.

How might LLMs fit in here? If they start flooding the world with misinformation no one wants, then they won't much matter. If they also start distributing the scammy nonsense through bots on X (as they are doing), then people will largely ignore them and they'll be blocked or banned. In order for them to be actively harmful, someone generally trusted and reliable would need to disseminate their content, and the content would need to be in high demand.

Both have proven problematic. A trusted and reliable distributor who puts out bullshit will, in the process, cease to be reliable--and, in turn, will probably lose whatever trust they have accrued once caught. This just happened to Sports Illustrated, where the CEO lost his job in the aftermath. Trusted sources will only remain trusted if they avoid LLM-generated content.

In the other direction, high-demand misinformation is already abundant in the world. But misinformation typically depends on readers, and most (mis)information is consumed by a small number of high-dosage readers who are caught up in it. Most people don't read multiple articles a day, but the ones who do tend to be obsessive junkies. And obsessives are surprisingly selective about what misinformation they believe; they won't believe just anything. If the misinformation machine screws up and combines high-salience words in the wrong way--suggesting, for example, that Hunter Biden coordinated with Donald Trump to help get Zelensky elected--you won't get a lot of believers. There is a lot of misinformation involving all of these figures, but you can't just mix-and-match it.

It is also worth noting that, as a result of these two considerations, there is already a wide variety of trusted sources that are "unreliable" in a way that appeals to their audience, such as Alex Jones. These figures have recognized that some people really crave misinformation of the right type, such as conspiracy theories about the government or (for you leftists out there) about "big pharma" and "big agriculture." This allows influencers to sell a lot of products to fight off the collapse of society, prevent COVID without vaccines, and protect our bodies from gluten. But while some of the conspiracies are nonsensical, the audience isn't; there are plenty of people who want unconventional solutions that appeal to their sensibilities, and where there is an audience there is someone making money off them.

None of this is to absolve LLMs. But it does mean they aren't a terribly efficient form of misinformation: it'll take hand-holding to make sure they produce high-demand content, and it'll take effort to build up a readership. While an LLM might make a con artist's game easier, it won't come cheap. If there were a simple way to rapidly build up an audience and make in-demand content for them, someone would have already exploited it. The LLM can't help with that.

What about other forms of generative AI? There are problems here, too, but the difference is in degree, not kind; Dan Rather was burned by a forged document, not a deepfake. The Protocols of the Elders of Zion spread their insidious claims through pamphlets. And word-of-mouth was responsible for some of the worst violence in history. New technologies amplify this ability, but there still need to be credulous parties who want to believe--who are effectively looking for an excuse to resort to violence. In these cases, appeals to misinformation are dangerous, but so is the truth; the riots on September 11, 2012 were sparked by an old movie that an imam dredged up as the occasion for a day of rage. A fake video or image might spread faster than an article, but it still depends on human agents who find it useful for their own social ends.

For these reasons, I think we shouldn't fear the destruction of our information ecosystem. Increasing the supply of misinformation is, on its own, largely irrelevant; its spread fundamentally depends on demand. And, if you're aiming to appeal to people, it is still easier and cheaper to use humans. Expanding the reach of misinformation, moreover, still depends on building up an audience, something that is very expensive, time-consuming, and difficult to automate. Generative AI can help around the margins, but it isn't changing the fundamental dynamics. The problems of AI will be elsewhere.
