Wikipedia is under assault: rogue users keep posting AI-generated nonsense

Open platform + Open access to AI = Open season for mischief makers

This is why we can't have nice things: Wikipedia is in the middle of an editing crisis, thanks to AI. People have started flooding the website with nonsensical information dreamed up by large language models like ChatGPT. But honestly, who didn't see this coming?

Wikipedia has launched a new initiative called WikiProject AI Cleanup, a task force of volunteers combing through articles to edit or remove false information that appears to have been posted by people using generative AI.

Ilyas Lebleu, a founding member of the cleanup crew, told 404 Media that the crisis began when editors and users started noticing passages that were unmistakably written by a chatbot of some kind. The team confirmed the theory by recreating some of the passages using ChatGPT.

"A few of us had noticed the prevalence of unnatural writing that showed clear signs of being AI-generated, and we managed to replicate similar 'styles' using ChatGPT," said Lebleu. "Discovering some common AI catchphrases allowed us to quickly spot some of the most egregious examples of generated articles, which we quickly wanted to formalize into an organized project to compile our findings and techniques."

For example, one article described an Ottoman fortress built in the 1400s called "Amberlisihar." The 2,000-word article details the landmark's location and construction. Unfortunately, Amberlisihar does not exist, and everything about it is a complete hallucination, peppered with just enough real detail to lend it some credibility.

The mischief is not limited to newly posted material, either. Bad actors are also slipping bogus AI-generated information into existing articles that volunteer editors have already vetted. In one example, someone inserted a correctly cited section about a particular crab species into an article about an unrelated beetle.

Lebleu and his fellow editors say they don't know why people are doing this, but let's be honest: we all know it comes down to two primary reasons. The first is an inherent problem with Wikipedia's model: anyone can be an editor on the platform. Many universities refuse to accept papers that cite Wikipedia for this exact reason.

The second reason is simply that the internet ruins everything. We've seen this time and again, particularly with AI applications. Remember Tay, Microsoft's Twitter bot that got pulled in less than 24 hours when it began posting vulgar and racist tweets? More modern AI applications are just as susceptible to abuse, as we have seen with deepfakes, ridiculous AI-generated shovelware books on Kindle, and other shenanigans.

Anytime the public gets virtually unrestricted access to something, you can expect a small percentage of users to abuse it. When we're talking about 100 people, that might not be a big deal, but when it's millions, you're going to have a problem. Sometimes it's for illicit gain. Other times, it's just because they can. Such is the case with Wikipedia's current predicament.