As consistent peddlers of inaccurate and meretricious information ourselves, we find it hard not to feel sorry for journalist David Anderson, who in 2008 stated that a group of football fans liked to wear hats made of shoes and sing about a little potato.
He’d got the information from Wikipedia, which has also, over the years, informed its readers that the developer of Windows is called Microshaft, its products are evil and its logo is a kitten.
But a group of University of Iowa researchers is refining a new tool that can detect this sort of vandalism. It’s an algorithm that checks new edits to a page and compares them to words in the rest of the entry, alerting an editor or page manager if something looks a bit odd.
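The Iowa team's actual method isn't spelled out here, but the general idea of comparing an edit's words with the rest of the entry can be sketched in a few lines of Python. Everything in this sketch is an assumption made for illustration: the function names, the minimum word length and the 0.5 threshold are ours, not the researchers'.

```python
import re

def tokenize(text):
    """Lowercase the text and pull out runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def unfamiliar_ratio(entry_text, added_text, min_length=4):
    """Share of the added words (min_length letters or more) that never
    appear in the existing entry. Short words are skipped as filler."""
    known = set(tokenize(entry_text))
    added = [w for w in tokenize(added_text) if len(w) >= min_length]
    if not added:
        return 0.0
    unseen = [w for w in added if w not in known]
    return len(unseen) / len(added)

def looks_odd(entry_text, added_text, threshold=0.5):
    """Flag the edit for a human to check. The threshold is an arbitrary
    illustrative value, not one taken from the Iowa research."""
    return unfamiliar_ratio(entry_text, added_text) > threshold

entry = ("Abraham Lincoln was the sixteenth president of the United States. "
         "Lincoln led the nation through the Civil War.")

print(looks_odd(entry, "Pete prefers pancakes"))                  # True
print(looks_odd(entry, "Lincoln was elected president in 1860"))  # False
```

A check like this only ever looks at words, so it would presumably have nothing to say about a swapped picture.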
Wikipedia already has tools that detect obscenities or major edits, such as deletions of entire sections. But they’re built manually, with prohibited words and phrases entered by hand, and so are a hassle and easy to get round. They’re also not much use at catching smaller types of vandalism.
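To see why a hand-built list is easy to get round, here is a minimal sketch of that kind of filter. It is not Wikipedia's actual tooling; the word list and the function name are hypothetical.

```python
import re

# Hypothetical hand-entered list; real lists would be far longer.
BLOCKED_WORDS = {"evil", "idiot"}

def contains_blocked_word(edit_text, blocked=BLOCKED_WORDS):
    """Return True if any whole word in the edit is on the blocked list."""
    words = set(re.findall(r"[a-z]+", edit_text.lower()))
    return not words.isdisjoint(blocked)

print(contains_blocked_word("Its products are evil"))               # True
print(contains_blocked_word("Windows was developed by Microshaft")) # False
```

Anything nobody thought to type into the list sails straight through, which is the weakness described above.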
The Iowa team ran the Abraham Lincoln and Microsoft entries – two of the most vandalised pages, can’t think why – past their new algorithm. It meant reviewing more than 4,000 edits.
Their algorithm detected most of the minor episodes of vandalism in both entries – spotting, for example, that Pete’s preference for pancakes wasn’t strictly relevant to the life and achievements of Lincoln.
However, it failed to notice that a portrait of Lincoln had been replaced with a photo of a redwood tree – a change that managed to survive for two years and 4,000 edits.
The team’s now refining the algorithm to improve its accuracy further – but we think it’s a crying shame.
We loved seeing Bill Gates depicted with a moustache and horns, and Bernie Madoff described as ‘a very clever asshole who stole shitloads of money’.
And it would have been lovely if Janis Joplin really had speedwalked everywhere and had a fear of toilets, or Billie Piper performed sexual favours for a tenner.