The highest-editing zombie bot on Wikipedia

Monday, May 26th, 2008

I stopped actively editing Wikipedia about a year ago. Naturally, I haven't stopped editing completely: I still read Wikipedia nearly every day for my own edification, and I fix things when I come across them. But I no longer seek out thankless administrative tasks to perform, nor do I browse articles solely to find a way to contribute some writing. In that way I'm much more like the casual reader who occasionally fixes a typo, though the casual reader doesn't have the ability to delete articles, block users, and protect pages (ah, the privileges of being an administrator). But I don't use those abilities much anymore, so it matters little.

In addition to doing lots of editing and administrative tasks (page may take a while to load), I also spent a good amount of time hacking on programs for Wikipedia. Some, such as the userbox generator (don't even ask), were purposefully silly. Others, such as my work on the PyWikipediaBot free software project, were more useful. Besides my work on that bot framework, I wrote quite a few bots, which are programs that make automated edits. By the time I (mostly) retired from Wikipedia, I had put many hours into those bots, and I couldn't bear to just shut them down. So I left them running. They've now been running for over a year, mostly unattended, and all things considered they have been remarkably error-free. I have forgotten about them for months at a time, remembering them only when my network connection chugs for an extended period (a long "Categories for deletion" backlog) or when my server's CPU utilization pegs (a bot process stuck in an endless loop). So yes, there is a zombie bot editing Wikipedia, and it even has administrative rights that it uses quite frequently!

All of the bot programs I wrote run under one Wikipedia user account, Cydebot. That account was the first on any Wikipedia project to break one million edits. The total currently stands at around a million and a quarter (proof), though by now one other bot account has out-edited it. But just think about the enormity of that number. At one point Cydebot accounted for a single-digit percentage of all edits to the English Wikipedia. You can't say that's not impressive, especially considering how ridiculously massive Wikipedia is. Yet being a bot operator was largely unsung work. The only time I really got noticed for all the effort I was putting into it (never mind the network resources involved, especially when I was running AntiVandalBot, which downloaded and analyzed the text of every single edit to Wikipedia in real time) was when yet another person thought they were the first to realize that Cydebot was using administrative tools and deemed it necessary to yell at me about it. Wikipedia has this cargo cult rule that "admin bots aren't allowed," even though people have been running them for years. I'll grant that the whole situation is schizophrenic.

So after running Cydebot for this long, I'm not going to stop now. I haven't put any effort into Cydebot for over a year beyond occasionally updating the PyWikipediaBot framework from SVN, killing pegged bot processes, and, rarely, modifying the batch files for my bots when someone points out that the associated pages on Wikipedia have changed. I have neither the time nor the desire to put any further serious development work into Cydebot, so at some point things will finally break and Cydebot will no longer be able to do any work. But it has already spent over a year performing all sorts of thankless tasks on Wikipedia that no human wants to be bothered with; why not let it keep going and see how much longer my favorite zombie bot can keep at it?

If you want to track the continuing edits of a zombie bot on Wikipedia, you can do so here. So the next time you are idly reading Wikipedia, remember that not only are there bots behind the scenes making millions of automated edits, but some of them are zombies that have been running largely unattended for months, if not years. Wikipedia is built, in no small part, upon zombie labor.

Anonymous editor gets wrestler suicide scoop on Wikipedia

Wednesday, June 27th, 2007

Here's quite an amazing edit on Wikipedia, in which an anonymous person, editing from the IP address 69.120.111.23, modified the article on Chris Benoit to read "However, Chris Benoit was replaced by Johnny Nitro for the ECW Championship match at Vengeance, as Benoit was not there due to personal issues, stemming from the death of his wife Nancy" (addition in italics). In case you hadn't yet heard, Chris Benoit was a WWE wrestler who pumped his seven-year-old son full of non-prescription hormones for several weeks, then killed his wife and his son in a murder-suicide. This edit was made a full half day before police reported these horrific events to the media.

So who was anonymously editing the article with information that wasn’t known to the public yet? Chris Benoit himself? One of those friends that he sent text messages to after killing his family but before offing himself? Another edit an hour later by a different anonymous IP address, 125.63.148.173, reveals more information, modifying the article to read “However, Chris Benoit was replaced by Johnny Nitro for the ECW Championship match at Vengeance, as Benoit was not there due to personal issues which according to several pro wrestling websites is attributed to the passing of Benoit’s wife, Nancy” (addition in italics).

So, is it possible that wrestling fansites knew what had gone down a full twelve hours before anyone else did? If so, where did they get this information? This requires further investigation, and indeed, Wikimedia Foundation Volunteer Coordinator Cary Bass says that he has already contacted the proper authorities. Unfortunately, it won't be nearly as easy to track this down on wrestling fansites as it is on Wikipedia, where every edit leaves a timestamped entry in the page's history. Unless someone finds timestamped discussions in wrestling forums, anyway.

Update, June 28: Looks like the Associated Press is carrying this story now. Nyah nyah, scooped you guys by a day. Fox News also has a story.

Update, June 29: The anonymous user is claiming this was just an unfortunate coincidence.

Hyattsville websites

Sunday, April 29th, 2007

The World Wide Web is (surprisingly) doing a lot to bring local communities together. I never would have guessed it on my own; the phenomenon is just so prevalent that I couldn't help but notice when it happened to me. Here's what's up.

Technorati is the largest blog aggregator and tracker on the web. One of its useful services is tracking incoming links. Basically, it will let you know if any of the blogs it tracks link to your own blog. This feature is so useful that it’s integrated transparently into WordPress, so you don’t even have to go to Technorati to see who’s linking to you. It’s always fun to know who’s linking to you and what they’re saying about you. So imagine my surprise when I saw that I had an incoming link from a website called The Hyattsville H4X pointing to my post on the housing crisis at the University of Maryland.

Hyattsville H4X is a weekly hour-long podcast devoted solely to the city of Hyattsville, located in Prince George's County, Maryland (where I currently reside). I never would have guessed that this medium-sized city would have such a presence online, but it does, and I imagine more and more communities do as well. So I downloaded and listened to the episode whose show notes had linked to my blog, and lo and behold, they were talking about my post and suggesting that Hyattsville would inevitably end up absorbing a lot of displaced UMD students because the city of College Park seems intent on keeping them out.

But this gets back to my original thesis: the Internet is actually helping to bring local communities closer together. I never would have learned anything on my own about the local government issues I heard discussed on the podcast. Local government is notoriously esoteric: you normally have to go out of your way just to learn what's going on, let alone affect it. I would guess that easily 95% of people don't know what's going on in their local government; very few would ever go to City Council and community meetings to find out. But now, with the web, there are people who do it for you, whether through blogs or podcasts. The average person can simply check a website every once in a while and they too will know what's going on. The rise of community-oriented websites is excellent because more involved communities lead to better communities.

After discovering this new podcast, I decided to check the web for other Hyattsville sites. I had never even thought to seek them out before discovering Hyattsville H4X; going onto the global Internet to find information on local issues seems counterintuitive. But it's an excellent resource. I even found a site called My Hyattsville Wiki, a surprisingly active wiki devoted just to Hyattsville. I suspect this experience isn't unique to Hyattsville, either: all across the country, thousands of communities are probably organizing and coming together online. So wherever you live, search online and see what's out there. You might be pleasantly surprised.

Conservapedia: An encyclopedia for the delusional

Thursday, February 22nd, 2007

Okay, I won't mince words. Conservapedia sucks. I find the whole idea of a separate encyclopedia based on a narrow viewpoint revolting. If the facts don't back up all of your positions, shouldn't you change your positions, rather than going off to build your own parallel universe (including your own encyclopedias!) where bias rules? It's scary that the theocrats and the neocons in this country basically live in their own separate world where they don't have to hear, nay, even think about things they disagree with. They have their own news channel, their own radio stations, their own schools and educational materials (some of which proclaim the Earth is only thousands of years old!), their own churches, their own sport (NASCAR), and now even their own collaborative encyclopedia? How can a group of people just decide to collectively dissociate from reality to such a degree? Here's the mission statement from the front page of their site:

Conservapedia is a much-needed alternative to Wikipedia, which is increasingly anti-Christian and anti-American. On Wikipedia, many of the dates are provided in the anti-Christian “C.E.” instead of “A.D.”, which Conservapedia uses. Christianity receives no credit for the great advances and discoveries it inspired, such as those of the Renaissance. Read a list of many Examples of Bias in Wikipedia.

Conservapedia isn’t an encyclopedia, it’s a propaganda mill trying to disguise itself as an encyclopedia. And failing miserably. Let’s take a look at a quote from their abominable article on evolution:

Ironically, the thory of evolution is often disproved simply by the existence of those who argue against it. Intelligence, indeed the grasp of rote reason, seems beyond even the most articulate of the anti-evolutionists – a startling development, considering the difficulty of survival in modern times. Ostensibly, wolves, liberals and flagrant “street abortionists,” would easily predate upon these huddles masses – however, and perhaps due to the power of their De Jesus, many do not meet their ends this way; allowing thus for the continued distribution of their holy seed.

What the hell is that?! Is it supposed to be a coherent argument? Is it really supposed to be an encyclopedia article?! Why do blatant hostility and utter ignorance towards science always seem to go hand-in-hand with conservatism? Update 2007-02-25: I should’ve caught it earlier, but this section was a joke. Here’s a look at the opening section to the article on George Washington:

George Washington (1732-1799) was unanimously elected President of the United States of America and the Commander-in-Chief in the Revolutionary War! He was also a devout Christian, with his adopted daughter once stating that if you question Washington’s faith you may as well question whether or not he was a patriot!

Washington is perhaps the person other than Jesus who declined enormous worldly power, in Washington’s case by voluntarily stepping aside as the ruler of a prosperous nation! His precedent of serving only two terms was then voluntarily followed for 140 years!

Washington frequently invoked Christianity in his work! As General, he commanded that chaplains be included in every regiment: “The General hopes and trusts, that every officer and man, will endeavour so to live, and act, as becomes a Christian Soldier, defending the dearest Rights and Liberties of his country!

Notice how every sentence ends in an exclamation mark! Every single freaking sentence in the article! It even goes on for a few more paragraphs, which I've mercifully decided not to quote here! It's like the writer is in a perpetual state of amazement and awe! Even over mundane facts! Note that this article doesn't actually say much about George Washington himself, but instead uses him as a vehicle for propaganda about his religion!

One final thing upsets me on a very deep, free-content-movement level: there is no licensing information anywhere. I guess using an open content license is too "Communist" for them, but the alternative, not using a license at all, is even worse. I have no idea what legal grounds they're on here, but they're likely setting themselves up for a world of hurt, because none of the people contributing content to Conservapedia are releasing their work under any sort of license. It's questionable whether it's even legal for Conservapedia to redistribute and modify this content at all. They could try to pull some draconian nonsense in the future, claiming that they own all of the content submitted to them, but without an explicit release, that'll never fly. And with an explicit release, who would bother contributing? Why would you want to work for free so that someone else, who merely hosts the damn thing, gets all of the benefit from it? The beauty of Wikipedia is that you retain copyright to every contribution you make. It's yours. It is licensed so that other people can use, modify, and redistribute it under the terms of the GFDL, but you are still the ultimate owner. That is why Wikipedia has been so successful, and why Conservapedia is likely to fail (well, that and how ludicrous the whole concept is).

Addendum: Lots of people have been covering this Conservapedia lunacy. Here are some good links: