A journey into the bowels of Wikipedia

Friday, July 11th, 2008

Most people don’t know what’s going on in the bowels of Wikipedia. Be very thankful for that. For the most part, what goes on in the bowels of Wikipedia is thoroughly uninteresting except to those who are right in the thick of it, in which case it’s certainly the most interesting thing ever (or you would have to assume so, judging by how much time is whiled away on it). So for those looking on from the outside who want to know what’s really behind this Wikipedia thing, here’s an example.

The English Wikipedia has an Arbitration Committee that is tasked with resolving the most serious disputes between users. Arbitration cases work pretty much like court cases in the real world, including lengthy opening statements and discussion by all parties involved, the presentation of evidence, discussion of said evidence, debating, proposal of rulings, voting on rulings, discussion of rulings, etc. The only difference between Wikipedia arbitration and a real court case is that in an arbitration case, all of the onlookers can get into the discussion too, and they frequently do. Imagine a court case where everyone in the gallery is screaming loudly along with every step of the proceedings and you have an inkling of how chaotic and lengthy this can all be.

The Arbitration Committee has about a dozen sitting arbitrators who are the only ones who can vote on the proposed rulings. Recently, one of the arbitrators broke ranks and announced that an agreement had been reached in a case following private discussion by the arbitrators. The only problem is, it hadn’t. Another arbitrator logged on soon after and posted a message saying that this ruling had not, in fact, been agreed to by everyone. Much drama and gnashing of teeth ensued, with the most vocal Wikipedians wailing that they had totally lost faith in the Arbitration Committee (one wonders if they thought it had been infallible up until that point).

The controversy surrounding this incident grew so big that a separate process, a Request for Comment, was launched on the topic of the Arbitration Committee’s legitimacy. So we’ve gone from a simple user disagreement, to an argument over the user disagreement, to an argument over the argument over the user disagreement. And keep in mind that the user disagreement itself was already pretty far removed from the actual purpose of Wikipedia, which is writing the encyclopedia. Is that enough levels of meta for you? At this moment, the meta-meta-discussion, the Request for Comment on the Arbitration Committee, is 92,500 words long, or about the length of an average novel. And the talk page of the Request for Comment, which is effectively a meta-meta-meta-discussion, weighs in at a decent 32,500 words, or the size of an average novella.

I’m not going to go into any further detail on any of this, because frankly, my eyes are glazing over at this point. You’re invited to read the links I’ve presented, but honestly, there are so many better things you could do in the same amount of time, like read an actual novel. And I haven’t even searched out all of the meta levels: the administrators’ notice boards, the community notice boards, the village pump, etc. All told, on any major controversial issue, roughly five to ten novels’ worth of text will be spewed forth by all of the participants involved. It’s enough to make any future historian squirm with glee.

I hope you enjoyed (!!) this look into the bowels of Wikipedia. Just be very thankful that you aren’t involved in any of it (or if you are, I’m so sorry). The next time you’re reading an article on Wikipedia, just appreciate that somehow useful things manage to get done even amongst all of this unproductive chaos. Wikipedia in many respects resembles a supermarket in the Gilded Age. Walking along the clean, lovingly arrayed aisles and admiring the nicely presented canned pork products, it seems like a very pleasant place. But don’t dare inquire about how those products are actually made — there’s a whole jungle just beneath that shiny veneer.

The highest-editing zombie bot on Wikipedia

Monday, May 26th, 2008

I stopped actively editing Wikipedia more or less one year ago. Naturally, I haven’t stopped editing completely, as I still read Wikipedia nearly every day in the pursuit of my own edification. But I no longer seek out thankless administrative tasks to perform, nor do I browse articles solely to find a way to contribute some writing. In that way I’m much more like the casual reader who occasionally fixes a typo, though the casual reader also doesn’t have the ability to delete articles, block users, and protect pages (ah, the privileges of being an administrator). But I don’t much use those abilities anymore, so it matters little.

In addition to doing lots of editing and administrative tasks (page may take a while to load), I also spent a good amount of time hacking on programs for Wikipedia. Some, such as the userbox generator (don’t even ask), were purposefully silly. Others, such as my work on the PyWikipediaBot free software project, were more useful. In addition to my work on that bot framework, I wrote quite a few bots, which are programs for making automated edits. By the time I (mostly) retired from Wikipedia, I had put many hours into those bots, and I couldn’t bear to just shut them down. So I left them running. They’ve been running now for over a year, unattended for the most part, and have been remarkably error-free all things considered. I have forgotten about them for months at a time, and only remembered them when my network connection chugged for an extended period of time (long “Categories for deletion” backlog) or when my server’s CPU utilization pegged (bot process stuck in an endless loop). So yes, there is a zombie bot editing Wikipedia, and it even has administrative rights that it uses quite frequently!

All of these bot programs that I wrote run under one Wikipedia user account, Cydebot. That account was the first account on any Wikipedia project to break one million edits. The total currently stands somewhere around a million and a quarter (proof), though it has been out-edited by one other bot account by now. But just think about the sheer size of that number. At one point Cydebot had a single-digit percentage of all edits to the English Wikipedia. You can’t say that’s not impressive, especially considering how ridiculously massive Wikipedia is. Yet being a bot operator was largely unsung work. The only time I really got noticed for all the effort I was putting into it (and never mind the network resources involved, especially when I was running AntiVandalBot, which downloaded and analyzed the text of every single edit to Wikipedia in real time) was when yet another person thought they were the first to realize that Cydebot was using administrative tools and deemed it necessary to yell at me about it. Wikipedia has this cargo cult rule that “admin bots aren’t allowed”, even though people have been running them for years. I’ll grant that it’s schizophrenic.

So after continuing to run Cydebot for this long, I’m not going to stop now. I haven’t put any effort into Cydebot for over a year besides occasionally updating the PyWikipediaBot framework from SVN, killing pegged bot processes, and rarely modifying the batch files for my bots when someone points out that the associated pages on Wikipedia have changed. I don’t have the time (nor the desire) to put any further serious development work into Cydebot, so at some point things will finally break and Cydebot will no longer be able to do any work. But it’s already gone for over a year performing all sorts of thankless tasks on Wikipedia that no human wants to be bothered with; why not let it keep going and see how much longer my favorite zombie bot can keep at it?

If you want to track the continuing edits of a zombie bot on Wikipedia, you can do so here. So the next time you are idly reading Wikipedia, remember that not only are there bots behind the scenes making millions of automated edits, but some of them are zombies that have been running largely unattended for months, if not years. Wikipedia is built, in no small part, upon zombie labor.

How to deal with liars on Wikipedia?

Thursday, March 1st, 2007

About a month ago it became known to me that one of the most powerful and influential people on the English Wikipedia, Essjay, had been lying through his teeth for over a year. He claimed to be a 40-something professor of theology, but it turned out in the end that he was a 24-year-old with no such degree. I didn’t say anything publicly about it at the time in deference to my sources’ requests, but now that the cover has been blown off this whole sordid affair, I feel obligated to comment.

The thing is, I could maybe forgive him if he had just made up this alternative identity to keep his own identity anonymous. But he didn’t. He exploited those fake academic credentials to gain the upper hand in content disputes. Kelly Martin already does a great job of covering this side of it, but there’s one thing he said in particular that I’d like to point out:

If you’d like to start a [Request for Comment] on the matter, I’d be happy to offer the community my evidence; I am, after all, one of Wikipedia’s foremost experts on Catholicism. —Essjay, June 23, 2005

He made lots of statements like these to get an upper hand in debates; see Kelly Martin’s blog post for more details. I cannot forgive Essjay for what he has done. He has permanently lost my trust as well as the trust of many others. The sad thing is, Essjay is still active in all of the highest areas of the English Wikipedia; he’s a bureaucrat, checkuser, oversight, and within the past week he was appointed to the Arbitration Committee (usually these appointments are decided by community elections). So what now? Essjay lied to all of us in a particularly egregious manner, and the “punishment” is that he gets appointed to yet another important position? What kind of message does this send about Wikipedia?

This is bad, bad news.

Update: Here’s an even more damning false claim of credentials by Essjay:

I believe the entry to be correct as it reads, and I offer as my reference the text “Catholicism for Dummies” by Trigilio (Ph.D./Th.D.) and Brighenti (Ph.D.). The text offers a Nihil Obstat from the Rev. Daniel J. Mahan, STB, STL, Censor Librorum, and an Imprimatur from the Rev. Msgr. Joseph F. Schaedel, Vicar General. This is a text I often require for my students, and I would hang my own Ph.D. on it’s credibility. —Essjay April 11, 2005

It must be easy to risk “losing” something you don’t even have in the first place.

Also, Larry Sanger (co?-founder of Wikipedia) has written an excellent blog post.

See here for a follow-up post.