My experiences as an open source developer

In this second part of a series on experiences of mine that are not widely shared and are hopefully interesting, I will talk about being an open source software developer (the first part of this series was about being a newspaper columnist).

I am a developer on the Python Wikipedia Bot Framework, which is a collection of programs that perform automated tasks on wikis based on the MediaWiki software. Wikipedia and other Wikimedia projects are by far the largest users of MediaWiki, but there are lots of other ones out there too, and pyWiki is used by lots of people for various tasks.

Before I go over my experiences in-depth, I’ll start with an overview of everything I’ve done on pyWiki. Skip ahead to after the list if these details are too technical.

  • delete.py – A bot that deletes a list of pages. I wrote it from scratch.
  • templatecount.py – A bot that counts how many times each of any number of given templates is used. I wrote it from scratch.
  • category.py – A bot that handles category renaming and deletion. I made some changes to it and some libraries, catlib.py and wikipedia.py, to make it more flexible, more automated, and to handle English Wikipedia-specific “Categories for discussion” (CFD) tagging.
  • template.py – A bot that handles template substitution, renaming, and deletion. I made some changes to it (and the library pagegenerators.py) to handle operations on multiple templates simultaneously and to increase its flexibility. I will admit, I added the capability to delete any number of templates in one run with the hope that I would some day be able to use it on userboxes.
  • replace.py – A bot that uses regular expressions to modify text on pages. I modified it to handle case-insensitive matching, amongst other things.
  • wow.py – An unreleased bot that I used to anonymize thousands of vandal userpages to prevent glorification of vandalism. I wrote it from scratch.
  • catmove.pl – A metabot* written in Perl that parses a list of category changes and does them all in one run. I wrote it from scratch.
  • cfd.pl – An unreleased automatic version of catmove.pl that pulls down the list of category changes directly from the wiki, parses them, and executes them, in one single command. I wrote it from scratch. Hopefully I will be able to release it soon (it may have some security issues that I want to make sure are entirely resolved first).
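
As an illustration of the case-insensitive matching mentioned for replace.py above, here is a minimal sketch using Python's standard re module. It is not the actual pyWiki code; the function name replace_text and the sample wikitext are made up for this example.

```python
import re


def replace_text(text, pattern, replacement, ignore_case=False):
    """Return text with every regex match of pattern replaced.

    Hypothetical helper for illustration only; not part of pyWiki.
    """
    # re.IGNORECASE makes the pattern match regardless of letter case.
    flags = re.IGNORECASE if ignore_case else 0
    return re.sub(pattern, replacement, text, flags=flags)


# Sample wikitext (made up) with the same category link in two casings.
sample = "See [[Category:Foo]] and [[category:foo]]."
pattern = r"\[\[category:foo\]\]"

print(replace_text(sample, pattern, "[[Category:Bar]]", ignore_case=True))
```

With ignore_case left off, only the link whose casing matches the pattern exactly would be rewritten, which is why a flag like this matters on a wiki where editors type the same name in different ways.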

Cfd.pl is the “secret sauce” that lets Cydebot do its magic. To date, Cydebot has over 160,000 edits, most of them category-related. I attribute this to cfd.pl, which allows me to handle dozens of CFDs simultaneously with a single command, whereas people using other bots have to input each one manually. It’s no surprise that everyone else pretty much gave up on CFD, leaving my highly-efficient bot to handle it all on its own.

I also had some involvement with Vandalbot, which is a Python anti-vandal bot that uses pyWiki. I ran a Vandalbot clone called AntiVandalBot off of my own server for many months, until somewhat recently, when AntiVandalBot was switched over to being hosted on a Wikimedia Foundation server. If you add up all of the edits committed by both Cydebot and AntiVandalBot, then I have the highest number of bot edits on the English Wikipedia, though of course it’s not just my work. I merely came up with the account name and hosted it for a while; Joshbuddy is the one who actually wrote the vast majority of Vandalbot, and Tawker is the one who hosted the original Tawkerbot2 for a while (and who now hosts it on the Wikimedia Foundation server).

Working on an open source project is very fun, and rather unlike a programming job for pay in the “real world”. For one, it’s entirely volunteer. I work at my leisure, when I feel like it, or when there is functionality that I or someone else needs. Programming can actually be relaxing and cathartic when there are no deadlines and I am undertaking a coding project simply for the sake of writing something.

All of the developers on pyWiki are very relaxed and they definitely “get” the open source movement. There’s no expectation of anyone having to get anything done. This can have its downsides, in that it might take a while for something to be taken care of, but it also doesn’t scare off anyone who is worried about a large time commitment. To become a developer on pyWiki, all I had to do was ask the project head, and I was given write access to the CVS repository within a few days, even though I had never used Python before. The amount of trust is very refreshing, and I definitely feel an impetus not to let the other guys down by uploading code with bugs in it (so my testing is always rigorous).

There gets to be a point with computer languages where learning another one is simply no big deal. I wouldn’t want to estimate how many languages I’ve used by now, but it’s probably somewhere around a dozen. After the first few, though, one simply knows how to program, and learning another language is a simple matter of looking through the documentation to find the specific code to do exactly what one has in mind. That was the situation I was in with pyWiki; although I had never used Python before, I knew exactly what I wanted to accomplish and how to accomplish it: I merely needed to know how to do it in Python. Within a week I was hacking away at the code, adding significant new functionality. It should be noted that working on an existing project in a new language is much, much easier than trying to make something from scratch in a new language.

I would say that pyWiki is a medium-size open source project, which is probably exactly the right size for a first-time developer. It’s not so small that it ever goes stagnant; there are code changes submitted every day, and the mailing list is very active. Any reasonable message posted to it will get a response within a day, if not hours. On the other hand, pyWiki is not too large. It has no barriers to entry; anyone can get started hacking on it right away and submitting code changes. Larger projects necessarily have large bureaucracies (large projects need to be managed, there’s no way around it), which means there’s an approval process for code changes, and it’s unlikely that anything written by a novice will actually end up making it into a release. Trying to work on a large project right off the bat can be disheartening because there’s very little one can actually do that doesn’t require an expert level of knowledge. Compare this to pyWiki, which lacks a lot of functionality that even a novice would be able to code up (delete.py wasn’t hard at all; it’s simply that no one had done it yet).

I would encourage anyone who is interested in programming to find an open source project they enjoy that they can contribute to. It’s great experience, and it much more closely resembles what actually happens in industry than hacking away on a solo project. I’m sure it’s a great resume line item. The key is to find a project you want to work on because it benefits you. In my case, I was writing new functionality that I needed to get stuff done on Wikipedia.

And there’s just something very compelling about contributing to the worldwide corpus of free software. It’s a way to leave your mark on the world, a way to say, “I was here.”

*I don’t know if anyone else uses the term metabot, or if it’s just something that I made up. My definition of metabot is a bot that uses other bots to do the intended work. Cfd.pl is a good example. Cfd.pl is a fully-functioning Perl script that parses through a list of category operations to make and then invokes pyWiki’s category.py to handle each individual operation. I could have written it in Python and called the pyWiki libraries directly, thus making it a normal bot and not a metabot, but I was much more comfortable with Perl at the time, and decided that it would be easier for me to make a metabot.
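
The metabot idea can be sketched in a few lines. The example below is written in Python rather than Perl and is only an illustration, not the real catmove.pl or cfd.pl. The input format ("move Old -> New", "delete Name"), the helper names, and the exact category.py arguments (the move and remove actions with -from: and -to:) are all assumptions made for this sketch, and by default it only builds the commands without running anything.

```python
import subprocess


def parse_operations(lines):
    """Parse lines such as 'move Old name -> New name' or 'delete Some name'.

    The line format is hypothetical, invented for this sketch.
    """
    ops = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        action, _, rest = line.partition(" ")
        if action == "move":
            old, sep, new = rest.partition("->")
            if not sep:
                raise ValueError(f"bad move line: {line!r}")
            ops.append(("move", old.strip(), new.strip()))
        elif action == "delete":
            ops.append(("delete", rest.strip(), None))
        else:
            raise ValueError(f"unknown action in line: {line!r}")
    return ops


def build_command(op):
    """Build the command line that hands one operation to category.py.

    The argument names are assumptions about category.py's interface.
    """
    action, old, new = op
    if action == "move":
        return ["python", "category.py", "move", f"-from:{old}", f"-to:{new}"]
    return ["python", "category.py", "remove", f"-from:{old}"]


def run_all(lines, dry_run=True):
    """Return the list of commands; execute them only when dry_run is False."""
    commands = [build_command(op) for op in parse_operations(lines)]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    for cmd in run_all(["move Foo films -> Films about foo", "delete Old stuff"]):
        print(" ".join(cmd))
```

The point of the design is that the outer script knows nothing about editing a wiki; it only turns a list of operations into one invocation of the existing bot per operation.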

Update two or three hours later: In hindsight, I’m realizing that the level of technical proficiency required to read this post is incompatible with the stated goal of conveying my experiences as an open source developer to people who don’t actually know what it’s like. This post is too technical, and the people who understand it are going to be the people who already have experience in this. I’ll take another stab at this topic soon.
