BioPerl is an excellent package

I’m taking a Bioinformatics Algorithms course here at the University of Maryland, and our first real programming project is quite the doozy. We have to implement our own version of the Smith-Waterman local alignment algorithm and use it to align a given amino acid sequence against every sequence in a given database. The basic algorithm uses dynamic programming over a large matrix with one row for each residue of one sequence and one column for each residue of the other (plus an extra row and column of zeros). Add to that the many heuristics that can (and in some cases should) be used to improve the alignment, and it’s a substantial programming project.
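To make the dynamic-programming description concrete, here is a minimal sketch of the core Smith-Waterman recurrence. It is written in Python rather than Perl and is not the course's reference solution; the function name `smith_waterman` and its parameters are hypothetical. It assumes a simple scoring scheme (+2 for a match, -1 for a mismatch, -1 per gap position) in place of a substitution matrix such as BLOSUM62, and it skips every heuristic. Each cell `H[i][j]` holds the best score of any local alignment ending at those two residues, floored at zero so an alignment can start anywhere, and the traceback starts from the highest-scoring cell.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Locally align strings a and b (hypothetical helper for illustration).

    Returns (best_score, aligned_a, aligned_b). The match/mismatch/linear-gap
    scores are assumptions standing in for a real matrix such as BLOSUM62.
    """
    def score(x, y):
        return match if x == y else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1].
    # Row 0 and column 0 stay 0, so an alignment may begin at any position.
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(
                0,                                            # start fresh here
                H[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # pair the residues
                H[i - 1][j] + gap,                            # gap in b
                H[i][j - 1] + gap,                            # gap in a
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero cell ends the local alignment.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("ACGT", "AGT"))  # (5, 'ACGT', 'A-GT')
```

Because the matrix has (len(a) + 1) × (len(b) + 1) cells, both time and memory grow with the product of the two sequence lengths, which is why repeating this for every sequence in a database gets slow, especially in an interpreted language.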

Luckily, there’s BioPerl, an amazing set of Perl modules that does just about anything one would want to do in bioinformatics. Of course, it includes a module implementing local alignment (and tools for running and parsing BLAST searches), but we aren’t allowed to use those for our project. Still, it has plenty of other modules we can use: classes for storing DNA, RNA, and amino acid sequences and scoring matrices (such as BLOSUM62), routines for reading and writing a large variety of file formats (including FASTA, of course), and much, much more.

We’re allowed to use any programming language we like for the assignment, something I’ve never seen before in a class (occasionally a class offers a choice of two or three, but never “any”). Yet I don’t envy anyone who isn’t using Perl. Yeah, Perl isn’t an ideal language for this kind of work, mainly because, as an interpreted scripting language, it runs much more slowly than, say, C++. But it has BioPerl, and Perl is one of the most widely used programming languages in bioinformatics.

The only problem with choosing Perl is that I’m not going to win any of the extra credit. The teacher is awarding significant extra credit to the three fastest implementations (measured by the time each takes to complete a run), but by choosing Perl I’ve taken myself out of the running against the people using C/C++. It won’t even be close. It would be fairer to rank programs only against others written in the same language, because then I might have some incentive to optimize my heuristics. As it is, even the best heuristics won’t place a Perl program in the top three, so I have no incentive to do anything beyond the minimal algorithm.
