New anti-spam measures on this site

Friday, July 25th, 2008

I’ve switched from Akismet to Defensio for my spam-stopping needs here on this blog. The change should be transparent. If anything goes wrong, like if your comments are all of a sudden getting marked as spam, you know how to contact me.

I’ve made the switch to Defensio because I’ve heard some good things about it, and decided to give it a whirl. Akismet definitely wasn’t doing the best possible job, so hopefully Defensio will fare better.

WordPress continues to deliver cutting-edge features

Thursday, July 17th, 2008

I know I’ve been critical of WordPress in the past, but the new release of WordPress 2.6 allows me to pause and give thanks for all the amazing features that WordPress offers. Earlier tonight I was helping a friend with her Blogger.com blog, and the difference between that and WordPress is night and day.

For instance, Blogger doesn’t even offer out-of-the-box support for below-the-fold text, and the official work-around they suggest is an ugly display:none; CSS hack. Yeah, that’s right, the full text of every post is always included on the main page — there’s just a CSS directive to the browser to hide it! Talk about inefficient! WordPress does it the correct way. And the stylesheet support Blogger has is just hideous. The full text of the stylesheet is included inline with the HTML header on every page. If you don’t believe me, just view the HTML source of this random Blogger blog. They’re all like that.

So compared to Blogger, WordPress is incontrovertibly amazing (and although my friend isn’t likely to want to get server space and administer her own blog, I would at least recommend moving her blog over to WordPress.com). But the new version of WordPress, 2.6, adds a killer feature that I’ve long wanted in my blog software but haven’t seen anywhere: an integrated revision control system. If you’ve ever read Wikipedia and viewed the history tab, you’ll know what I’m talking about.

Revision control is useful for single-author blogs, where you might wipe out a passage only to later wish you had it back. It also helps a lot when there’s some tricky formatting you want to get just right. Without a revision control system, there’s no way to revert to a known good version without first copying the post source into Notepad. But it really shines for multi-author blogs. I remember how, when I was writing for Supreme Commander Talk with Grokmoo, we would edit each other’s posts, and then have to explicitly have a chat about what things in each other’s work needed editing so that the mistakes wouldn’t be repeated. With proper revision control, just execute a diff and what’s changed is plain as day! It always slightly irked me that other people might be editing my words and I would never be able to know. With WordPress 2.6, that’s no longer possible.

So I’ll count my blessings with WordPress. Despite its security vulnerabilities (most of which seem to be in the past now) it really is a great piece of software, and the developers continue to add amazing new must-have features to it. Now that I’ve had some experience with another blogging platform, I can unequivocally say that I heartily endorse WordPress. Everyone’s blogging experience should be this smooth.

Authors are the only ones to see the fairer side of copyright

Wednesday, July 9th, 2008

Most netizens agree that copyright is pretty horrifically broken. It lasts far longer than it has any business to, its length keeps getting extended, it’s way too restrictive, and it benefits major corporations a lot more than it does individuals. Most of copyright’s terrible public image has come from the music industry and the movie industry (thank you RIAA and MPAA!). What it hasn’t come from, for the most part, is the book industry. Here’s why.

Across the entire book industry, authors retain the copyright to their works. That’s why they can pick up shop and re-release all of their older novels with a new publisher at the drop of a hat. This ever-present threat is what keeps the publishing industry in line: publishers have to give the authors good deals or else they’d lose all of their authors. It’s also how JK Rowling can make a billion dollars off a single series of seven books: she retains all of the rights to the series, so when a licensing deal is made to make a movie, she gets the money. You ever heard of a musician or a filmmaker pulling that off? Hell no! Musicians only get paid a pittance from their albums; the majority of the profit comes from concert touring. Filmmakers also don’t make (that) much off their work.

The reason for all of this? In the music and film industries, the publisher gets exclusive publishing rights. There are many musicians out there who’ve switched labels and who can’t legally sell their old music, because someone else owns the rights to it! The same thing happens in the film industry. How absurd is that? The “game” in those industries is so rigged that just to be able to play, you have to give away all ownership rights to something you came up with. That’s how broken copyright is: it allows this to happen.

Now granted, the novel is very much the production of a single person (and a few editors), whereas an album generally has a larger production staff, and a movie especially is made by a lot of different people. It’s harder in the latter two cases to argue that the work is completely owned by a small group of people, especially in the case of a movie studio ponying up hundreds of millions of dollars to produce a movie. But many indie films that are entirely self-financed still get the same raw deal, with their creators having to give up exclusive rights just to get them shown in major theaters. It’s a travesty of copyright.

So go out and support the book publishing industry when you have a chance, as it’s actually not rotten to the core and the authors retain ownership and earn a decent percentage of the profits. As for music and movies, well, you should treat those publishers the same way they treat their creative talent: by royally screwing them over.

This post was inspired by the recent move of the blog The Loom between webhosts for the third time. During each move, the author has retained full ownership of all of his posts, and has been able to move all of them forward to the new host. Yet such a thing in the music industry, simply moving all of an artist’s back catalog to the next label without having to draw up a lot of contracts and pay out a large sum of money, is unthinkable.

Update: After some further research and discussion with friends, it looks like I’m mostly wrong about contracts in the writing industry being more lenient than in the other industries. Dammit. Looks like I got the wrong idea from only considering people who’ve managed to negotiate good contracts. Check out some more book deal contracting issues at this link.

Don’t ever be ashamed of your code

Friday, June 13th, 2008

Are you ever ashamed of your code? Don’t be! Being ashamed of your code is harmful, as artfully explained by Ben Collins-Sussman. It’s better to make your mistakes in the open where they can be quickly corrected than in private where they can fester for months, even years. Note that we aren’t necessarily talking about open source code here. Being ashamed of your code could also mean not sharing code with other people at your company.

Ben uses some anecdotes to illustrate just how bad situations can get when programmers (or small groups of programmers) sit on their code for months on end without any outside sanity checking whatsoever. But these anecdotes are more humorous than necessary, as it’s pretty much a truism in computer science that coding off on your own in secret is a bad idea. The people who are doing it know it’s bad, and the only reason they persist is that they are ashamed. Oftentimes they’ll rationalize it by saying “I’ll just clean it up before I let others see it”, which, when combined with procrastination, can mean no one else sees it for months or even years. And if poor architecture decisions have been made, as they often are, the problem is too large for a simple clean up; a partial or full rewrite is necessary. This is not the situation you want to find yourself in.

Luckily, I can’t say I’ve ever felt ashamed of my code. And that’s not for lack of writing some truly terrible programs, either. I just value the feedback I get from others more than any personal attachment I might have to my code. In other words, I don’t take it personally. And to demonstrate that, I’m going to post a truly terrible program I wrote back in high school. My only excuse is that I was young and ignorant.

The program in question is “makeSite”, a program I wrote to create my blog-like website before the word “blog” even existed and before any real blogging software had been developed. I was writing what was effectively a blog at the time (you can see an archive of it here), but I got tired of having to hand-edit the HTML to copy over a previous entry and modify it each time I wrote a new entry. So, naturally enough, I wrote a C++ program to statically compile a bunch of text “data” files containing my own custom pseudo-HTML-like syntax into a website. I won’t defend the decision to do it this way, other than to say that I didn’t know any better. What this effectively meant was that every time I updated any part of my site, even to fix a one-character typo, my entire site had to be re-compiled by re-running the program, a task that, because my program wasn’t very efficient, was taking minutes after my site grew to be rather big. I toyed with the idea of some sort of incremental site compilation, only updating the pages corresponding to the changed data files, but I never got that working.

I think it’ll help to illustrate how bad this program truly is by individually discussing some of the more egregious parts of it.

#include "apstring.h"

For those of you who aren’t familiar with the “apstring” string library, here’s a hint: ap stands for Advanced Placement. That’s right, instead of using a standard, widely used string library (like std::string from the C++ standard library’s “string” header), I used the apstring library (by the College Board), because that’s what we were taught in class. It was just like a real string library, only it didn’t have as many features. Frankly, there’s no excuse for its existence, as tests should conform to reality and not the other way around. If you ever see it in production code, you should run like hell.

headFile.open("pages.dat");
output.open("pages2.dat");
while (headFile.get(ch))
    output << ch;
output.close();
headFile.close();

Yes, you really are looking at a character-by-character copy of a file. Never mind that there's an OS function to do this in one line (and much more efficiently, I might add). But the reason I did it this way is even worse than the way I did it, if that’s possible: I wanted a second copy of the file so I could parse through the original string-by-string, and then when I hit upon a page that was a subpage of another page, I would consult this copy to find out what its parent page was. This was to get around the problem of not being able to have two file handles open to the same file. I suppose the concept of just loading the whole file into memory and parsing through that didn’t occur to me. And notice the hard-coded file names; that’s a nice touch.
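For contrast, here is a minimal sketch of the two alternatives mentioned above: copying a file in a single statement, and loading the whole file into memory so it can be scanned as many times as needed. The function names copyFile and readWholeFile are hypothetical and were not part of makeSite. The copy uses a standard stream-buffer insertion rather than an OS-specific call, which is an assumption about what would have been convenient at the time.

```cpp
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: copy one file to another in a single statement
// by streaming the source's buffer straight into the destination.
// Returns false if either file could not be opened.
bool copyFile(const std::string& from, const std::string& to) {
    std::ifstream in(from, std::ios::binary);
    std::ofstream out(to, std::ios::binary);
    if (!in || !out) return false;
    out << in.rdbuf();  // the whole copy happens here
    return true;
}

// Hypothetical helper: read an entire file into a string, so parent-page
// lookups can search memory instead of re-reading a second copy on disk.
std::string readWholeFile(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    std::ostringstream contents;
    contents << in.rdbuf();
    return contents.str();
}
```

With something like readWholeFile("pages.dat") in hand, the second copy of the file would not be needed at all, since the in-memory string can be searched while the original is being parsed.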

Read the rest of this entry »

Fixing an image upload bug in WordPress 2.5

Monday, May 26th, 2008

Ever since I upgraded to WordPress 2.5, I’ve been unable to successfully upload images. The Upload Image page would come up just fine, I would select the image file, the progress bar would advance all the way to the finish, then I’d get kicked out to a WordPress login screen, with the image not having made it. I didn’t have the time to fix it for a while, so I simply uploaded images to my webhost using SCP and linked to them manually, but that was a huge drain of time. So I finally sat down to fix it once and for all.

I tried all of the official WordPress fixes, to no avail. There’s something up with my shared hosting provider’s (HostMonster’s) configuration that doesn’t respond to any of the standard fixes. So I finally gave up and installed the No Flash Uploader plugin. It does exactly what it says: it reverts to the pre-Flash uploader days of WordPress 2.3. You may not get all of the “latest and greatest” features of the WordPress 2.5 uploader, but then again, I’d say an uploader that actually works is far superior to one that doesn’t.

So if you’re experiencing image uploading problems in WordPress 2.5, try out this plugin before you give up all hope. It’s just like uploading images in WordPress 2.3, which wasn’t bad at all. This whole Flash uploader mess (is Flash really even necessary, considering it’s not well supported on GNU/Linux?), along with the pre-2.5 security hole of unsalted passwords, has really shaken my confidence in the WordPress developers. I’ll take basic functionality over flashy functionality any day of the week. And I wish they had followed this mantra a little more closely, as a simple Google search will reveal that I’m far from the only person having problems with WordPress 2.5.

Ending a blog is heart-wrenching

Sunday, May 25th, 2008

I’m just about ready to end my former blog, Supreme Commander Talk. It focused exclusively on the PC game Supreme Commander (don’t get bent out of shape if you have never heard of it; the game didn’t become nearly as popular as we had hoped it would). I stopped updating the blog about a year ago when I stopped playing the game. Since then, I managed to get a few other players in for short writing stints, but none of them stayed very long, and the blog has now lapsed after several months of inactivity. And given the game’s gradual loss of popularity since its release, even largely unstemmed by the release of its expansion pack, I think it’s about time to end the blog.

But ending a blog is hard. I, along with my friend Grokmoo, put a lot of effort into that blog. We were writing substantive entries in it every day. I would find myself playing multiplayer games just for the sake of having something to write about. I checked the forums and the other fansites constantly, so that even if I missed being the first to report on something, I would still be far from the last. It was damn fun, and it’s a real rush to grow a community around you. Oh yes, the relative “fame” was addictive. At its peak, SupComTalk was getting thrice as many daily visits as this blog currently gets. And in the aggregate, I’ve put a lot more time into this blog as well.

Ending a blog is hard, but sometimes necessary. I don’t want to leave those loose ends hanging around perpetually, and getting overrun with spam is always a problem on a comment-enabled site that is no longer actively moderated. Of course, I’m not simply going to take the blog offline; that would be a terrible fate for something we spent so much time on (and I do despise linkrot). The simplest acceptable way to end it would be to turn off commenting across the whole site, effectively rendering it static. There must be a WordPress plugin out there somewhere to mothball a blog. I’ll have to put up one final, melancholic post, allow a few final days for comments on it, and then lock it all down permanently. “This is the blog that was.”

I will miss SupComTalk a lot; don’t think this will be easy for me. I really enjoyed the experience, and I would love to do it again with some other game. Writing that blog was the closest taste of Internet fame I’ve ever had (admittedly, just a taste; not even close to a mouthful). And there was a lesson there that I quickly learned, yet have still failed to follow: single-topic blogs that focus on specific subjects are, on the average, far more successful than personal blogs that focus on whatever smattering of topics the writer happens to be interested in. Some day yet I might finally apply that knowledge to this blog — or perhaps create a new one. I’m still thinking about it. But as I draw close to finally pulling the plug on SupComTalk, it weighs heavier and heavier still on my mind.

Site note: new anti-spam measures

Tuesday, May 13th, 2008

As the more astute readers may have noticed, I’ve increasingly been having spam problems on this site. More and more garbage comments and pingbacks were getting through my spam filter, Spam Karma. Unfortunately, the sole developer of that WordPress plugin stopped working on it more than a year ago, while the spammers haven’t stopped improving their techniques. So I’m switching over to Akismet, WordPress’s own anti-spam plugin, which is still actively supported. I’ll report on how well it’s doing after I’ve seen it in use for a couple weeks, but after one day of usage, I can at least guarantee that it doesn’t totally suck, as it’s stopped dozens of spam comments without letting a single one through.

Those of you who aren’t bloggers, consider yourselves lucky that you don’t have to deal with the messy issue of blog spam. I’ve found it to be a lot worse than tackling email spam. For starters, I get a lot more of it, and I also have to deal with it, as any spam that gets through makes your site look really trashy and could potentially damage your search engine rankings (Google punishes sites that link to spammy havens of the Internet). When you get a spam email, you can just ignore it and nothing bad happens; when you get a spam comment on your blog, you have to delete it, and that’s a fair bit more effort.

In my time off from fighting against spam, I amuse myself by thinking of all sorts of creative punishments for blog spammers. For instance, I’m a fan of Medieval-style hanging, drawing, and quartering, but that doesn’t quite satisfy me. I’d prefer hanged, drawn, and fractally quartered. Cut into four pieces, then cut each remaining piece into four pieces, ad infinitum …

That’s an appropriate punishment for spammers, and it satisfies my fascination with mathematics to boot.

Passionate writing is excellent writing

Monday, April 14th, 2008

One thing I’ve come to learn over the years that I’ve been writing is that the more passionate you are about a subject, the easier it is to write about it. Ditto for being more knowledgeable about a subject (but that is perhaps trivial). My favorite posts are those about the subjects I am most passionate about. And not only is the resultant work better, but it takes less time to write as well. I’ve spent hours laboring over works that didn’t turn out very satisfyingly, whereas for other works I sat down, wrote at a break-neck speed, and within ten minutes had something I was really proud of.

For instance, look at the post I wrote on the recent death of my great-aunt Muriel. It was an obituary of sorts, covering salient points of her life, explaining why the unknowledgeable reader should care that she died. It also expressed my innermost feelings on expected deaths. I think it came out really well, and I can tell it resonated with others by the comments that were left. Yet it was incredibly easy to write, probably taking a total of less than half an hour (and it was written within a few hours of hearing of her death). I didn’t even have the time to go research how old she was, leaving a perhaps too gruff proclamation to set the tone at the beginning of the post, but I shan’t go back and edit it. That post is from-the-heart, brutally honest, and essentially unedited, yet since it was something I felt passionate about, it just flowed from my mind, through my fingers and the keyboard, and onto the screen. I swear the number of typos I was making was lower than average, even though the typing speed was higher.

My first column for the University of Maryland’s student newspaper The Diamondback was on a topic I am very passionate about, evolution. I’m very happy with the way that one turned out. It did take a while to write, but only because I was completely unfamiliar with writing for the newspaper business. My later columns on similar subject matters were dashed off very quickly, yet with good results, because I am passionate about and intimately familiar with the material.

Now compare that to my column on bike theft on campus, which, frankly, was a waste of newspaper space. Being a columnist for the Diamondback was kind of limiting. We had to write about topics relevant to students and the school, and even though my column was only published twice a month, I couldn’t always find anything interesting to me to write about. Hence the column about bike theft. I’ll be honest: I don’t give a damn about bike theft. I don’t own a bike, it’s a boring topic, and nobody really cares. Yet it took me longer to write that column (over three hours, I think) than any other one I ever wrote. Why? Because I was reaching so hard just to find something to say about it. The thrust of the column boils down to one sentence: “Bike theft is bad and security measures on campus should be better,” gaining nothing in the expansion to a whopping full page of newspaper column. Yet I couldn’t come close to distilling my blog post about my great-aunt down into a smaller number of words without vastly affecting its quality.

Read the rest of this entry »

WordPress finally discovers salted passwords

Saturday, March 29th, 2008

WordPress 2.5 is out today and it looks mighty impressive. I’m going to wait a few days for reports of compatibility with the plugins I’m using before I upgrade, but after that, expect to see WordPress 2.5 on this blog soon.

Looking through the changes list, I did notice one odd thing. WordPress 2.5 finally adds salt to stored password hashes. It’s nearly inconceivable to me that WordPress went so long without salted passwords — it’s an incredibly important security technique that essentially has zero implementation cost. When I was helping to design the software infrastructure that powers Veropedia, I made sure that password hash salting was in our alpha. And yet it takes the fine folks over at WordPress until version 2.5 to implement it? Did they not realize how important it is to security?

Here’s why password salting is so important. The naive algorithm for storing login passwords in a database is to store them as plaintext. A user tries to log in, the inputted password is matched against the password field under their username in the database, and if it matches, the login is successful. The reason this is terrible security practice is that if the database is compromised (which is surprisingly easy to accomplish even remotely using SQL injection), the entire list of passwords can be revealed, compromising the entire site and everyone who is registered to use it.

So the next step in the evolution of login security (and this happened decades ago) was to use a one-way function called a hash function to store the password in the database. I won’t go into the details of how a hash function works, but the key point to know is that it is one way: given an input, you can quickly calculate the output, but given the output, you cannot calculate what the input was. So, now password hashes are stored in the database instead of the raw password, and when a user goes to log in, their input is hashed and compared against the value in the database. This is what WordPress used up until version 2.5.
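The hash-and-compare scheme just described can be sketched in a few lines of C++. This is only an illustration under loud assumptions: toyHash is 64-bit FNV-1a, a fast non-cryptographic hash standing in for a real password hash, and the UserStore class and its method names are hypothetical. None of this is WordPress’s actual code.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// ASSUMPTION: toy stand-in for a real one-way hash (64-bit FNV-1a).
// It is NOT cryptographically secure; it only illustrates the data flow.
std::uint64_t toyHash(const std::string& input) {
    std::uint64_t hash = 0xcbf29ce484222325ULL;  // FNV offset basis
    for (unsigned char c : input) {
        hash ^= c;
        hash *= 0x100000001b3ULL;                // FNV prime
    }
    return hash;
}

// Hypothetical user table: stores only hashes, never the raw passwords.
class UserStore {
public:
    void registerUser(const std::string& name, const std::string& password) {
        hashes_[name] = toyHash(password);
    }

    // Hash the login attempt and compare it with the stored hash.
    bool checkLogin(const std::string& name, const std::string& attempt) const {
        auto it = hashes_.find(name);
        return it != hashes_.end() && it->second == toyHash(attempt);
    }

private:
    std::unordered_map<std::string, std::uint64_t> hashes_;
};
```

The point of the design is that a stolen copy of the table holds only hash values, so the attacker still has to work backwards from each one.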

There’s just one major flaw with this seemingly secure system. There are only a few widely-used hash algorithms, and they all necessarily run quickly on small inputs, so it’s trivial to pre-compute a huge list of potential passwords and their associated hash values. This is called a rainbow table, and larger rainbow tables have trillions of entries in them, pretty much guaranteeing a successful attack against less secure passwords (short ones, ones that don’t use numbers and punctuation, etc.). So we’re pretty much back to square one: database is compromised, the hashed passwords are compared with the rainbow table nearly instantaneously, and lots of accounts can be compromised.
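To make the attack concrete, here is a simplified sketch of a precomputed lookup against unsalted hashes, and of why mixing a salt into the hash input defeats it. Several things here are assumptions for illustration only: fnv1a64 is a toy non-cryptographic hash, the table is a plain hash-to-password map rather than a true space-saving rainbow table, the function names are hypothetical, and prepending the salt is just one simple way to combine it with the password.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// ASSUMPTION: toy stand-in for a real hash function (64-bit FNV-1a).
std::uint64_t fnv1a64(const std::string& input) {
    std::uint64_t hash = 0xcbf29ce484222325ULL;  // FNV offset basis
    for (unsigned char c : input) {
        hash ^= c;
        hash *= 0x100000001b3ULL;                // FNV prime
    }
    return hash;
}

// Attacker's side: precompute hash -> password for a list of likely guesses.
std::unordered_map<std::uint64_t, std::string>
buildLookupTable(const std::vector<std::string>& guesses) {
    std::unordered_map<std::uint64_t, std::string> table;
    for (const auto& guess : guesses) {
        table[fnv1a64(guess)] = guess;
    }
    return table;
}

// Look up a stolen hash; returns the recovered password, or an empty
// string if the hash is not in the table.
std::string crack(const std::unordered_map<std::uint64_t, std::string>& table,
                  std::uint64_t stolenHash) {
    auto it = table.find(stolenHash);
    return it == table.end() ? std::string() : it->second;
}

// Defender's side: hash the salt together with the password, so a table
// built from bare passwords no longer lines up with the stored values.
std::uint64_t saltedHash(const std::string& salt, const std::string& password) {
    return fnv1a64(salt + password);
}
```

A table built from guesses such as "letmein" recovers that password from its unsalted hash in one lookup, but finds nothing for saltedHash("x9!k", "letmein"); the attacker would have to rebuild the table for every distinct salt.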

Read the rest of this entry »

Plan thrice, blog once

Friday, March 7th, 2008

I’m changing the category organization on this blog at the moment because the “Tech” category was getting too large and was dwarfing the other categories in size. It really sucks going through over a hundred old posts and re-categorizing them. Don’t put yourself into a situation where you have to do it. Plan thrice, blog once.

Looking through all of these old posts, I can’t help but critique some. Some were good, others okay, and some truly bad. Here’s a really bad one: I blogged about how no blog posts would be posted that day.

I guess I lied?