Archive for the 'Net' Category

And the spammers have won (comments are now disabled)

Friday, December 13th, 2013

The problem with spammers on the web that I talked about before has continued getting worse and worse, to the point where hundreds of spam comments (and no identifiably non-spam comments) have made it through onto this blog. Akismet wasn’t doing the job, and I don’t maintain this blog often enough to keep all of the spam out. My only fix is to disable all comments. It sucks, because the two defining features that characterize a blog are (a) being personal and opinionated and (b) allowing multi-way communication between the author and the readers, and the readers with each other. I’m no longer doing half of that, so in some very real senses this is no longer a blog (though feel free to still email me if you want to chat). I’m really sad about it. But the spammers have won. It’s no longer feasible to try to maintain your own blog out there. Just host it with WordPress or similar and let someone else handle the spam problem, because I can’t stay on top of it any longer.

Reminiscing about the naïve, spam-free days of the web

Tuesday, July 21st, 2009

Remember a long time ago when the web was free of spam? I’m not talking about email, which has had spam problems for a while; I’m talking about the web. Nowadays, the web is festering with link-crawling spambots. Anyone with a blog, Twitter account, or heck, even a webpage with a simple submit form with some text fields on it, knows this. There’s not much that can be done about it besides spam-detection heuristic algorithms and CAPTCHAs.

Well, I just recently found some code that I wrote way back in 2002 that displays a blissfully unaware naïveté of what was to come. That code was part of my website Fyre’s Domain, an archived copy of which I have since put online. I had just been learning Perl CGI and I wanted to write a simple guestbook/comments form that readers could use to give me feedback without having to use a mailto: link. This was in the era before blogging software was commonplace; what I was running was a home-brew blog, from before the word “blog” was in common use. I basically copied the format from one of the first chat rooms I ever used, Paddynet, way back in 1995 or so. The “chat room” consisted of an HTML form that would dump unvalidated input (including HTML tags) into a chat buffer on the page that showed the last 30 or so messages.

Paddynet was around long before spambots, but my site was started right when they began appearing in the wild, and the code proceeded to run for another 7 years until I just shut it off.

You can probably guess what happened.

The only reason I even re-discovered this code is because I happened to notice it was getting an unusual number of hits in my web analytics software. And those hits were anything but benign. My poor naïve Perl CGI comments submission form has accumulated 26 MB worth of text over the years, all of it spam. And since I figure it may be interesting to someone to see exactly what seven years of web spam looks like, you can download it for yourself (a text file thankfully compressed down to just 1.8 MB). If anyone finds any interesting trends in popular spam topics over the years in there, do let me know.

So those are the dangers of trusting user input on the web these days. Revel in the blissful simplicity of the following code, which was all it took to implement a comment submission system back in the day. Nowadays you couldn’t get away with anything even close to it. As my data proves, you’ll be eaten alive by spambots.


use CGI qw(:standard);

# Print the HTTP header, the page heading, and the comment form.
# (start_form/end_form are assumed; they are missing from the archived copy.)
print header;
print start_html('Leave Comments on Fyre'),
	h1('Leave Comments on Fyre'),
	start_form,
	"<i>Note, all fields are optional, but empty comments will be ignored.</i><br>", 
	"Name: ", textfield(-name=>'name',-default=>''),
	"E-mail: ", textfield(-name=>'e-mail',-default=>''),
	"Your Comments: <br>", textarea(-name=>'Comments',-rows=>10,-columns=>50,-default=>''),
	submit('Submit'), reset,
	end_form;

# On submission, build the new entry from the raw, unvalidated form fields
# and write it back to mk.txt ahead of all the older comments.
if (param() && param('name') ne '' && param('Comments') ne '') {
	$date = `date '+%H:%M %m/%d/%Y'`;

	print '<i>Your comment has been posted.</i><hr><br>';
	@foo = "\n\n" . '<br><b>' . param('name') . '</b> ' . "\n" .
	'<u>' . param('e-mail') . '</u> ' . "\n" . '<i>' . $date . '</i>' . 
	"\n" . '<table><tr><td width = "100%">' . param('Comments') . 
	'</td></tr></table>';	# closing tags assumed; this line was cut off
	push @foo, `cat mk.txt`;
	open CFILE, ">mk.txt" or die "Failed to open comments file!";
	print CFILE @foo;
	close CFILE;
}

# Show every comment received so far, newest first.
@foo = `cat mk.txt`; print @foo;

print 'This program is open source, and released under the GPL by Ben McIlwain, 2002.  See 
the source <a href = "mk_script.txt">here</a>.';
print end_html;
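
For contrast, here is a minimal sketch (in Python rather than Perl) of the very least a form like this would have to do before storing a comment: escape the HTML in every submitted field so that markup shows up as text instead of being rendered. The function name and the output layout are hypothetical, chosen only for illustration. Escaping alone does nothing to stop spambots from submitting; it only keeps their links from being live.

```python
import html


def format_comment(name, email, comment, date):
    """Build the HTML for one comment, escaping every user-supplied field."""
    # html.escape turns <, >, & and quotes into entities, so submitted
    # markup is displayed as literal text rather than interpreted.
    return (
        "<br><b>{}</b> <u>{}</u> <i>{}</i>\n"
        "<table><tr><td>{}</td></tr></table>\n"
    ).format(html.escape(name), html.escape(email),
             html.escape(date), html.escape(comment))


if __name__ == "__main__":
    print(format_comment("Bob", "bob@example.com",
                         '<a href="http://spam.example">cheap pills</a>',
                         "12:00 07/21/2009"))
```

A real deployment would still need the heuristics and CAPTCHAs mentioned earlier; this only closes the "dump raw HTML onto the page" hole.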

Why I use Identica and you should too

Sunday, March 22nd, 2009

Those of you following me on Twitter may have noticed that all of my tweets come from Identica. I started off with Twitter but I quickly switched over to Identica as soon as I learned about it. Identica, if you haven’t heard of it before, uses the same micro-blogging concept as Twitter (and in fact is compatible with it), but has several improvements. I recommend Identica, and if you aren’t using it yet, check out these reasons as to why you should.

There are several practical reasons you should use Identica:

  • All of your data is exportable on Identica, including your entire corpus of tweets. Twitter does not provide this functionality. Should you want to migrate away from Twitter down the road (for any number of as-yet-unforeseen reasons), you are unable to do so, but you are able to migrate away from Identica at any point easily. And since Identica uses the Free Software Laconica software, you can even install Laconica on your own web host and import all of your data there, where you can have complete control over it.
  • Identica has a powerful groups feature that allows people to collectively subscribe and see all tweets sent to a group (this is what the exclamation syntax you may have seen in tweets is about). Groups are a powerful way to build communities and have multi-party discussions, but Twitter does not have them.
  • You don’t have to quit Twitter. My Identica account is linked to my Twitter account, so every message that I send to Identica automatically appears on Twitter. Posting to Identica+Twitter takes the same amount of effort as posting to Twitter alone, except it is seen by more people.
  • Identica lets you see things from other people’s perspective. I’ll use myself as an example. You can see my entire tweet stream, which includes messages from all users and groups I’m following. This should give you a great idea of the kinds of things I’m interested in. And you can see all of the replies to me, which makes it a lot easier to track and understand conversations. Note that all of this is public information and is accessible on Twitter through trickier ways (in the first case, looking at the list of people a person follows and combining all their tweets in chronological order; in the second case, by searching for “@username” on the search subdomain), so you aren’t giving up any of your privacy. Identica simply makes these features a lot easier to use.
  • Some people you may end up finding and wanting to talk with don’t use Twitter at all; they’re only on Identica. Get on Identica and link it to Twitter and you can talk to everyone on both services. Just use Twitter, however, and you’re left out in the cold with regard to anyone who only uses Identica.

And there is one important ethical reason you should use Identica:

  • Identica is Free (as in freedom, not merely cost). Because it follows the Free software ethos, it respects your rights and maximizes your freedom to control your data as you see fit, including the ability to move all of your data elsewhere if necessary. Twitter does not respect these freedoms.

Australia blocks my page from their Internet

Wednesday, March 18th, 2009

A couple of years ago, when I was more active on Wikipedia than I am now, I was trying to prove a point by compiling a list of all of the risqué images on Wikipedia (link obviously NSFW). I don’t quite remember what that point was anymore, but the list remains. It has even survived a deletion attempt or two. I stopped maintaining it a long time ago, but for whatever reason, others picked it up and continued adding more pictures in my stead. I haven’t thought of it in a while.

So imagine my surprise when I learned that that silly page has made Australia’s secret national Internet censorship blacklist. I don’t understand the justification here (all of these images are hosted on Wikimedia servers, after all), but I have to laugh when I imagine some Australian apparatchik opening a report on this page, viewing it, making the determination that it’s not safe for Australian eyes, and adding it to the list without further thought, mate.

Australians, please take back control of your country.

A Python script to auto-follow all Twitter followers

Tuesday, March 10th, 2009

In my recent fiddling around with Twitter I came across the Twitter API, which is surprisingly feature-complete. Since programming is one of my hobbies (as well as my occupation), I inevitably started fooling around with it and have already come up with something useful. I’m posting it here, so if you need to do the same thing that I am, you won’t have to reinvent the wheel.

One common thing that people do on Twitter is they follow everyone that follows them. This is good for social networking (or just bald self-promotion), as inbound links to your Twitter page show in the followers list of everyone that you’re following. You’d think Twitter itself would have a way to do this, but alas, it does not. So what I wanted to do is use a program to automatically follow everyone following me instead of having to manually follow each person.

Other sites that interface with Twitter will do it for you (such as TweetLater), but I’m not interested in signing up for another service, and I’m especially not interested in giving out my Twitter login credentials to anyone else. So I needed software that ran locally. A Google search turned up an auto-follow script written in Perl, but the download link requires registration with yet another site. I didn’t want to do that so I decided to program it for myself, which ended up being surprisingly simple.

My Auto-Follow script is written in Python. I decided to use Python because of the excellent Python Twitter library. It provides an all-Python interface to the Twitter API. You’ll need to download and install Python-Twitter (and its dependency, python-simplejson, if you don’t have it already; sudo apt-get install python-simplejson does the trick on Ubuntu GNU/Linux). Just follow the instructions on the Python-Twitter page; it’s really simple.

Now, create a new Python script and copy the following code into it:

# -*- coding: utf-8 -*-
#(c) 2009 Ben McIlwain, released under the terms of the GNU GPL v3.
import twitter
from sets import Set

username = 'your_username'
password = 'your_password'
api = twitter.Api(username=username, password=password)

following = api.GetFriends()
friendNames = Set()
for friend in following:
    friendNames.add(friend.screen_name)

followers = api.GetFollowers()
for follower in followers:
    if (not follower.screen_name in friendNames):
        api.CreateFriendship(follower.screen_name)

Yes, it really is that simple. I’d comment it, but what’s the point? I can summarize its operation in one sentence: It gets all of your friends and all of your followers, and then finds every follower that isn’t a friend and makes them a friend. Just make sure to edit the script to give it your actual username and password so that it can sign in.
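
That one-sentence summary boils down to a set difference. As a minimal sketch that runs without a Twitter account, here is the same logic with plain lists of screen names standing in for what the API returns (the function name is hypothetical, and the sample names are made up):

```python
def followers_to_follow(friend_names, follower_names):
    """Return the followers who are not yet friends, in their original order."""
    already_following = set(friend_names)
    return [name for name in follower_names if name not in already_following]


if __name__ == "__main__":
    friends = ["alice", "bob"]
    followers = ["bob", "carol", "dave"]
    # bob is already a friend, so only carol and dave remain.
    print(followers_to_follow(friends, followers))
```

Using a set for the friends makes each membership check cheap, which is the same reason the script collects friend names into a set before looping over followers.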

Run the script and you will now be following all of your followers. Pretty simple, right? But you probably don’t want to have to keep running this program manually. Also, I’ve heard rumors that the Twitter API limits you to following 70 users per hour (as an anti-spam measure, I’m guessing), so if you have more than 70 followers you’re not following, you won’t be able to do it all at once. Luckily, there’s a solution for both problems: add the script as an hourly cronjob. This will keep who you follow synced with your followers over time, and if you have a large deficit in who you follow at the start (lucky bastard), it’ll slowly chip away at it each hour until they do get in sync. In Ubuntu GNU/Linux, adding the following line to a text file in /etc/cron.d/ (as root) should do it:

0 * * * * username python /path/to/ >/dev/null 2>&1

This will run the auto_follow script at the top of each hour. You’ll need to set the username to the user account you want the job to run under — your own user account is fine — and set the path to wherever you saved the auto_follow script. Depending on your GNU/Linux distribution and which cron scheduler you have installed, you may not need the username field, and this line might go in a different file (such as /etc/crontab). Refer to your distro’s documentation for more information.
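
If the rumored cap is real, each hourly run only gets through one batch of follows. A rough sketch of that arithmetic, assuming the unconfirmed 70-per-hour figure and using hypothetical helper names that are not part of the script above:

```python
import math

RUMORED_HOURLY_LIMIT = 70  # assumption: the rumored cap, not a documented value


def next_batch(pending, limit=RUMORED_HOURLY_LIMIT):
    """Split the pending names into this run's batch and the remainder."""
    return pending[:limit], pending[limit:]


def runs_needed(pending_count, limit=RUMORED_HOURLY_LIMIT):
    """Number of hourly runs needed to work through the whole backlog."""
    return math.ceil(pending_count / limit)


if __name__ == "__main__":
    backlog = ["user%d" % i for i in range(150)]
    batch, rest = next_batch(backlog)
    print(len(batch), len(rest), runs_needed(len(backlog)))
```

With a backlog of 150 followers, the first run would handle 70 of them and the hourly cron job would catch up on the rest over the next two runs.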

So that’s it. That’s all it takes to automatically follow everyone who’s following you: a dozen or so lines of Python, one crontab entry, and one excellent library and API. Enjoy.

I caught the Twitter bug

Tuesday, February 24th, 2009

Sigh. A lot of other people at work were using Twitter, so now I am too. If I join anything else, I’ll need to think of a good way to organize all of my web presences. I guess this blog can be the mothership, and contain links to everything else.

So far I seem to be using Twitter as a dumping ground for my Google Talk status messages, so they are no longer lost to the mists of the intertubes when I switch to a new one. I don’t ever foresee myself updating it on the go from a mobile phone — I just don’t have that much of a desire to remain connected. Being off the grid can be good sometimes.

Firefox continues gaining market share, software flaws

Tuesday, February 3rd, 2009

Excellent news! My favorite web browser, Mozilla Firefox, has gained market share yet again and now commands 21.53% of the market. That’s a far cry from several years ago when Firefox was just coming out and Internet Explorer was by far the dominant browser. I still remember all of those sites that only worked in Internet Explorer, and because alternative browsers weren’t very popular, companies got away with it. Now that Internet Explorer “only” has 67.55% of the market, no one dares make a site that requires it, thus alienating a whole third of potential customers. I can’t even remember the last time I saw an IE-only site.

Unfortunately, while Firefox’s market share is gaining, the software itself is gaining more and more problems. Firefox crashing has become a daily occurrence for me. I remember when it used to stay alive for months — barely. At least it now saves the list of open tabs and allows you to resume them upon restarting, but you still lose lots of logged in sessions and it’s just a big hassle. And while it is true that most Firefox crashes can be traced back to Adobe’s Flash plugin, there’s no excuse for Firefox allowing a bug in a plugin to crash the whole application. Google Chrome found a fix for this by running each tab as a separate process. Firefox needs to do the same, or else it won’t keep gaining market share for much longer. As much as I hate to admit it, Firefox has some pretty significant flaws.

How browser security exploits hinder exploration of the web

Monday, December 22nd, 2008

It’s important to be able to feel safe while browsing the web, both in terms of what your software protects you against and what your own “web street smarts” protect you against. Users who don’t feel safe will restrict themselves to big sites by recognizable companies and other sites that they already visit regularly. That is still a worthwhile use of the web, sure, but one of the quirky charms of the web is all of that weird stuff that can exist only in this medium, and if you aren’t browsing it, you’re missing out. An even worse category of user is one who feels safe but isn’t, thus exposing themselves to viruses, malware, and even identity theft. Unfortunately, it appears that everyone who uses Internet Explorer is in this category.

In the latest in a long line of Microsoft failings, another Internet Explorer bug has been discovered that pretty much allows arbitrary malicious control over your computer simply by viewing an infected website. This critical vulnerability was patched recently, but keep in mind that millions of computer users patch their software on an irregular basis, and further millions never patch at all. The number of computer users vulnerable to this one exploit thus remains in the tens of millions, at least. Using Internet Explorer simply isn’t safe, and the majority of people know this. The worst knock-on effect of this is that it causes people to adjust their browsing accordingly, treating the web as a shady inner city neighborhood to be avoided rather than a beautiful vista that demands exploration.

Switching to Mozilla Firefox is a no-brainer. But even with Firefox, as long as you’re still running Windows, you’re still quite vulnerable. It’s possible for even the experienced web user to get caught by what appears to be a trial download of a legitimate piece of software that is actually a virus. This is one of the many reasons why I choose GNU/Linux as my operating system. I browse the web with impunity, journeying where most others dare not, because I have taken the necessary steps to truly protect myself. And the view from way up here is amazing.

Fixing ordering bias of U.S. presidential election candidates on Wikipedia

Monday, November 3rd, 2008

Today, upon getting home from work, one of the first things I did was check the Main Page of the English Wikipedia. It always has interesting content on there, and today was no exception. For the first time ever, two articles were featured on the front page: those of John McCain and Barack Obama. Except there was one little niggling problem: John McCain was listed first. Granted, his last name does come first alphabetically … but still. This is the Internet. We don’t have the limitations of printed paper ballots; there’s no reason the candidates have to be displayed in a static order. And I happen to be an administrator on the English Wikipedia, so I can edit any page on the site, including the main page and the site-wide JavaScript. So I fixed the ordering, presumably much to the delight of all of the people who had been complaining about bias on the talk page.

I took some JavaScript that was previously used in the Wikimedia Foundation Board elections, where ordering of the several dozen candidates had proved to be a huge bias in previous elections, and added it to the English Wikipedia. Then I modified the main page slightly to use the JavaScript and, boom, the candidates now appear in a random order upon each page load. I figure if this solution was good enough for WMF Board elections then it ought to be good enough for the United States presidential election, right?

So if you go to the main page of Wikipedia now, you should see either Barack Obama or John McCain on top, with a 50% probability of each (if you’re not seeing this behavior, flush your browser’s cache). Considering how many people view Wikipedia each day, I like to think this will make some kind of difference.
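
The idea of re-rolling the order on every page load can be sketched in a few lines. The actual fix was JavaScript running in the reader’s browser; the Python below is only a stand-in to show the technique, and the function name is hypothetical:

```python
import random


def display_order(candidates, rng=random):
    """Return a shuffled copy of the candidates, leaving the input untouched."""
    shuffled = list(candidates)
    rng.shuffle(shuffled)
    return shuffled


if __name__ == "__main__":
    rng = random.Random(2008)  # fixed seed so the demonstration is repeatable
    firsts = [display_order(["John McCain", "Barack Obama"], rng)[0]
              for _ in range(10000)]
    # Each candidate should come out on top in roughly half of the "page loads".
    print(firsts.count("Barack Obama") / len(firsts))
```

With two candidates, a uniform shuffle puts each one first about half the time, which is the 50% behavior described above.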

How to prevent Firefox from lagging badly when dragging selected text

Tuesday, October 28th, 2008

This past week I upgraded my system from Ubuntu 8.04 to Ubuntu 8.10. The upgrade was pretty smooth, with nothing much to report except that my system now boots without requiring the all_generic_ide kernel parameter, which is nice. One problem that I immediately started seeing, however, was that my system would freeze up terribly whenever I selected more than a few words in Mozilla Firefox and tried dragging them anywhere. Depending on how large the block of text was, my entire system could freeze up for minutes at a time as it spent several seconds drawing each frame of the text block moving.

Well, I’d had enough of it, and I went looking for a solution. Firefox didn’t always render the entire contents of the selection being dragged-and-dropped; it used to just display a little icon next to the cursor. Here’s how to restore that functionality and remove the lag from the fancy but ultimately unnecessary fully rendered dragging:

  1. Type about:config into Firefox’s location bar and hit Return.
  2. In the filter text edit box at the top of the window, type nglayout.
  3. Double-click on the nglayout.enable_drag_images row to change its value to false.
  4. That’s it! Firefox will no longer try to render the contents of the selection to the screen as you drag words around. For older systems or systems with poor graphical support (like mine, apparently), this is pretty much a mandatory change. Enjoy your new, faster Firefox!
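
Firefox also reads preferences from a user.js file in the profile directory at startup, so the same change could be kept there instead of being made by hand in about:config. As a small sketch, this hypothetical Python helper formats the line that would go in that file. It assumes the preference name from the steps above still exists in your Firefox version, and it does not locate or write to the profile for you:

```python
def user_pref_line(name, value):
    """Format one preference as a user_pref("name", value); line for user.js."""
    if isinstance(value, bool):
        # Check bool before int: in Python, True and False are also ints.
        rendered = "true" if value else "false"
    elif isinstance(value, int):
        rendered = str(value)
    else:
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        rendered = '"{}"'.format(escaped)
    return 'user_pref("{}", {});'.format(name, rendered)


if __name__ == "__main__":
    print(user_pref_line("nglayout.enable_drag_images", False))
```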