Using LINQ in C# to easily get a list of most often used words

May 20th, 2013 15:26

A pretty common programming interview question is to parse an input sentence and return the list of unique words used in that sentence. A further elaboration on that problem, the one that this post will be addressing, is to additionally calculate the number of occurrences of each word, and then return the top K words for some input value K. I’ll be demonstrating a simple solution to that problem in C#, both because I’ve been using it a lot recently and because the choice of C# gives us access to LINQ, a powerful language feature that allows queries on collections using a SQL-like syntax. The top K problem is incredibly easy to solve in SQL: assuming a Words table with one row per word occurrence, it boils down to something like SELECT TOP (@K) Word, COUNT(*) FROM Words GROUP BY Word ORDER BY COUNT(*) DESC. The C# solution is similarly easy.

First, I’ll clarify the assumptions that I’m using (and that would be wise for an interviewee to address if none of these are made explicit):

  • I’m writing my solution to be case-insensitive. “case”, “Case”, and “CASE” will all thus count as the same word.
  • I’m not dealing with ties. If you ask for the top 3 words by occurrence and there are 5 words that are all used the most in equal numbers, then you’re still only going to get three words of those five, selected in no particular manner.
  • I’m going to use the .NET regular expression definition of a non-word character (\W) to separate the input sentence into words. The naive solution would be to split the string on only spaces, but then you’re not handling punctuation correctly.
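To make that last assumption concrete, here is a minimal Python sketch (not the C# solution itself) comparing a split on spaces with a split on non-word characters. The sample sentence is made up for illustration. Note that splitting on a single non-word character would leave empty strings wherever punctuation is followed by a space, so the sketch splits on runs of them and discards any empty leftovers.

```python
import re

sentence = "Stop, thief! Stop."

# Naive approach: split on spaces only; punctuation stays glued to the words.
naive = sentence.split(" ")

# Split on runs of non-word characters, then drop the empty string
# that appears when the input starts or ends with punctuation.
words = [w for w in re.split(r"\W+", sentence) if w]

print(naive)  # ['Stop,', 'thief!', 'Stop.']
print(words)  # ['Stop', 'thief', 'Stop']
```

With the naive split, “Stop,” and “Stop.” would be counted as two different words.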

And for the solution:

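using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

Dictionary<string, int> GetTopKWords(string input, int k)
{
	// Split on runs of non-word characters so that ", " doesn't produce empty "words".
	string[] words = Regex.Split(input, @"\W+");
	var occurrences = new Dictionary<string, int>();
	
	foreach (var word in words)
	{
		// Regex.Split can still return an empty string at the start or end of the input.
		if (word.Length == 0)
			continue;

		string lowerWord = word.ToLowerInvariant();
		if (!occurrences.ContainsKey(lowerWord))
			occurrences.Add(lowerWord, 1);
		else
			occurrences[lowerWord]++;
	}

	// Order by occurrence count, descending, and keep the first k entries.
	return occurrences
		.OrderByDescending(kvp => kvp.Value)
		.Take(k)
		.ToDictionary(kvp => kvp.Key, kvp => kvp.Value);
}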

The vast majority of this code is responsible simply for finding the list of unique words and the number of occurrences of each one. Once that is known, finding the top K words is a single statement thanks to the power of LINQ. All we’re doing is ordering the words by frequency in descending order and taking the top K words. Without LINQ, there would be significantly more book-keeping code required to do this, which would make a good exercise for the reader (e.g. solve this problem in Java). The first roadblock you’ll probably run into is that you can’t simply flip the keys and values of the dictionary, because the frequency counts aren’t unique. The best I’ve come up with is to construct a list of ordered tuples out of the dictionary of words, order it on the occurrence count part of each tuple, and then extract the first K elements from the resulting ordered list and return it. Let me know in the comments if you have a better solution.
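The tuple-based approach described above can be sketched in a few lines. This is a rough illustration in Python rather than Java, and the function names top_k_words and top_k_words_heap are made up for this example. The second function is one possible alternative that uses a heap to avoid sorting the whole list, which may help when K is much smaller than the number of unique words.

```python
import heapq
import re
from collections import Counter


def top_k_words(text, k):
    """Return up to k (word, count) tuples, most frequent first."""
    words = [w.lower() for w in re.split(r"\W+", text) if w]
    counts = Counter(words)
    # Build (word, count) tuples and sort them on the count part.
    pairs = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)
    return pairs[:k]


def top_k_words_heap(text, k):
    """Same result shape, but without sorting every unique word."""
    words = [w.lower() for w in re.split(r"\W+", text) if w]
    counts = Counter(words)
    return heapq.nlargest(k, counts.items(), key=lambda pair: pair[1])


text = "the quick brown fox is brown and jumps over the brown log"
print(top_k_words(text, 2))       # [('brown', 3), ('the', 2)]
print(top_k_words_heap(text, 2))  # [('brown', 3), ('the', 2)]
```

As in the C# version, ties are not handled in any special way: words with equal counts come out in whatever order the sort leaves them.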

Oh, and here’s an example input/output for the program, handling the display of output using LINQPad:

var input = "the quick brown fox is brown and jumps over the brown log over the long fire and quickly jumps to a brown fire fox";
GetTopKWords(input, 10);

outputs:

[Image: Top K Words sample run]

Fixing an error with being unable to add user fields in Drupal 7

April 4th, 2012 10:59

Drupal 7 has a nice built-in Fields functionality which can be used to add fields to any entity. As applied to users, this replaces the previous Profile module which was used in Drupal 6 to add fields to users. However, right after creating a new Drupal 7 site, I couldn’t figure out how to add fields to users.

I followed these instructions:

Just go to:

Configuration > People > Account Settings

Then click on the Manage Fields tab, and then just manage the fields just like you would for a Content Type.

But there wasn’t a “Manage Fields” tab on that page. Going to the URL directly, admin/config/people/accounts/fields, redirected me back to the Account Settings page. After banging my head against the wall for ten minutes, I finally realized that I couldn’t see this tab because the “Field UI” module wasn’t enabled. Go to the List Modules page, enable that module (and its dependencies), and you should be able to add fields to users.

The many ways in which Gravity’s Rainbow directly inspired Neon Genesis Evangelion

January 19th, 2012 22:46

I recently read the well-known post-modernist novel Gravity’s Rainbow (1973) by Thomas Pynchon, and I noticed a number of surprising similarities between it and the well-known Japanese anime series Neon Genesis Evangelion (1995). In fact, there are so many similarities between the two, thematically, stylistically, and plot-wise, that I am forced to conclude that Hideaki Anno, the writer and director of Evangelion, must have read Gravity’s Rainbow and drawn upon it specifically for inspiration in creating his series. Unfortunately, I haven’t found any discussion of the similarities between these two works, hence the need for this post.

I’ll assume a familiarity with Evangelion for the remainder of this post (which allows me to focus on summarizing and explaining Gravity’s Rainbow). There are some spoilers for Gravity’s Rainbow below if you haven’t read it yet, but the novel is so meandering and expansive that it’s not possible to ruin it.

I’ll examine the thematic similarities first. Gravity’s Rainbow runs from the final months of World War II in Europe, through V-E Day, and into the occupation of Germany by the Allied forces. The first part of the novel, set in England during the German bombing and rocket campaign, unfolds under a heavy siege mentality. V-2 rockets are falling often, at random, and killing lots of civilians. There are many scenes that take place deep within bunkers, or have military personnel travelling to the scene of the latest rocket strike to investigate the effects. This whole section of the novel feels very similar to the overall mood in Tokyo-3 as Japan is besieged by one attacking Angel after another, right down to the missions being ordered from within the safety of a bunker.

Gravity’s Rainbow is suffused throughout with the paranormal, the occult, the bizarre, and many different references to psychology (especially that of Sigmund Freud). Pynchon is every bit as obsessed with psychology as Hideaki Anno, to the point where if you couldn’t handle the original last two episodes of Evangelion, you probably won’t enjoy the similar parts of Gravity’s Rainbow either, as there is an equal amount of psycho-analytical musing in it. Both works examine military hierarchies and point out some of the inherent absurdities in them. Gravity’s Rainbow especially focuses on jargon-heavy, acronym-laced, secret military, espionage, and industrial research organizations, and the interplay and conflict between them — exactly like Evangelion.

Gravity’s Rainbow is also laced throughout with sexuality and sexual deviance, which is another theme that Evangelion explores quite thoroughly. A pervading sense of paranoia is present throughout the text. It has a large amount of technological detail in it, verging towards technobabble on many occasions, same as Evangelion. It even has one particularly memorable scene in which a boy is plugged into a harness made out of Imipolex G, an erectile plastic polymer (you can’t make this stuff up) that interfaces directly with the boy’s neural network. This harness is then put inside a V-2 rocket and launched with the boy as the unwitting cargo (not as pilot). Shades of Shinji being plugged into an Evangelion and then losing control of it, anyone?

But the most convincing case of Evangelion’s inspiration from Gravity’s Rainbow can be made by looking directly at some examples from the text of the novel. I’ll present three passages that were so startlingly similar to Evangelion that I set the book aside in amazement long enough to take notes, wondering how nobody had ever caught this before (or, at least, if they did, why they didn’t post their findings online). All page numbers I’ll be using are from the original 760-page edition of Gravity’s Rainbow.

First, we’ll start with a passage from page 151, in which a Royal Air Force bomber squadron is attacking the German town of Lübeck.

It’s a dangerous game Cherrycoke’s playing here. Often he thinks the sheer
volume of information pouring in through his fingers will saturate, burn him out
. . . she seems determined to overwhelm him with her history and its pain, and
the edge of it, always fresh from the stone, cutting at his hopes, at all their hopes.
He does respect her: he knows that very little of this is female theatricals, really.
She has turned her face, more than once, to the Outer Radiance and simply seen
nothing there. And so each time has taken a little more of the Zero into herself.
It comes down to courage, at worst an amount of self-deluding that’s vanishingly
small: he has to admire it, even if he can’t accept her glassy wastes, her appeals
to a day not of wrath but of final indifference. . . . Any more than she can accept
the truth he knows about himself. He does receive emanations, impressions . . .
the cry inside the stone . . . excremental kisses stitched unseen across the yoke of
an old shirt. . . a betrayal, an informer whose guilt will sicken one day to throat
cancer, chiming like daylight through the fourchettes and quirks of a tattered
Italian glove . . . Basher St. Blaise’s angel, miles beyond designating, rising over
Lübeck that Palm Sunday with the poison-green domes underneath its feet, an
obsessive crossflow of red tiles rushing up and down a thousand peaked roofs
as the bombers banked and dived, the Baltic already lost in a pall of incendiary
smoke behind, here was the Angel: ice crystals swept hissing away from the back
edges of wings perilously deep, opening as they were moved into new white
abyss. . . . For half a minute radio silence broke apart. The traffic being:

St. Blaise: Freakshow Two, did you see that, over.

Wingman: This is Freakshow Two—affirmative.

St. Blaise: Good.

No one else on the mission seemed to’ve had radio communication. After
the raid, St. Blaise checked over the equipment of those who got back to base
and found nothing wrong: all the crystals on frequency, the power supplies
rippleless as could be expected—but others remembered how, for the few
moments the visitation lasted, even static vanished from the earphones. Some
may have heard a high singing, like wind among masts, shrouds, bedspring or
dish antennas of winter fleets down in the dockyards . . . but only Basher and
his wingman saw it, droning across in front of the fiery leagues of face, the eyes,
which went towering for miles, shifting to follow their flight, the irises red as
embers fairing through yellow to white, as they jettisoned all their bombs in no
particular pattern, the fussy Norden device, sweat drops in the air all around its
rolling eyepiece, bewildered at their unannounced need to climb, to give up a
strike at earth for a strike at heaven . . . .

Group Captain St. Blaise did not include an account of this angel in his official
debriefing, the W.A.A.F. officer who interrogated him being known around the
base as the worst sort of literal-minded dragon (she had reported Blowitt to
psychiatric for his rainbowed Valkyrie over Peenemünde, and Creepham for
the bright blue gremlins scattering like spiders off of his Typhoon’s wings
and falling gently to the woods of The Hague in little parachutes of the same
color). But damn it, this was not a cloud. Unofficially, in the fortnight between
the fire-raising at Lübeck and Hitler’s order for “terror attacks of a retaliatory
nature”—meaning the V-weapons—word of the Angel got around. Although
the Group Captain seemed reluctant, Ronald Cherrycoke was allowed to probe
certain objects along on the flight. Thus the Angel was revealed.

The similarities to Evangelion here are obvious. The bomber squadron runs into an apparition in the sky, while all radio contact goes dead. This apparition is even known as the Lübeck Angel.

Next, we have a small fanciful vignette from page 674:

Onward to rescue the Radiant Hour, which has been abstracted from the day’s
24 by colleagues of the Father, for sinister reasons of their own. Travel here gets
complicated—a system of buildings that move, by right angles, along the grooves
of the Raketen-Stadt’s street-grid. You can also raise or lower the building itself,
a dozen floors per second, to desired heights or levels underground, like a
submarine skipper with his periscope—although certain paths aren’t available to
you. They are available to others, but not to you. Chess. Your objective is not the
King—there is no King—but momentary targets such as the Radiant Hour.

Notice that the Raketen-Stadt described here is pretty much identical to Tokyo-3, in which the skyscrapers are above ground during the day, but are lowered underground into the GeoFront at night or when the city is under attack from Angels.

And finally, from page 753 near the very end of the novel:

The countdown as we know it, 10-9-8-u.s.w., was invented by Fritz Lang
in 1929 for the Ufa film Die Frau im Mond. He put it into the launch scene to
heighten the suspense. “It is another of my damned ‘touches,’” Fritz Lang said.

“At the Creation,” explains Kabbalist spokesman Steve Edelman, “God
sent out a pulse of energy into the void. It presently branched and sorted into
ten distinct spheres or aspects, corresponding to the numbers 1-10. These are
known as the Sephiroth. To return to God, the soul must negotiate each of the
Sephiroth, from ten back to one. Armed with magic and faith, Kabbalists have
set out to conquer the Sephiroth. Many Kabbalist secrets have to do with making
the trip successfully.

“Now the Sephiroth fall into a pattern, which is called the Tree of Life. It
is also the body of God. Drawn among the ten spheres are 22 paths. Each path
corresponds to a letter of the Hebrew alphabet, and also to one of the cards called
‘Major Arcana’ in the Tarot. So although the Rocket countdown appears to be
serial, it actually conceals the Tree of Life, which must be apprehended all at
once, together, in parallel.

“Some Sephiroth are active or masculine, others passive or feminine. But the
Tree itself is a unity, rooted exactly at the Bodenplatte. It is the axis of a particular
Earth, a new dispensation, brought into being by the Great Firing.”

“But but with a new axis, a newly spinning Earth,” it occurs to the visitor,
“what happens to astrology?”

“The signs change, idiot,” snaps Edelman, reaching for his family-size jar of
Thorazine. He has become such a habitual user of this tranquilizing drug that
his complexion has deepened to an alarming slate-purple. It makes him an oddity
on the street here, where everybody else walks around suntanned, and red-eyed
from one irritant or another. Edelman’s children, mischievous little devils, have
lately taken to slipping wafer capacitors from junked transistor radios into Pop’s
Thorazine jar. To his inattentive eye there was hardly any difference: so, for a
while, Edelman thought he must be developing a tolerance, and that the Abyss
had crept intolerably close, only an accident away—a siren in the street, a jet
plane rumbling in a holding pattern— but luckily his wife discovered the prank
in time, and now, before he swallows, he is careful to scrutinize each Thorazine
for leads, mu’s, numbering.

“Here—” hefting a fat Xeroxed sheaf, “the Ephemeris. Based on the new
rotation.”

“You mean someone’s actually found the Bodenplatte? The Pole?”

“The delta-t itself. It wasn’t made public, naturally. The ‘Kaisersbart
Expedition’ found it.”

A pseudonym, evidently. Everyone knows the Kaiser has no beard.

This illustrates many thematic similarities with Evangelion, including references to the Kabbala, the Tree of Life, mythological angels, the occult, cataclysmic events, and even a search for The Pole (ahem, Second Impact). And note that it takes place in the context of a long-winded, jargon-heavy discussion between military figures. If you animated this passage it would fit right into an episode of Evangelion.

I believe that the similarities between Gravity’s Rainbow and Evangelion (which came out two decades later) have been established beyond a reasonable doubt. I wouldn’t go so far as to use the word “steal”, but in my mind, Evangelion directly owes a lot of its feel and setting to Pynchon’s work. As a consequence of this, if you’re an Evangelion fan, you owe it to yourself to read Gravity’s Rainbow. Not only is it a good novel in its own right, but by reading and understanding some of the inspiration behind Evangelion, you’ll get a better understanding of Evangelion itself.

Long time no blog

February 20th, 2011 18:39

It’s been a long time since I’ve posted anything here (over a year, anyway, which might as well be a decade in Interwebs time). The funny thing is that this blog gets more views now than it ever did before. That’s right, the Google searches leading to the backlog of old content have continued to drive traffic in ever increasing numbers even though nothing new has been posted in a while. As a result, the AdSense ads on this site continue to cover the cost of server hosting, which is why the site is still around even though I haven’t written anything new in a while. I’m not making any money off it, but it’s always nice to have access to a co-hosted GNU/Linux server running somewhere (especially when said access is free).

One wonders what traffic on this site might be like now if I had kept updating it regularly, but whatever. I used that time for other things. Like reading!

I’ve been working my way through Modern Library’s Top 100 English Novels of the 20th Century. I’ve also been taking many a detour through other classic American novels that didn’t make the board’s list (To Kill A Mockingbird anyone?) and that I never got a chance to read in high school or college.

Most recently I read John Steinbeck’s Cannery Row. I’m going to have to go back and read a bunch more of his books (the only other one I think I’ve ever read was The Grapes of Wrath, back in high school). I really enjoy him as an author, especially the little textural vignettes that have nothing to do with the main story. Unbelievably, one of my friends recently cited these interesting interruptions as a reason why they didn’t like Steinbeck. Crazy talk.

Also, Ernest Hemingway has been very enjoyable. I really appreciate what he’s able to convey using so few words. I know I can write that simply, but my icebergs wouldn’t have anything below the surface of the waves.

Reminiscing about the naïve, spam-free days of the web

July 21st, 2009 23:48

Remember a long time ago when the web was free of spam? I’m not talking about email, which has had spam problems for a while; I’m talking about the web. Nowadays, the web is festering with link-crawling spambots. Anyone with a blog, Twitter account, or heck, even a webpage with a simple submit form with some text fields on it, knows this. There’s not much that can be done about it besides spam-detection heuristic algorithms and CAPTCHAs.

Well, I just recently found some code that I wrote way back in 2002 that displays a blissfully unaware naïveté about what was to come. That code was part of my website Fyre’s Domain, which I have since put an archived copy of online. I had just been learning Perl CGI and I wanted to write a simple guestbook/comments form that readers could use to give me feedback without having to use a mailto: link. This was in the era before blogging software was commonplace; what I was running was a home-brew blog, from before the word “blog” was in common use. I basically copied the format from one of the first chat rooms I ever used, Paddynet, way back in 1995 or so. The “chat room” consisted of an HTML form that would dump unvalidated input (including HTML tags) into a chat buffer on the page that displayed the last 30 or so messages.

Paddynet was around long before spambots, but my site was started right when they began appearing in the wild, and the code proceeded to run for another 7 years until I just shut it off.

You can probably guess what happened.

The only reason I even re-discovered this code is because I happened to notice it was getting an unusual number of hits in my web analytics software. And those hits were anything but benign. My poor naïve Perl CGI comments submission form has accumulated 26 MB worth of text over the years, all of it spam. And since I figure it may be interesting to someone to see exactly what seven years of web spam looks like, you can download it for yourself (a text file thankfully compressed down to just 1.8 MB). If anyone finds any interesting trends in popular spam topics over the years in there, do let me know.

So those are the dangers of trusting user input on the web these days. Revel in the blissful simplicity of the following code, which was all it took to implement a comment submission system back in the day. Nowadays you couldn’t get away with anything even close to it. As my data proves, you’ll be eaten alive by spambots.

#!/usr/bin/perl

use CGI qw(:standard);

print header;
print start_html('Leave Comments on Fyre'),
	h1('Leave Comments on Fyre'),
	start_form,
	"<i>Note, all fields are optional, but empty comments will be ignored.</i><br>", 
	"Name: ", textfield(-name=>'name',-default=>''),
	"E-mail: ", textfield(-name=>'e-mail',-default=>''),
	"Your Comments: <br>", textarea(-name=>'Comments',-rows=>10,-columns=>50,-default=>''),
	'<br>',
	submit('Submit'), reset,
	end_form,
	p,
	hr;

if (param() && param('name') ne '' && param('Comments') ne '') {
	$date = `date '+%H:%M %m/%d/%Y'`;

	print '<i>Your comment has been posted.</i><hr><br>';
	@foo = "\n\n" . '<br><b>' . param('name') . '</b> ' . "\n" .
	'<u>' . param('e-mail') . '</u> ' . "\n" . '<i>' . $date . '</i>' . 
	"\n" . '<table><tr><td width = "100%">' . param('Comments') . 
	'</td></tr></table><hr>';
	push @foo, `cat mk.txt`;
	open CFILE, ">mk.txt" or die "Failed to open comments file!";
	print CFILE @foo;
	close CFILE;
}

@foo = `cat mk.txt`; print @foo;

print 'This program is open source, and released under the GPL by Ben McIlwain, 2002.  See 
the source <a href = "mk_script.txt">here</a>.';
print end_html;
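For contrast, here is a minimal sketch of the least a comment form would need to do today with the same three fields. It is written in Python rather than Perl, the function name render_comment is made up for this example, and it only addresses one of the problems (submitted HTML being echoed back unescaped). It does nothing about the spam itself, which still needs the heuristics or CAPTCHAs mentioned earlier.

```python
import html


def render_comment(name, email, comment):
    """Build the HTML for one comment, escaping every user-supplied field."""
    # html.escape turns <, >, & and quotes into entities, so submitted
    # markup is shown as text instead of being interpreted by the browser.
    return (
        f"<br><b>{html.escape(name)}</b> "
        f"<u>{html.escape(email)}</u>"
        f"<table><tr><td>{html.escape(comment)}</td></tr></table><hr>"
    )


print(render_comment("Bob", "bob@example.org", "<script>alert(1)</script>"))
```

The script tag in the sample comment comes out as harmless text (&lt;script&gt;...) rather than as executable markup.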

Right-wing terrorism

June 10th, 2009 16:26

Today, an anti-semitic terrorist attacked the Holocaust Memorial Museum in Washington, D.C. (I’ve been there, and yes, one visit is enough for a lifetime). Last week, an anti-abortion terrorist assassinated a doctor.

Why is the media so afraid to use the word “terrorist” to accurately describe right-wingers engaged in the act of terrorism? Is it that whites can’t be terrorists? Only Arabs?

Until we call it what it actually is, we can’t address it properly.

And since right-wingers were so keen on using water-boarding against terrorists, do you think they’d mind if we tortured these home-grown right-wing terrorists?

My once-tiny GNU/Linux desktop morphs beyond all recognition

April 14th, 2009 19:30

[Image: Enermax Chakra]
Almost a year ago, I bought a cute little desktop from Dell with the intent of using it as a GNU/Linux desktop alongside my existing Windows desktop. Its name is Vertumnus. But things don’t always turn out as planned. I quickly started using Vertumnus as my exclusive desktop PC, booting the Windows machine only to play games. Eventually I reformatted the Windows computer and the only applications I’ve reinstalled have been games, so it’s pretty much reduced to a gaming appliance at this point, like an Xbox 360 but better.

The only problem is that when I originally bought Vertumnus, I didn’t have all of this in mind, and so I bought it rather under spec. I would’ve been better off just buying a better computer from the get-go. As a result, I’ve had to do quite a few upgrades over the past year to get it to meet my needs. From the very beginning I added more RAM and another hard drive. Then it joined a Stand Alone Complex. Then I added another hard drive. From the outside it still looked the same, but a lot of the interior was upgraded. Now even that is no longer true.

Yesterday, I spent two hours (and another $160) redoing the computer even further. The case was too cramped and was preventing further upgrades. So I moved the computer into a new case, the Enermax Chakra. It’s appreciably bigger than the previous Dell case. It’s also a lot more flexible on the inside in terms of which parts will fit into it. Why the Chakra? I only had two criteria, but the Chakra was pretty much the only case that met both of them: 1) it had to have a 250mm fan, and 2) it had to have no LEDs. Both criteria come from my computer living in my bedroom: it has to be silent (hence a big, slow-spinning fan) and it has to be dark, so that I can sleep!

Since the case didn’t come with any fans besides the huge 250mm one, I purchased two of the quietest 120mm fans in existence, the Scythe Gentle Typhoon. Again, my criteria were the same: quiet and no LEDs. The Gentle Typhoons best met those. I also had to get a new power supply, because the 250 Watt one from Dell wasn’t able to accommodate the video card I was about to put in. So I went with the Corsair 550W PSU. It was the power supply that best met my criteria: high efficiency (85%!), quiet (a big 120mm fan), and no LEDs. And it’s more than enough to power the video card that I put in, a hand-me-down GeForce 8800 GTS. Yes, that’s right, I finally got tired of the inferior performance of the Intel integrated graphics. Now I can actually play modern 3D games in GNU/Linux.

And as if all that wasn’t enough, while transitioning all of the parts from one case to another, the CPU fan developed a faulty bearing which makes it obnoxiously loud. So the first thing I hear upon starting up my supposed-to-be-silent computer is a loud whirring fan noise. Rather than giving up my dreams of a silent computer, I ordered a replacement CPU fan/heatsink, the Arctic Cooling Freezer 7 Pro. Why that one? I already have one in my Windows computer and it cools really well. Plus it’s quiet. It hasn’t arrived yet, but it’s going into Vertumnus as soon as it does.

The new GeForce 8800 GTS is so large that it covers up one of the SATA ports on the Dell motherboard (and another one is rendered inaccessible to all but right-angle SATA connectors). Since I have three SATA hard drives and one SATA DVD-R drive, that’s a problem. The DVD drive is currently unplugged, but I’ll swap it out for an IDE DVD-R drive from my Windows desktop soon — thankfully, the video card doesn’t block the IDE port.

Once all of this is done, the only parts remaining in Vertumnus from the original purchase will be the Intel Core 2 Duo E7200 processor, two 1 GB sticks of DDR2 RAM, the motherboard, and one 500 GB hard drive. And that’s after less than one year. Clearly, I tried saving too much money by buying a system far below my ultimate desired specifications, then wasted a bit more than those savings on upgrades. And I can’t even say the upgrades are done. At some point I’m going to need another hard drive, but since I’m all out of SATA ports, I’ll either have to get an add-in card or replace the motherboard. The original RAM that Dell shipped was pretty slow, and can easily (and cheaply) be replaced with something better. And the processor is looking slightly anemic. A nice quad-core processor would be fun to play around with …

Long story short, in another year, it’s quite possible that the only components remaining from my original purchase will be the 500 GB hard drive and a SATA cable or two. I guess I learned my lesson. Don’t try to save too much money on a computer if, at heart, you’re really just a techie who demands performance.

Why I use Identi.ca and you should too

March 22nd, 2009 22:19

Those of you following me on Twitter may have noticed that all of my tweets come from Identica. I started off with Twitter but I quickly switched over to Identica as soon as I learned about it. Identica, if you haven’t heard of it before, uses the same micro-blogging concept as Twitter (and in fact is compatible with it), but has several improvements. I recommend Identica, and if you aren’t using it yet, check out these reasons as to why you should.

There are several practical reasons you should use Identica:

  • All of your data is exportable on Identica, including your entire corpus of tweets. Twitter does not provide this functionality. Should you want to migrate away from Twitter down the road (for any variety of as-yet-unforeseen reasons), you are unable to do so, but you are able to migrate away from Identica easily at any point. And since Identica runs on the Free Software package Laconica, you can even install Laconica on your own web host and import all of your data there, where you can have complete control over it.
  • Identica has a powerful groups feature that allows people to collectively subscribe and see all tweets sent to a group (this is what the exclamation syntax you may have seen in tweets is about). Groups are a powerful way to build communities and have multi-party discussions, but Twitter does not have them.
  • You don’t have to quit Twitter. My Identica account is linked to my Twitter account, so every message that I send to Identica automatically appears on Twitter. Posting to Identica+Twitter takes the same amount of effort as posting to Twitter alone, except it is seen by more people.
  • Identica lets you see things from other people’s perspective. I’ll use myself as an example. You can see my entire tweet stream, which includes messages from all users and groups I’m following. This should give you a great idea of the kinds of things I’m interested in. And you can see all of the replies to me, which makes it a lot easier to track and understand conversations. Note that all of this is public information and is accessible on Twitter through trickier ways (in the first case, looking at the list of people a person is following and combining all their tweets in chronological order; in the second case, by searching for “@username” on the search subdomain), so you aren’t giving up any of your privacy. Identica simply makes these features a lot easier to use.
  • Some people you may end up finding and wanting to talk with don’t use Twitter at all; they’re only on Identica. Get on Identica and link it to Twitter and you can talk to everyone on both services. Just use Twitter, however, and you’re left out in the cold with regards to anyone who only uses Identica.

And there is one important ethical reason you should use Identica:

  • Identica is Free (as in freedom, not merely cost). Because it follows the Free software ethos, it respects your rights and maximizes your freedom to control your data as you see fit, including the ability to move all of your data elsewhere if necessary. Twitter does not respect these freedoms.

Australia blocks my page from their Internet

March 18th, 2009 23:15

A couple of years ago, when I was more active on Wikipedia than I am now, I was trying to prove a point by compiling a list of all of the risqué images on Wikipedia (link obviously NSFW). I don’t quite remember what that point was anymore, but the list remains. It has even survived a deletion attempt or two. I stopped maintaining it a long time ago, but for whatever reason, others picked it up and continued adding more pictures in my stead. I hadn’t thought of it in a while.

So imagine my surprise when I learned that that silly page has made Australia’s secret national Internet censorship blacklist. I don’t understand the justification here (all of these images are hosted on Wikimedia servers, after all), but I have to laugh when I imagine some Australian apparatchik opening a report on this page, viewing it, making the determination that it’s not safe for Australian eyes, and adding it to the list without further thought, mate.

Australians, please take back control of your country.

A Python script to auto-follow all Twitter followers

March 10th, 2009 19:30

In my recent fiddling around with Twitter I came across the Twitter API, which is surprisingly feature-complete. Since programming is one of my hobbies (as well as my occupation), I inevitably started fooling around with it and have already come up with something useful. I’m posting it here, so if you need to do the same thing that I am, you won’t have to reinvent the wheel.

One common thing that people do on Twitter is they follow everyone that follows them. This is good for social networking (or just bald self-promotion), as inbound links to your Twitter page show in the followers list of everyone that you’re following. You’d think Twitter itself would have a way to do this, but alas, it does not. So what I wanted to do is use a program to automatically follow everyone following me instead of having to manually follow each person.

Other sites that interface with Twitter will do it for you (such as TweetLater), but I’m not interested in signing up for another service, and I’m especially not interested in giving out my Twitter login credentials to anyone else. So I needed software that ran locally. A Google search turned up an auto-follow script written in Perl, but the download link requires registration with yet another site. I didn’t want to do that so I decided to program it for myself, which ended up being surprisingly simple.

My Auto-Follow script is written in Python. I decided to use Python because of the excellent Python Twitter library. It provides an all-Python interface to the Twitter API. You’ll need to download and install Python-Twitter (and its dependency, python-simplejson, if you don’t have it already; sudo apt-get install python-simplejson does the trick on Ubuntu GNU/Linux). Just follow the instructions on the Python-Twitter page; it’s really simple.

Now, create a new Python script named auto_follow.py and copy the following code into it:

#!/usr/bin/python
# -*- coding: utf-8 -*-
#(c) 2009 Ben McIlwain, released under the terms of the GNU GPL v3.
import twitter

username = 'your_username'
password = 'your_password'
api = twitter.Api(username=username, password=password)

following = api.GetFriends()
friendNames = set()
for friend in following:
    friendNames.add(friend.screen_name)

followers = api.GetFollowers()
for follower in followers:
    if follower.screen_name not in friendNames:
        api.CreateFriendship(follower.screen_name)

Yes, it really is that simple. I’d comment it, but what’s the point? I can summarize its operation in one sentence: It gets all of your friends and all of your followers, and then finds every follower that isn’t a friend and makes them a friend. Just make sure to edit the script to give it your actual username and password so that it can sign in.
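The core of that one-sentence summary is just a set difference, and it can be tried out without touching Twitter at all. Here is a minimal sketch on plain lists of screen names; the function name names_to_follow and the sample names are made up for illustration and are not part of Python-Twitter:

```python
def names_to_follow(friend_names, follower_names):
    """Return the followers who are not yet friends, in sorted order."""
    # Set difference: everyone who follows you, minus everyone you already follow.
    return sorted(set(follower_names) - set(friend_names))


# Hypothetical screen names, used only for illustration.
friends = ["alice", "bob"]
followers = ["bob", "carol", "dave"]
print(names_to_follow(friends, followers))  # ['carol', 'dave']
```

Sorting is only there to make the output predictable; the real script doesn’t care about the order in which it follows people.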

Run the script and you will now be following all of your followers. Pretty simple, right? But you probably don’t want to have to keep running this program manually. Also, I’ve heard rumors that the Twitter API limits you to following 70 users per hour (as an anti-spam measure, I’m guessing), so if you have more than 70 followers you’re not following, you won’t be able to do it all at once. Luckily, there’s a solution for both problems: add the script as an hourly cronjob. This will keep who you follow synced with your followers over time, and if you have a large deficit in who you follow at the start (lucky bastard), it’ll slowly chip away at it each hour until they do get in sync. In Ubuntu GNU/Linux, adding the following line to a text file in /etc/cron.d/ (as root) should do it:

0 * * * * username python /path/to/auto_follow.py >/dev/null 2>&1

This will run the auto_follow script at the top of each hour. You’ll need to set the username to the user account you want the job to run under — your own user account is fine — and set the path to wherever you saved the auto_follow script. Depending on your GNU/Linux distribution and which cron scheduler you have installed, you may not need the username field, and this line might go in a different file (such as /etc/crontab). Refer to your distro’s documentation for more information.
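If the rumored limit of 70 follows per hour is real, each hourly run could also cap how many people it tries to follow, rather than relying on the API to start refusing requests. This is a rough sketch under that assumption; the function name next_batch, the limit parameter, and the generated user names are hypothetical, and I haven’t confirmed the actual limit:

```python
def next_batch(pending_names, limit=70):
    """Return at most `limit` names to follow during this run."""
    # Sorting makes the choice predictable; anything past the cap
    # is simply left for a later hourly run to pick up.
    return sorted(pending_names)[:limit]


# Hypothetical backlog of 100 followers you aren't following yet.
pending = ["user%03d" % i for i in range(100)]
batch = next_batch(pending)
print(len(batch))  # 70
```

The names left out of the batch will still show up as non-friends the next time the script runs, so nothing needs to be saved between runs.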

So that’s it. That’s all it takes to automatically follow everyone who’s following you: a dozen or so lines of Python, one crontab entry, and one excellent library and API. Enjoy.