Archive for the 'Microsoft' Category

Using LINQ in C# to easily get a list of most often used words

Monday, May 20th, 2013

A pretty common programming interview question is to parse an input sentence and return the list of unique words used in that sentence. A further elaboration on that problem, the one that this post will be addressing, is to additionally calculate the number of occurrences of each word, and then return the top K words for some input value K. I’ll be demonstrating a simple solution to that problem in C#, both because I’ve been using it a lot recently and also because the choice of C# gives us access to LINQ, a powerful language feature that allows queries on collections using a SQL-like syntax. The top K problem is incredibly easy to solve in SQL: given a table with one row per word occurrence, it boils down to something like SELECT TOP (@K) ..... FROM Words GROUP BY Name ORDER BY COUNT(*) DESC. The C# solution is similarly easy.

First, I’ll clarify the assumptions that I’m using (and that would be wise for an interviewee to address if none of these are made explicit):

  • I’m writing my solution to be case-insensitive. “case”, “Case”, and “CASE” will all thus count as the same word.
  • I’m not dealing with ties. If you ask for the top 3 words by occurrence and there are 5 words that are all used the most in equal numbers, then you’re still only going to get three words of those five, selected in no particular manner.
  • I’m going to use .NET’s regular expression definition of a non-word character (\W) to separate the input sentence into words. The naive solution would be to split the string on only spaces, but then you’re not handling punctuation correctly.
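To illustrate that last assumption, here’s a minimal sketch comparing the two ways of splitting. The class and method names are hypothetical, and the \W+ pattern (runs of non-word characters) plus the removal of empty pieces are my choices for the illustration, not a definitive implementation.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
static class SplitComparison
{
    // Naive approach: punctuation stays attached to the neighbouring word.
    public static string[] SplitOnSpaces(string input)
    {
        return input.Split(' ');
    }

    // Split on runs of non-word characters, then drop the empty pieces
    // that Regex.Split produces when the input starts or ends with a separator.
    public static string[] SplitOnNonWord(string input)
    {
        string[] pieces = Regex.Split(input, @"\W+");
        return Array.FindAll(pieces, piece => piece.Length > 0);
    }
}
```

For the input “Fire, fire!”, SplitOnSpaces returns “Fire,” and “fire!” (punctuation included), while SplitOnNonWord returns “Fire” and “fire”.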

And for the solution:

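// Needs the System.Collections.Generic, System.Linq, and System.Text.RegularExpressions
// namespaces (LINQPad imports all three by default).
Dictionary<string, int> GetTopKWords(string input, int k)
{
	// Split on runs of non-word characters so that ", " counts as one separator.
	string[] words = Regex.Split(input, @"\W+");
	var occurrences = new Dictionary<string, int>();
	
	foreach (var word in words)
	{
		// Regex.Split yields empty strings when the input starts or ends with a separator.
		if (word.Length == 0)
			continue;
		
		string lowerWord = word.ToLowerInvariant();
		if (!occurrences.ContainsKey(lowerWord))
			occurrences.Add(lowerWord, 1);
		else
			occurrences[lowerWord]++;
	}
	
	// Order by count, highest first, and keep the first k entries.
	// Note: Dictionary does not formally guarantee enumeration order.
	return occurrences.OrderByDescending(kvp => kvp.Value)
		.Take(k)
		.ToDictionary(kvp => kvp.Key, kvp => kvp.Value);
}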

The vast majority of this code is responsible simply for finding the list of unique words and the number of occurrences of each one. Once that is known, finding the top K words is a single line of code thanks to the power of LINQ. All we’re doing is ordering the words by frequency in descending order and taking the top K words. Without LINQ, there would be significantly more book-keeping code required to do this, which would make a good exercise for the reader (e.g. solve this problem in Java). The first roadblock you’ll probably run into is that you can’t simply flip the keys and values of the dictionary, because the frequency counts aren’t unique. The best I’ve come up with is to construct a list of ordered tuples out of the dictionary of words, order it on the occurrence count part of each tuple, and then extract the first K elements from the resulting ordered list and return it. Let me know in the comments if you have a better solution.
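For the curious, here’s a rough sketch of that tuple-list approach written without LINQ. I’ve kept it in C# so it sits next to the code above. The class and method names are hypothetical, and it assumes you already have the dictionary of word counts in hand.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class, for illustration only.
static class TopKWithoutLinq
{
    public static List<KeyValuePair<string, int>> TopK(Dictionary<string, int> occurrences, int k)
    {
        // Copy the dictionary entries into a list of (word, count) pairs.
        var pairs = new List<KeyValuePair<string, int>>(occurrences);

        // Sort by count, highest first. List.Sort is not a stable sort,
        // so tied words end up in no particular order.
        pairs.Sort((a, b) => b.Value.CompareTo(a.Value));

        // Keep at most the first k pairs (fewer if there are fewer words).
        int count = Math.Min(Math.Max(k, 0), pairs.Count);
        return pairs.GetRange(0, count);
    }
}
```

Returning a list of pairs rather than a dictionary keeps the ordering explicit, which is the whole point of asking for the “top” K.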

Oh, and here’s an example input/output for the program, handling the display of output using LINQPad:

var input = "the quick brown fox is brown and jumps over the brown log over the long fire and quickly jumps to a brown fire fox";
GetTopKWords(input, 10).Dump(); // Dump() is LINQPad's extension method for displaying a result

outputs:

Top K Words sample run
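As a closing aside, the SQL comparison from the introduction can be made fairly literal by letting LINQ do the counting as well. This is only a sketch: the class and method names are hypothetical, and the \W+ pattern and the filtering of empty pieces are my assumptions.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
static class TopKWithGroupBy
{
    public static List<KeyValuePair<string, int>> TopK(string input, int k)
    {
        return Regex.Split(input, @"\W+")
            .Where(word => word.Length > 0)             // drop empty pieces at the edges
            .GroupBy(word => word.ToLowerInvariant())   // GROUP BY word, ignoring case
            .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()))
            .OrderByDescending(pair => pair.Value)      // ORDER BY count DESC
            .Take(k)                                    // TOP K
            .ToList();
    }
}
```

It returns a list rather than a dictionary so that the descending order survives the trip back to the caller.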

How to fix images not displaying in Microsoft Word 2007

Thursday, June 26th, 2008

Recently I’ve been hit by a bug (or what I thought was a bug) in Microsoft Word 2007: images embedded in the document did not display in any mode other than “Full Screen Reading”. And since the editing ribbons are not available in that mode, it’s hard to get work done. This all started when Word crashed on me one time; ever since then, images simply haven’t been displaying correctly. I get a border where the image should be and white space inside. But when I send the file to other people and they open it, they can view the images just fine. I can even add images to documents; I just can’t see them.

So I performed a Google search on this issue, but the only relevant “solution” was behind a paywall over at ExpertSexchange. After a few minutes of trying to figure it out on my own, I stumbled upon the solution, and to save everyone from the hell that is ExpertSexchange, here it is:

Click the Office Button (it’s in the upper left corner of Word), select “Word Options”, select “Advanced” in the left pane, scroll down to the “Show document content” subsection, and uncheck the “Show picture placeholders” option. Yes, it’s that simple. Somehow, when Word crashes, this option can get turned on all by itself. It’s really annoying because there’s no clue that Word is intentionally hiding images from you; it just feels like a bug. And the reason for this insane option?

Word 2007 Options dialog

That’s right, it’s for performance. And it improves performance only at the expense of severely crippling usability. You’d think this option should never be able to get turned on accidentally, yet there it is. At least you know the solution now.

Bringing a Windows mindset to a GNU/Linux world

Thursday, June 12th, 2008

I just ran across a level of stupid so off the charts I had to immediately comment on it here lest my inaction unwittingly foster an environment tolerant of such stupidity. Allow me to quote from a post on Linuxforums:

When I say cd’d I mean I used the command cd, to change directory.
So for example say I downloaded and extracted the drivers to the desktop I would open a Konsole window and type:
sudo cd /home/sebmaster/desktop/[folder extracted to]/
(You probably dont need sudo but I have got into the habit of adding it before pretty much everything)

Those of you who are familiar with GNU/Linux should see this heaping mound of stupidity for what it is immediately, and will likely find the following explanation superfluous. For the rest of you, here’s a detailed explanation.

There are two distinct nexuses (nexi?) of stupidity inherent in this quote. The first is the notion that sudo, a wrapper program that executes the program passed to it as an argument with root (administrator) privileges, will do anything with the change directory command. It won’t. Cd is a shell builtin; it is not a program. Sudo can’t even find it. The exact error message I get is “sudo: cd: command not found”. And even if cd were a program, using it in this way wouldn’t do anything, since the new working directory would be lost as soon as the process started by sudo exited. And even if that did work, it still wouldn’t be useful, because there’s no point in setting your working directory to a directory you don’t have access to anyway. You’re still going to need to use sudo with every subsequent command just to get access to those files, so the sudo cd is superfluous; just skip the cd altogether and use a qualified path to the files.

But that’s not even touching on the second (and greater) nexus of stupidity, which is the very-Windows-like mindset that everything should be run as administrator. Saying “You probably dont need sudo but I have got into the habit of adding it before pretty much everything” is like saying “You probably don’t need a live hand grenade but I have got into the habit of carrying one around with me everywhere I go.” Like a live hand grenade, sudo is potentially very dangerous, as the root account has total access to the system (so simple mistakes or security compromises become far worse than they would with mere user account permissions). The mantra to live by is: Never run anything as root unless it is absolutely necessary. As soon as I read that this faithful deliverer-of-the-stupid executes pretty much everything as root out of force of habit, I stood up from my computer, placed my hand over my face, and let out a very long, exasperated sigh. Why doesn’t he just su at the beginning of every terminal session and get it over with?

Oh wait, I probably shouldn’t have said that. He’s probably going to read that last bit, miss all the rest of the content in this post, and think that’s a good idea. “Hey, now I don’t even have to type sudo anymore, because everything I do is always as root!” Yes, even changing directories.

DRM: how things you’ve bought aren’t actually yours

Friday, May 30th, 2008

We free software folk have been trying to warn people about the dangers of Digital Restrictions Management for a while, we really have. Yet you just aren’t listening to us! Well, here are two recent all-too-obvious-in-hindsight DRM travesties by Microsoft that might have you reconsidering. If Microsoft can’t even be trusted to do DRM correctly, then who can?

First, Microsoft decided to close down their MSN Music service, presumably because it was unprofitable. Unfortunately for any customer who ever bought anything from the store, they won’t be able to play their purchased music files on any additional devices come June because Microsoft is shutting down the servers. Each audio file is actually a file encrypted with DRM, and once the servers go away, so too go any of the means of being able to decrypt the files. Ain’t it great that “pirates” will be able to play their downloaded mp3s indefinitely, but people who legitimately purchased the music will be stuck with worthless files and no refund? But that’s what you get when you willingly buy something infected with DRM.

Microsoft also uses Digital Restrictions Management on all of its Downloadable Content for the XBOX 360. All downloaded files are linked both to the user account and to the hardware. Want to change accounts? You can’t take your downloads with you. Buying another XBOX 360? Can’t take ‘em with you. Buying another XBOX 360 because your old one broke? You’re still screwed! That’s right, this poor sap’s XBOX 360 broke, taking all of the downloaded content that he bought along with it, and Microsoft’s only response was “buy all your content a second time.” It makes you wonder why they even use the word “buy”, because when you actually buy something it implies that you actually own it. If this is really the future of gaming consoles, we gamers are in big trouble. Microsoft is trying to supplant a decent product (games on DVD that can be played in any console) with an inferior one, simply because they can make a lot more money with it, what with the duplicate downloads, lower distribution costs, no need to print manuals, etc.

And why shouldn’t they? By buying all of this content that’s infected with DRM, we customers are bringing it all down upon ourselves. Unfortunately, many people will only realize too late how evil DRM is — after they’ve spent thousands of dollars on music only to have the authorization servers shut down, or after they’ve spent hundreds of dollars on downloadable content only to have their XBOX 360 crap out on them. And Microsoft doesn’t care about fixing any of this. They already have your money, and they’re big enough they can just tell you to go screw yourself. Actually, I wish they were that kind, because tauntingly suggesting you pay again for everything you’ve already purchased once is worse.

So join with me and refuse to buy anything that’s infected with DRM. Support the EFF’s anti-DRM campaign. Support the Defective by Design campaign. Spread the word. Don’t be the poor sod who abruptly finds himself “owning” hundreds of dollars of worthless DRM-infected files that cannot ever be used again.

The failings of development in Windows

Tuesday, May 27th, 2008

Drinian (regular commenter here) pointed me to a great series of articles on the failings of Microsoft in recent years. Particularly, the Windows APIs are inconsistent and not pleasurable to use from a development perspective, and with Windows Vista and its flagship applications, Microsoft has released a wildly inconsistent smattering of user interfaces. I’m not going to try to sum up the articles in any further detail; they’re so full of content that you really have to read them for yourself:

  1. From Win32 to Cocoa: a Windows user’s conversion to Mac OS X – Part I
  2. From Win32 to Cocoa: a Windows user’s conversion to Mac OS X – Part II
  3. From Win32 to Cocoa: a Windows user’s conversion to Mac OS X – Part III (Updated 2008-06-01)

And yes, I know there’s a lot of “Mac OS X” in the title there, but the majority of the content really is about Microsoft and Windows. The third part in the series isn’t out yet, but when it is, I’ll try to update this blog post with a link to it.

For the record, I personally agree with pretty much everything Peter Bright says about Windows development. I did a good bit of it at my previous job and it was ugly. .NET hasn’t made significant improvements in this regard because it makes way too many concessions to long-deprecated functionality. And the wide variety of official Microsoft user interfaces in Windows Vista is incredibly off-putting. Why does every large Microsoft application function completely differently?! If I’m a third party developer writing my own application, what do I try to make it look like? The answer isn’t Microsoft Office 2007 (even though it’s my favorite new interface of the lot), because the ribbon menu implementation is specific to the Office codebase and doesn’t even have a public API! Brilliant!

My toolkit of essential Windows programs

Monday, October 22nd, 2007

I’ve been around the block a couple of times when it comes to using Windows as a desktop environment (unfortunately). The least I can do is help ease others’ agony by sharing the toolkit of extremely useful Windows programs that I’ve accumulated over the years. Many of these programs you’ve likely already heard of (such as Firefox). Others you will never have heard of, but you’ll wish you’d found out about them years ago. Note, the programs are presented in no particular order.

SpaceMonger
SpaceMonger is an incredibly useful program that graphically depicts exactly how all of the space on your hard drive is being used. It scans your entire hard drive then displays its contents in blocks, with the area of each block directly proportional to the size of the file/folder. This is very helpful when you’re out of space and trying to come up with something to delete to free up space. I’ve run across multiple DVD images I’d long forgotten about and no longer needed, providing a savings of 4.5 GB each. SpaceMonger serves a dual purpose: finding lost files (the most interesting ones are always large, right?) and reclaiming drive space. What’s not to love?

Mozilla Firefox and Thunderbird
Who hasn’t heard of Mozilla Firefox? It’s quite simply the best browser out there. It far eclipses Internet Explorer, and its selection of extensions can’t be beat. Slightly less well known is Mozilla Thunderbird, a mail program. Yeah, I know most people check their mail using a website these days, but I still don’t think they have anything on a real desktop client. I have Thunderbird configured to download mail directly from multiple email accounts. Firefox and Thunderbird are both Free Software (meaning free as in freedom, not free as in price) and available for a huge assortment of operating systems, including GNU/Linux.

Sure Delete
Sure Delete: For when it absolutely, positively has to be deleted. Sure Delete can run in two modes, either targeting specific files and folders for sure deletion or truly cleaning out all of the free space on your hard drive (remember, when you delete something, its data isn’t actually overwritten; the space is just marked as free). No matter which mode you use, it overwrites the targeted data on your hard drive many times, making it totally unrecoverable even with advanced forensic techniques. Sure Delete is great for paranoid types. Some people may claim, “If I don’t do anything wrong, what do I have to hide?” Don’t get caught uttering such utterly naive last words. Protect yourself. If you need to clean an entire drive, like if you’re giving away your computer, step up to Darik’s Boot and Nuke to totally protect your privacy. But if you just need to delete a few files and otherwise keep your operating system intact, Sure Delete is the way to go.


Who uses Windows Vista?

Wednesday, August 15th, 2007

Windows Vista has been out for a little while now and I still haven’t even seen it in use. This far out from the Windows XP launch I had at least marveled at several people using it, although I hadn’t switched to it myself just yet. But I haven’t even had the opportunity to see Vista in action because nobody I know uses it or is even interested in using it (me included). Its many disadvantages include the high cost of upgrading, incompatibility with programs that currently work on Windows XP, and of course, draconian DRM. The last part especially irks me because Vista has lots of “features” that are blatantly anti-user, such as how it degrades high-quality video sent to high-quality outputs that lack content protection, ostensibly as an anti-piracy measure. Realistically, it means don’t bother trying to use Vista to output video to an HDTV, or using it to watch HD-DVDs or Blu-ray discs. Ridiculous.

My work has absolutely no plans to upgrade to Windows Vista. Even the new laptops we new employees got came with Windows XP rather than Windows Vista. Everything already works, and the IT guys simply don’t see the need to spend the money for the sole privilege of potentially having all of our mission-critical applications fail. When Vista first came out manufacturers such as Dell were only offering new computers that came with Vista. After the inevitable backlash, they started offering XP again, and will probably do so for a nice long while.

However, the upgrade to Vista is eventually going to become inevitable as long as one wants to keep using a Windows environment. Using XP over Vista will be no more practical in a few years than trying to use 98 over XP is today. That’s why I’m simply going to stop using Windows at that point. I’ve had seven years’ experience with using GNU/Linux now, mostly on servers, but also on desktops. My current desktop dual-boots into Fedora Core 6, for instance. But what keeps bringing me back to Windows is the computer games. Well, not for much longer. I already play far fewer computer games than I used to. I’m simply finding other things to do with my time, like editing Wikipedia, writing, reading, and updating this blog. In a few years’ time when continuing to run Windows XP becomes untenable, I’m going to switch over to GNU/Linux once and for all and never look back.

Update 2007-08-18: Looks like the editor in chief of PC Magazine, who is just stepping down, isn’t thrilled with Vista either. He has a litany of complaints about flaws in the operating system that still aren’t fixed after nine months. I’m so glad I didn’t upgrade.

Kudos to Microsoft on Office 2007

Thursday, March 1st, 2007

I never thought I’d say this, but … kudos to Microsoft. Office 2007 is an excellent product. I was trying to use OpenOffice to put a thesis paper together and the images just weren’t working, so I switched over to Word 2007. The difference was truly night and day. You can just tell a lot of work was put into this product. It’s so … efficient. The new menu bars are a huge improvement. I can see why it would be worth several hundred dollars per license to a corporation (though I couldn’t personally justify the expense).

The one thing I don’t like about Office 2007 is that it crashed on me twice in under an hour of work. Twice! OpenOffice has never crashed on me. What’s with the instability? At least Microsoft seems to have succumbed to the philosophy of “If we can’t fix crashes, we can at least make them painless.” Crashes in Word 2007 are painless. It automatically realizes that it’s crashed, shuts itself down, restarts itself, and recovers the document you were working on. So you don’t actually lose any work, it’s just really annoying.

C’mon Microsoft, fix the crashing issues and you’ll have a near-perfect product.

Why I’m not excited about Microsoft Windows Vista

Friday, February 23rd, 2007

Microsoft Windows Vista has been out long enough for all of us to get some perspective on it. The over-optimistic sales forecasts are in the past and it’s settling in for the long haul. Make no mistake, in the long run, you don’t have any choice about eventually using Vista, just as, say, using Windows 98 wasn’t really a viable choice a year ago versus using Windows XP. All new computers are going to be coming with Vista (unless you get a Mac or choose Linux), so you’ll end up using it eventually. But you should hold off from upgrading until absolutely necessary. Vista has a lot of downsides.

For one, I previously commented on how Vista has draconian Digital Rights Management in an age when most companies are moving away from DRM. But Vista is also rather expensive, especially if you want all of the cool stuff that really makes Vista worthwhile. That article lists lots of other problems with Vista, and recommends against upgrading.

Microsoft also oversold Vista’s security. The Register has an article detailing Vista’s new security features and identifying possible future flaws. Basically, Vista still doesn’t do as good a job of compartmentalizing system stuff from user stuff as, say, ten-year-old Unix. So we’re inevitably going to continue to see Windows security flaws far into the future. Sigh. It could have been much better.

Vista to suck it long and hard

Wednesday, January 24th, 2007

This white paper is an extremely interesting read about the future of computing as envisioned by Microsoft, particularly in relation to what they’re doing with their new Vista operating system. The paper itself is rather long (and a highly recommended read), so here’s the very short summary: Vista is going to suck it longer and harder than Aholah and Aholibah combined. That’s some Biblical suckage right there.

As for the slightly longer summary, Vista is going to have all sorts of draconian “premium content protection” restrictions that make computers mostly unusable for the average Joe. For instance, any output that isn’t crippled with DRM (which includes pretty much everything on the market today) has to be either disabled or degraded in the presence of “premium content”. So, do you have a nice LCD monitor? Prepare for it to look like a 14-year-old CRT if any premium content nears your computer. Have a nice sound system? Well, ready to hear it sound like tin cans on wires? That is, anyway, if it’s not merely disabled outright.

The paper also goes over how Microsoft’s new “features” are going to cripple the rest of the industry by requiring closed standards and such for anything to be compatible with premium content on Vista. This is a sneaky way to try to kill open source: make it so that it can’t even run on any of the new hardware, wait a few years, and bam, it’s dead. Frankly, you really have to go read the entirety of this paper to see how Microsoft is prepared to ream us all a new one. I predict massive backlash against Microsoft and either a loosening of these restrictions with patches or a majority of users simply deciding not to take the plunge. It will be a bad day, however, when new computer manufacturers exclusively sell their PCs with Vista.

This paper also points out how none of the current video cards on the market will even be able to play HD content in Vista because they all lack the “necessary” content protections. Microsoft is so focused on crippling the functionality of each and every computer, while, of course, they’re utterly unable to do anything about the real hackers out there who figure out how to decrypt this content (like muslix64, who hacked both HD-DVD and Blu-ray). So the normal users are going to be utterly screwed over and the hackers (among whom I proudly count myself) won’t face any of the problems. Does Microsoft really expect users to put up with this?