The impending crush of data overload

Earlier I wrote about a research project using Mars Global Surveyor (MGS) images that I would be working on. Well, that project has started, and I’ve immediately come to a startling revelation. We’re suffering from a massive, massive case of data overload. Basically, my instructions for the research project are “pick any two images of Mars, tally up craters of different sizes, and plot the numbers against known isochrons to determine surface age.” Note the part where it says “pick any two images of Mars” — that’s because, aside from a few notable sites (rover landing sites, Olympus Mons, etc.), no one has yet had the time to perform detailed surface-age analyses on Mars.
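The “tally up craters, plot against isochrons” procedure can be sketched in a few lines. This is a minimal illustration, not the project’s actual tooling: the crater diameters and image area below are made-up numbers, and the root-2 diameter binning is the convention commonly used in crater counting.

```python
# Hypothetical crater diameters (km) measured from one image,
# and the image's surface area in km^2 -- both made-up numbers.
crater_diameters_km = [0.12, 0.15, 0.2, 0.21, 0.3, 0.45, 0.5, 0.8, 1.1]
image_area_km2 = 250.0

# Standard root-2 diameter bins: each bin spans [D, D*sqrt(2)).
bin_edges = [0.1 * (2 ** (i / 2)) for i in range(8)]

counts = [0] * (len(bin_edges) - 1)
for d in crater_diameters_km:
    for i in range(len(bin_edges) - 1):
        if bin_edges[i] <= d < bin_edges[i + 1]:
            counts[i] += 1

# Crater density per bin (craters per km^2); plotting these points
# against published isochrons gives a model surface age.
densities = [c / image_area_km2 for c in counts]
for lo, hi, n, rho in zip(bin_edges, bin_edges[1:], counts, densities):
    print(f"{lo:.2f}-{hi:.2f} km: {n} craters, {rho:.4f} per km^2")
```

The tedious part is not this bookkeeping, of course — it’s producing the list of diameters in the first place, by hand, crater by crater.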

So I have my choice of hundreds of thousands of high-resolution surface photographs taken by Mars Global Surveyor. There’s another guy working on the same research project (we’ll be cross-checking each other’s data), so this semester, a grand total of four more photographs of Mars will be analyzed. And there aren’t many people at other universities working on this stuff, either. At this rate, the backlog of MGS images will take hundreds of years to process for surface ages. That’s data overload. We spent hundreds of millions of dollars to send MGS, and we got a lot of data back, but at least when it comes to deriving surface ages from the high-resolution narrow-angle camera, the sheer amount of data vastly overshadows the available amount of human work.

This isn’t just a problem with Mars photographs, of course. It’s a problem with everything. Current estimates are that the total amount of data produced by humanity increases by about 66% per year. That’s roughly on pace with Moore’s Law. By comparison, the rate of increase in manufacturing of even the fastest-growing industrial goods, such as paper and steel, is only about 7% per year. This is a phenomenally vast amount of data, and I would venture a guess that the vast majority of it simply goes unanalyzed, which is a terrible shame.
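As a quick sanity check on the Moore’s Law comparison (the 66% figure is the one quoted above; the rest is simple arithmetic):

```python
import math

# 66% annual growth implies the world's data doubles every
# log(2) / log(1.66) years:
data_doubling = math.log(2) / math.log(1.66)

# Moore's Law is usually quoted as a doubling every 18-24 months,
# so the two rates really are in the same ballpark.
print(f"data doubles every {data_doubling:.2f} years")
```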

The one thing that would most help to clear this huge data backlog is the creation of better automated tools for analyzing data. For instance, the Mars craters need to be counted by hand because nobody has yet programmed an image recognition tool that is accurate enough to be used with craters. Counting craters isn’t as easy as it sounds; many of them are heavily eroded, and good judgment is needed to separate real craters from features that are merely circular (of which there are a surprising number). The thing is, this task doesn’t even require strong AI (which is decades off); merely a good amount of effort put into an algorithm. But so few people are working on this that nobody has yet made good tools.
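To give a flavor of what the core of such a tool might look like: the classic starting point for finding circular features is a Hough circle transform, where every edge pixel votes for all the circle centers it could belong to. The sketch below is mine, not anything from the actual project — it assumes NumPy, uses a single fixed radius, and runs on a synthetic ring rather than a real MGS image; a usable crater detector would additionally need edge detection, a search over many radii, and scoring robust to eroded, partial rims.

```python
import numpy as np

def hough_circle_votes(edge_img, radius):
    """Accumulate Hough votes for circles of one fixed radius:
    every edge pixel votes for all centers at distance `radius`."""
    h, w = edge_img.shape
    acc = np.zeros((h, w), dtype=int)
    thetas = np.linspace(0, 2 * np.pi, 90, endpoint=False)
    ys, xs = np.nonzero(edge_img)
    for y, x in zip(ys, xs):
        cy = np.round(y - radius * np.sin(thetas)).astype(int)
        cx = np.round(x - radius * np.cos(thetas)).astype(int)
        ok = (cy >= 0) & (cy < h) & (cx >= 0) & (cx < w)
        np.add.at(acc, (cy[ok], cx[ok]), 1)  # unbuffered accumulation
    return acc

# Synthetic test image: a circular "rim" of radius 10 centered at (32, 32).
img = np.zeros((64, 64), dtype=int)
angles = np.linspace(0, 2 * np.pi, 200, endpoint=False)
img[np.round(32 + 10 * np.sin(angles)).astype(int),
    np.round(32 + 10 * np.cos(angles)).astype(int)] = 1

acc = hough_circle_votes(img, radius=10)
cy, cx = np.unravel_index(acc.argmax(), acc.shape)
print(cy, cx)  # the peak vote lands at or within a pixel of the center
```

The hard part, as noted above, isn’t detecting clean circles — it’s the judgment calls on eroded rims and circular-but-not-crater features, which is exactly where a naive vote-counting scheme like this one falls down.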

I would really love to modify an existing image recognition algorithm for crater-counting and use it on images of Mars. Rather than manually counting just two photos, I could blow through the entire decade-long backlog of MGS images. That would be incredibly awesome. But I just have one semester on this research project, and it’s not necessarily realistic to think that I could do this. If someone else could do it, though, that would be really grand, and would earn some accolades.
