The NY Times has a nice piece by Ashlee Vance in the Business Computing section highlighting a piece of software and a company–Hadoop and Cloudera, respectively–that are in the business of what I am calling “infra-search.” (Truth be told, it is quite simply “data analysis,” but that is too broad and not specifically geared towards what I feel is the “monoculture of Search” on the Web today.) What is meant by infra? Here’s the dictionary.com definition:
By 2003, Google found it increasingly difficult to ingest and index the entire Internet on a regular basis. Adding to these woes, Google lacked a relatively easy to use means of analyzing its vast stores of information to figure out the quality of search results and how people behaved across its numerous online services.
So a couple of cats at Google came up with something called MapReduce to help them manage the data. So why is this important?
MapReduce represented a couple of breakthroughs. The technology has allowed Google’s search software to run faster on cheaper, less-reliable computers, which means lower capital costs. In addition, it makes manipulating the data Google collects so much easier that more engineers can hunt for secrets about how people use the company’s technology instead of worrying about keeping computers up and running.
….
The MapReduce technology helps do grunt work, too. For example, it grabs huge quantities of images — like satellite photos — from many sources and assembles that information into one picture. The result is improved versions of products like Google Maps and Google Earth.
MapReduce allowed Google to focus on search, which is its bread and butter, although it really makes its money by selling advertising (how traditional, right?).
So how do we get from MapReduce to Hadoop and Cloudera?
Well, this MapReduce thing has been the kind of trade secret that everyone–business folks and Internet studies people alike–has been wondering about. How does it work? And more significantly, how does it work so well? Well, Google has published some papers on MapReduce, which was enough for others to build their own versions, including Hadoop, which Yahoo is behind.
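For the curious, here is a toy sketch of the general map/shuffle/reduce idea, using the classic word-count example. To be clear, this is not Google's or Hadoop's code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration, and everything runs on one machine, whereas the real systems spread these steps across many computers.

```python
from collections import defaultdict

# Hypothetical, single-machine illustration of the map/shuffle/reduce pattern.

def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Group the emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Collapse all the values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["the web is big", "the web is search"])
```

The point of splitting the work this way is that each document can be mapped independently and each word can be reduced independently, which is what would let a real framework farm the pieces out to lots of cheap machines.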
Now a lot of folks are jumping on the bandwagon because of how well Hadoop analyzes data. Facebook uses Hadoop to manage its photos, and more specifically photo-tagging, as it allows Facebook to determine how “close” the connection is between two people. Even the notoriously anti-open-source Microsoft (Hadoop is open source, like any reasonable software should be) has been down with it too.
Now what does this all mean for regular people who just use the Web and aren’t fascinated with the intricacies of search?
“What if Google decided to sell the ability to do amazing things with data instead of selling advertising?” Mr. Hammerbacher [co-founder of Cloudera] asked.
That proposition is pretty significant, no?

