Hadoop, big data, and the elephant in the room

Hadoop | Tech and Tools | Published May 12, 2014 | arvindl

In the 1800s, John Godfrey Saxe wrote a poem about six blind men and an elephant based on an old Indian story. In an effort to discover what the elephant is, each man touches a different part of the creature and subsequently draws his own unique — and incorrect — conclusion about what the beast is. Saxe charitably observes that “each was partly in the right, and all were in the wrong.” Fast forward to today, and the elephant may as well have been called Hadoop.

Once again, with Hadoop, we have people trying to describe a puzzling animal. The opinions are as varied — and sometimes as incorrect — as they were in the poem.

Hadoop has been variously described as the ideal way to do transaction processing, the ideal way to do search, and the ideal way to do analysis, all of which are quite different use cases. If that were not unlikely enough, it is also claimed to be the best way to analyze structured data, semi-structured data, and unstructured data. In fact, we are led to believe that it is everything to everyone. How is this possible?

Hadoop is a primitive, undifferentiated technology that can be molded in various ways. In the evolutionary tree, it’s far closer to low-level programming languages like C and Java than it is to function-specific programs like database management systems and even higher-level user applications like spreadsheets.
