Is Hadoop a New Storage Paradigm?

Published September 13, 2013

John Webster

According to historians of economic theory, there are two ways to define creative destruction. One is negative—that capitalist systems unleash forces that lead to their own demise. The other is positive—that those forces result in better products and services that replace the ones that were destroyed. It’s the second one we’ve latched on to. Creative destruction is a good thing. For the computing industry, I can think of no better example of creative destruction than the open source movement.

I believe that the storage industry is ripe for at least one creatively destructive event. Not surprisingly, it comes from the open source community. I’m talking about the emergence and current proliferation of Apache Hadoop. Modern storage systems, despite their many advancements (including the adoption of solid state), are still tethered to the past. They’re proprietary and expensive, particularly at the petabyte and even exabyte scales we can already see on the horizon. (EMC recently surmised they would soon ship an exabyte of storage to a single customer, albeit in the form of a train-load of boxes.)

While Hadoop is commonly seen as a big data analytics system, I believe it can also be seen as a storage platform. Here’s why:

Vendors now commonly speak of Hadoop for the enterprise as the “big data lake.” This is particularly true of vendors with a traditional database and/or data warehouse ax to grind. The lake is a common repository for all enterprise data, structured and unstructured. These same vendors are more than happy to show prospective users how Hadoop can pre-stage data before it’s fed into an existing data warehouse, as well as serve as an active data archive after the data warehousing process. In the big data lake scenario, Hadoop is very scalable and intelligent storage for the Big Data version of the data