The three most common ways data junkies are using Hadoop

Published December 16, 2013

Shaun Connolly

Analytic applications come in all shapes and sizes and, most importantly, are each oriented around a particular vertical need. At first glance, they can seem to have little relation to each other across industries and verticals. But when you look at them at the infrastructure level, some very clear patterns emerge: each of these applications fits into one of the following three patterns.

Pattern 1: Data refinery

The “Data Refinery” pattern of Hadoop usage is about enabling organizations to incorporate new data sources into the BI and analytic applications they already use. For example, I might have an application that gives me a view of each customer based on all the data about them in my ERP and CRM systems. But how can I incorporate data from their sessions on my website to see what they are interested in? This is the kind of problem customers typically turn to the “Data Refinery” pattern to solve.

The key concept here is that Hadoop is used to distill large quantities of raw data into something more manageable. The resulting data is then loaded into the existing data systems, where traditional tools can access it, but now with a much richer data set. In some respects, this is the simplest of the three use cases, because it offers a clear path to value from Hadoop with very little disruption to the traditional approach. The refinery concept applies no matter the vertical. In financial services, we see organizations refine trade data to better understand markets or to analyze and value complex portfolios. Energy companies use big data to analyze consumption across geographies to better predict production levels.
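To make the refinery idea concrete, here is a minimal sketch in plain Python of the web-session example above. It assumes a hypothetical tab-separated clickstream log ("customer_id<TAB>url") and a hypothetical in-memory CRM table; the function names (map_phase, reduce_phase, refine, load_into_customer_view) are illustrative only. It simulates the map, shuffle/sort, and reduce steps locally rather than using Hadoop's own APIs, so treat it as an outline of the data flow, not a Hadoop job.

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter

# Hypothetical raw clickstream lines: "customer_id<TAB>url"
RAW_LOG = [
    "c1\t/products/laptops/x1",
    "c2\t/support/returns",
    "c1\t/products/laptops/x2",
    "c1\t/products/phones/p9",
    "c2\t/products/phones/p9",
]

# Hypothetical CRM records standing in for the existing system of record
CRM = {
    "c1": {"name": "Acme Corp", "segment": "enterprise"},
    "c2": {"name": "Jane Doe", "segment": "consumer"},
}


def map_phase(line):
    """Emit (customer_id, category) for one raw log line."""
    customer_id, url = line.split("\t", 1)
    parts = url.strip("/").split("/")
    # Product pages are bucketed by product line; other pages by top-level section
    if parts[0] == "products" and len(parts) > 1:
        category = parts[1]
    else:
        category = parts[0]
    yield customer_id, category


def reduce_phase(customer_id, categories):
    """Collapse all events for one customer into a compact interest summary."""
    counts = Counter(categories)
    top, _ = counts.most_common(1)[0]
    return {
        "customer_id": customer_id,
        "top_interest": top,
        "page_views": sum(counts.values()),
    }


def refine(raw_lines):
    """Distill raw events into one small summary row per customer."""
    # Map: turn every raw line into key/value pairs
    pairs = [kv for line in raw_lines for kv in map_phase(line)]
    # Shuffle/sort: group pairs by key, as the framework does between phases
    pairs.sort(key=itemgetter(0))
    # Reduce: each group is consumed immediately, before groupby advances
    return [
        reduce_phase(key, (cat for _, cat in group))
        for key, group in groupby(pairs, key=itemgetter(0))
    ]


def load_into_customer_view(summaries, crm):
    """Enrich a copy of the existing CRM view with the refined web data."""
    view = {cid: dict(rec) for cid, rec in crm.items()}
    for s in summaries:
        view.setdefault(s["customer_id"], {}).update(
            top_interest=s["top_interest"], page_views=s["page_views"]
        )
    return view


if __name__ == "__main__":
    for cid, row in load_into_customer_view(refine(RAW_LOG), CRM).items():
        print(cid, row)
```

The point of the design is the size asymmetry: the expensive refine step touches every raw event, while only the compact per-customer summary is handed to the existing systems, so the traditional BI tools keep working unchanged against a richer customer view.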