Defining Big Data

Published May 12, 2014 | Edd Dumbill

As the field of big data grows, more and more people are being introduced to its concepts, and I often hear the basic question: “Is my data big enough to be big data?” Seven terabytes? Seventy terabytes? Seven hundred?

It’s too late now to change the name, of course, but the “big” part of “big data” is troublesome. It’s a poor signpost to what’s important about big data, and carries borderline puerile overtones of boastfulness.

The mainstream media has adopted a definition of big data that’s broadly synonymous with “analytics”, albeit mixed in now and then with a smattering of privacy-invading personal data collection. For me, that’s often a good enough definition, as I’m interested in people understanding that there’s power and potential in data.

As one of the people responsible for early definitions, I wrote in January 2012 that big data is “data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures”. This has proved an accurate definition over the years, and has been adopted as the basis of the definition you will find in Wikipedia’s big data page.
