Big data analytics – Not just a matter of scale

Published February 4, 2015

“Follow effective action with quiet reflection. From the quiet reflection will come even more effective action.” (Peter Drucker)
It is only when you reflect on your experiences that they translate into learning. Having been in the technology space for close to three decades, I’ve had a ringside view of most tech trends. But perhaps the one that has engaged me the most is the growth of ‘big data’ and its associated trends and technologies. So much so that about two years ago I jumped into the fray myself, wanting to move from mere trend-watching to actually shaping the way we use big data technologies.
Big data is often used as an umbrella term for many solutions and technologies. It is also perhaps one of the few tech trends debated in both B2B and B2C contexts. While big data is associated with the unstructured data of the social web, the term also refers to the underlying technology used to handle the massive scale of the Internet of Things.

The five trends

The way I see it, five trends in today’s digital world have changed the scale of data sets and created the domain of big data analytics as we know it.
First, data sets have gone beyond traditional enterprise data formats and now include social data and machine data.
Second, the rise of social curation of decisions, in both enterprise and consumer contexts, has led to new methods in analytics.
Third, the rise of a consumption-driven society with opinion-driven commerce has pushed analytics towards near-real-time, operational decision making, as opposed to historical analysis of data.
Fourth, falling prices of electronics, computing, storage and networking have increased automation across industry sectors. This changed the scale of consumption.
Fifth, open source tools have made computational and mathematical models easy to access, use and scale. This pushed these tools beyond the expert community and led to wider adoption by the general IT systems community.

The five twists

These five trends on the scale side of big data have also created five twists for developers and users to grapple with.
Twist 1: It is the data that drives the model, which has created demand for unsupervised learning methods.
Twist 2: The focus has shifted from simply analysing data streams for information to meaning and interpretation.
Twist 3: Visualization of complex, intertwined data sets became a critical need and the primary means of communicating results.
Twist 4: The need for cross-discipline skills increased, as the methods span statistics, graph theory, large-scale computing, information modelling, and user behaviour and experience.
Twist 5: The centrality of user experience pushed analytical models to account for elements of irrationality. This led to the use of approximate or imperfect models of computation.

What sets big data analytics apart?

Big data analytics differs from traditional analytics in a few key ways in its conceptual model.

In the big data domain, functions move to where the data resides. This is the opposite of traditional analytics, where data moves to the functions for processing. The open source movement is also driving big data systems to be used as a landfill for multiple data streams before any processing takes over.
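To make the “functions move to the data” idea concrete, here is a minimal sketch in plain Python. It assumes a hypothetical two-node cluster (the `PARTITIONS` dictionary and the helpers `run_on_node`, `count_actions` and `function_to_data` are illustrative names, not any real framework’s API). Each node runs the shipped function on its local partition and returns only a small partial count, which is then merged, in the spirit of a map-reduce job.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical cluster: each "node" holds its own partition of log lines.
# In a real system these partitions would live on separate machines.
PARTITIONS: Dict[str, List[str]] = {
    "node-a": ["click home", "click cart", "view home"],
    "node-b": ["view cart", "click home"],
}


def run_on_node(partition: List[str], fn: Callable[[List[str]], Counter]) -> Counter:
    """Ship the function to the partition; only its small result travels back."""
    return fn(partition)


def count_actions(lines: List[str]) -> Counter:
    """Map step: count the action (first word) of each local log line."""
    return Counter(line.split()[0] for line in lines)


def function_to_data(fn: Callable[[List[str]], Counter]) -> Counter:
    """Run fn next to every partition, then merge the partial results."""
    partials = [run_on_node(part, fn) for part in PARTITIONS.values()]
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # reduce step: add partial counts together
    return total


result = function_to_data(count_actions)
print(dict(result))  # {'click': 3, 'view': 2}
```

The raw log lines never leave their nodes; only the per-node counters do. A traditional setup would instead copy every line to a central server before counting.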

Big data processing favours BASE (Basically Available, Soft state, Eventual consistency) over ACID (Atomicity, Consistency, Isolation, Durability). Polyglot persistence, that is, using different storage technologies for different kinds of data, is also important here.
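One common way to get eventual consistency is a conflict-free replicated data type (CRDT). The sketch below is a grow-only counter (G-Counter), chosen here purely as an illustration of BASE behaviour and not as a technique the post itself names. Each replica accepts writes on its own (basically available), replicas may temporarily disagree (soft state), and merging by element-wise maximum makes them converge (eventual consistency).

```python
from typing import Dict


class GCounter:
    """Grow-only counter CRDT: one slot per replica, merged by element-wise max."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.counts: Dict[str, int] = {}

    def increment(self, n: int = 1) -> None:
        # Each replica writes only its own slot, so no coordination is needed.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative and idempotent,
        # so replicas converge regardless of the order in which they sync.
        for rid, count in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), count)


a, b = GCounter("a"), GCounter("b")
a.increment(2)  # write accepted on replica a...
b.increment(3)  # ...and on replica b, even while they cannot talk
print(a.value(), b.value())  # 2 3: replicas temporarily disagree (soft state)
a.merge(b)
b.merge(a)
print(a.value(), b.value())  # 5 5: replicas have converged
```

An ACID system would instead coordinate both writes through a transaction, giving up some availability in exchange for every reader seeing the same value at once.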

With the rise of Twitter, WhatsApp and other messaging applications, social data processing is moving to a model of small volumes arriving at a faster rate. These streams are expected to carry more voice, photo and video content, so classical text mining needs to account for mixed-media analytics.

The rise of choice engines

Recent advances in recommender systems need to be viewed in this context. The use of graph-based systems to solve social and information affinity problems is a case in point. The emergence of behavioural systems theory to explain online social behaviour adds a new dimension to the rise of choice engines. Analysing choice engines in terms of cross-category affinity (taste) and context within a specific domain is a fine blend of behavioural science theory and computational models. The theory behind choice engines is, after all, a matter of scale as well as taste!
But then choice, tastes and graphs deserve more than just a passing mention. In my next post, I will attempt to outline these concepts and some of the associated challenges for big data companies like ours that are building choice engines.
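As a rough illustration of graph-based affinity, the sketch below builds a weighted item-item co-occurrence graph from hypothetical purchase histories and recommends unseen items by summing edge weights. This is a simple co-occurrence technique picked for illustration; it is not the choice-engine method the post alludes to. The data in `HISTORIES` and the helpers `build_affinity_graph` and `recommend` are made up. Items carry a category prefix so the cross-category jumps (for example, from books and films to music) are visible.

```python
from collections import defaultdict
from itertools import combinations
from typing import DefaultDict, Dict, List, Set, Tuple

# Hypothetical histories; item names are prefixed with their category.
HISTORIES: Dict[str, Set[str]] = {
    "u1": {"book:scifi", "music:synth", "film:scifi"},
    "u2": {"book:scifi", "film:scifi"},
    "u3": {"book:scifi", "music:synth"},
    "u4": {"music:jazz", "book:history"},
}

Graph = DefaultDict[str, DefaultDict[str, int]]


def build_affinity_graph(histories: Dict[str, Set[str]]) -> Graph:
    """Weighted item-item graph: edge weight = number of users holding both items."""
    graph: Graph = defaultdict(lambda: defaultdict(int))
    for items in histories.values():
        for x, y in combinations(sorted(items), 2):
            graph[x][y] += 1
            graph[y][x] += 1
    return graph


def recommend(user: str, histories: Dict[str, Set[str]], graph: Graph,
              k: int = 2) -> List[Tuple[str, int]]:
    """Score items the user lacks by summing edge weights from items they hold."""
    owned = histories[user]
    scores: DefaultDict[str, int] = defaultdict(int)
    for item in owned:
        for neighbour, weight in graph[item].items():
            if neighbour not in owned:
                scores[neighbour] += weight
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


graph = build_affinity_graph(HISTORIES)
print(recommend("u2", HISTORIES, graph))  # [('music:synth', 3)]
```

Here user `u2`, who holds only a sci-fi book and film, is steered towards synth music because other users link those items in the graph. Real choice engines would add context, normalisation and far larger graphs.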