In my last post, I talked about the paradox of big data, or how data can be misery… In today’s post, I will talk about how that paradox can be resolved… and how data can set you free.
Consider these statements by some of the leading writers, academics and researchers into the way we function.
Sheena Iyengar, the Columbia professor and author of the brilliant book ‘The Art of Choosing’, which was on the shortlist for the FT Business Book of the Year in 2011, writes: “When people are given a moderate number of options (4 to 6) rather than a large number (20 to 30), they are more likely to make a choice, are more confident in their decisions, and are happier with what they choose.” She explains this brilliantly in her TED Talk on the same topic.
Chris Anderson, then editor-in-chief of Wired, wrote this in his 2008 Wired article “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete”: “… Correlation is enough. We can stop looking for models. We can analyze the data without hypotheses about what it might show… The new availability of huge amounts of data, along with the statistical tools to crunch these numbers, offers a whole new way of understanding the world. Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all…”
In these two ideas lies the answer to the paradox of data. When we stop trying to understand why something happened, when we let statistical algorithms find patterns we cannot detect, and when we use these patterns to reduce the number of choices we need to make, we develop a new heuristic that can help make sense of big data.
Easier said than done. Because our human mind is innately programmed to think and act otherwise.
When we see data, we constantly ask questions like “is the data right?”, “why did this happen?”, “what is the meaning of this analysis?”. When we see analysis, we always respond with “can we see some more data?”, “what if we analyse it differently?”, “can we run some more analysis?”. When we see options for decisions, we always ask, like Oliver Twist, for “More please”.
We trust our human ability, our personal experience, our own expertise built up over years. Now there is no doubt that expertise is valuable, and represents the simplest way we have of reaching a decision in a complex situation. However, as Daniel Kahneman brilliantly explains in his path-breaking book “Thinking, Fast and Slow”, several decades of academic research suggest that our brains are programmed such that people place too much confidence in human judgment. Kahneman talks about four common human biases – Availability, Anchoring, Substitution, and Optimism and Loss Aversion.
This means that we will never let ourselves trust the data. Or the algorithm.
At this stage, the logical question is – why should we? And this is what Chris Anderson explains so well when he says “We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science (and the human mind) cannot…..” (words in parentheses mine).
It follows that we need to accept the next part of the argument – that correlation is indeed enough. The fact is… we have already accepted that in our daily lives.
Every time we use a search engine, every time we click an Amazon recommendation… we accept the theory that ‘since a million people who did X did Y as well, it follows that since I did X, I would like to do Y’. Correlation, of course, works only if you run it on really large numbers, but that is indeed the case with search, with Amazon’s recommendations or with big data. When the data is astronomically large, correlations do indeed provide a decent heuristic.
On top of this, new developments in technology allow us to add techniques like probabilistic models, graph theory and Bayesian networks to correlation models, to derive even better conclusions on what the data is really telling us.
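To make the ‘people who did X also did Y’ idea concrete, here is a minimal sketch in Python. It is not how any real search engine or retailer works; the purchase histories and the function names `build_cooccurrence` and `recommend` are made up purely for illustration. All it does is count how often two items turn up together, then suggest the most frequent companions, without ever asking why they go together.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(histories):
    """For each item X, count how often every other item Y shares a history with it."""
    counts = defaultdict(Counter)
    for items in histories:
        # set() so that a repeated item within one history is counted once
        for x, y in permutations(set(items), 2):
            counts[x][y] += 1
    return counts


def recommend(counts, item, top_n=3):
    """Return up to top_n items most often seen alongside `item`."""
    companions = counts.get(item, Counter())
    return [y for y, _ in companions.most_common(top_n)]


# Hypothetical toy data: each inner list is one person's purchases.
histories = [
    ["kettle", "tea", "mug"],
    ["kettle", "tea"],
    ["kettle", "tea", "biscuits"],
    ["kettle", "mug"],
    ["laptop", "mouse"],
]

counts = build_cooccurrence(histories)
print(recommend(counts, "kettle", top_n=2))  # ['tea', 'mug']
```

With five histories the suggestion is fragile; one odd shopper would change it. With millions of histories the same counting becomes far more stable, which is the point about large numbers made above.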
If we agree to stop asking why, and accept that algorithms can find patterns we cannot, we still have to resolve the choice problem. What if the patterns throw out a million choices (analogous to the zillion results of every search we make on Google)? The problem is that in using a search engine of any sort, we still need to parse through the results to find the answers we need. And this is difficult when there are too many of them.
Instead, look at a system like IBM’s Watson, the Jeopardy-champion-beating, next-generation computing engine now being applied to business problems around the world. On Jeopardy, Watson works with a method called DeepQA – a massively parallel probabilistic evidence-based architecture. OK, that’s a mouthful. In simpler language, Watson operates like an expert: it works to understand the question, and provides an exact answer or a smaller range of answers, with some level of confidence attached to each.
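The ‘smaller range of answers, with some level of confidence attached to each’ can be pictured with a toy sketch in Python. To be clear, this is not DeepQA or anything IBM has published; the `Candidate` class, the evidence scores and both thresholds are assumptions invented for illustration. It only shows the shape of the idea: rank the candidates, turn their scores into shares of the total, and hand back a short list instead of everything.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    answer: str
    score: float  # hypothetical raw evidence score, higher is better


def shortlist(candidates, max_answers=3, min_confidence=0.10):
    """Return at most max_answers (answer, confidence) pairs, best first.

    Confidence here is simply each score's share of the total score,
    an assumption made for this sketch.
    """
    total = sum(c.score for c in candidates)
    if total <= 0:
        return []
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    result = []
    for c in ranked[:max_answers]:
        confidence = c.score / total
        if confidence >= min_confidence:
            result.append((c.answer, round(confidence, 2)))
    return result


# Hypothetical candidates for a single question.
candidates = [
    Candidate("answer C", 0.5),
    Candidate("answer A", 6.0),
    Candidate("answer D", 0.5),
    Candidate("answer B", 3.0),
]

print(shortlist(candidates))  # [('answer A', 0.6), ('answer B', 0.3)]
```

Four candidates go in, two come out, each with a number that says how much weight to give it. That is much closer to the ‘4 to 6 options’ Iyengar describes than a page of search results.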
Now that’s useful.
This is where big data is headed. The ability to take massive amounts of data, more than any human mind can ever hope to process, to look for patterns that we can never hope to identify, and to provide a meaningful and smaller set of answers from which we can hope to make a choice we can be happy with.
That is the day the promise of big data will be fulfilled, the day big data will set our minds free.
A day where, instead of spending hours navigating a vast array of numbers in a hopeless attempt to stay in control, we will use our infinitely more creative brains to ask deeper questions and find innovative solutions based on the analysis and choices provided to us by the machine.
That day will not come in a single leap, nor through trial and error, nor through small incremental steps. It will come instead in a series of large bounding leaps, much like a kangaroo effortlessly crossing a large expanse of land.
But for that day to arrive, we need to drop our human disbelief in data that contradicts our experience, and our human over-confidence in our own judgements. And we need to believe that a data-driven future can indeed be simpler.
On the flip side, companies need to stop talking about the potential of big data and start delivering big data solutions that can algorithmically derive patterns from the data, and that can actually simplify the choices we face, while presenting the evidence we need to understand each choice.
[Note: All this, of course, leads to further questions: who programmes the machines and can we trust them, what if the data or the science is wrong, are we headed for a filter bubble where every action we take leads to more of the same, and so on? More on that in a later post.]
Yes, big data can set our minds free. It’s not far…. but we still have some work to do to get there.