The Metaphysics of Big Data: Problem of Induction

Published July 18, 2013

How do we, as human beings, acquire knowledge? Think for a sec! And don’t tell me that it comes from textbooks, TV, the internet or newspapers! Scholars say (debatably) that there are five main valid ways through which human beings attain knowledge: Perception, Inference, Comparison, Verbal Testimony and Revelation. The most reliable and widely accepted of these, according to scientists and empiricists, is Sense Perception, or Experience. We encounter the world and its objects directly through our five senses. When we come into contact with an object, it stimulates the sensory organs, which transform the incoming energy into neural signals and transmit them to the brain. There the signals are processed and interpreted using the information already stored in the brain’s database, resulting in a mental re-creation of the object. If the mental image corresponds to an idea of the object already in that database, we recognize the object. If not, we don’t.

The lifelong process of acquiring knowledge through sensation and perception starts when a child is born. As the child grows up, it slowly learns to correlate simple pieces of information and build them into complex concepts and abstract ideas. It learns to use facts, based on observations, to make complex judgements even about things that are unseen and beyond human understanding… For instance, when an object is lifted into the air and then released, we expect that it will fall to the ground. We repeat this experiment many times and finally arrive at a general, scientific conclusion: all objects fall if released in the air. This method of deriving new, non-obvious and unobserved knowledge from observed facts is called Induction or Inductive reasoning. (Induction is a logical method through which we arrive at a general conclusion about a class of objects based on some number of observations of particular instances of that class.) We observe swans and find that all the swans we have seen are white, and therefore we conclude that all swans are white. Similarly, we see that all men die, and so we conclude that all men are mortal.

Let’s take a contemporary example. By looking at every trace of information that you leave on your mobile phone while making a call, searching for a location in your map app or purchasing a product online, we can determine your habits. By looking at the records of the past ten years, we can draw reasonably reliable conclusions about customer behaviour, stock market fluctuations, the weather, election results, crime probabilities… How is this possible? It is possible through inductive analysis, in which a general conclusion is drawn by assuming a necessary connection between different instances in the past. In other words, our minds have an analytical ability (inductive reasoning) to see what look like necessary connections (cause-effect relationships or patterns) between events in the world, and to assume, presuppose or predict that events in the future will necessarily occur in the same way as we have experienced them in the past, simply because that is the way we have experienced them. This assumption is called the Principle of the Uniformity of Nature. Big Data is all about Induction, and its method of analysis is undisputedly inductive.

The problem of Induction

Is there any problem with Induction, the method used in Big Data and in scientific research? It seems there is…!

Induction landed in trouble when David Hume (1711-1776), a sceptical philosopher and one of the most influential empiricists in the history of Western philosophy, claimed that the connections between events are not as necessary as we might think. He argued that inductive reasoning does not afford us conclusive proof of causal connections in the world. What we experience, Hume explained, is nothing but a constant conjunction of events: one kind of event regularly following another. We make our assumptions not on logical grounds but out of mere habit, because our reasoning contains the hidden premise that events or properties that have occurred in the past will necessarily continue to happen, or stay the same, in the future. We presuppose that nature is uniform. So, challenging Induction, Hume argued that it is NOT necessary that whatever happened in the past will continue to happen in the future. This most enduring problem in epistemology, raised by Hume, is called The Problem of Induction.

To explain Hume’s problem further, let’s take a couple of simple examples. From a series of observations that a woman walks her dog by the market at 8 AM on Monday, it seems valid to infer that next Monday she will do the same, or that, in general, the woman walks her dog by the market every Monday. But if she does walk by the market next Monday, that merely adds one more item to the series of observations; it does not prove that she will walk by the market every Monday. Regardless of the number of observations, it is never certain that the woman always walks by the market at 8 AM on Monday.

Now, imagine a person who experiences the world for the first time! He sees the sun rise at dawn and go down at dusk. Imagine trying to convince him that the sun must rise every morning and set every evening by telling him that its rising and setting is an example of regularity. That man would not be convinced and would demand proof. He might ask what makes us so sure that things will not change the next day or even the next minute; that is, what faculty of the mind gives us the certainty of causality? How could we prove our claims? We could give examples: for millennia on earth, the sun has always risen in the morning and set in the evening, and gravity has always attracted bodies toward the ground. We could also add that science can tell us precisely how gravity works. But neither past events nor science can tell us with certainty what the future is going to be like. What scientists can do is study past events and formulate hypotheses about the future. In fact, we might say that what we call knowledge is in reality a probability. Despite the great many observations we may have collected, we cannot know with certainty, or deduce, that the so-called laws of nature will remain constant in the future. Past experience, and not deductive reasoning, suggests to us that gravity will probably work the same way tomorrow. In short, induction leads us only to PROBABILITY, not to CERTAINTY. Inductive reasoning helps us in decision-making, but the decision cannot be fully reliable; there is no guarantee that it is 100% true.

With this, Hume called into question all the empirical claims made in everyday life and through the scientific method. (The majority of scientific research is based on inductive reasoning: scientists extend particular observations into universal laws in order to make predictions about future behaviour.) Many philosophers (including Immanuel Kant, David Stove, Donald Williams, Karl Popper and Nelson Goodman) have attempted to solve this problem, but there is still no consensus on how to solve it, or on whether it is actually solvable. That’s why the great philosopher Bertrand Russell once said:

“The great scandals in the philosophy of science ever since the time of Hume have been causality and induction. We all believe in both, but Hume made it appear that our belief is a blind faith for which no rational ground can be assigned.” (Bertrand Russell, Let the People Think.)

Hume’s problem of induction strikes at the very foundation of science (science derives universal principles from a finite number of examples by means of induction) and at the way we think in our day-to-day lives. The American philosopher Nelson Goodman opined that this problem, as Hume demonstrated, is insoluble, and that efforts to solve it are at best a waste of time. So, does this mean that induction is useless for deriving knowledge? Big Data, as a tool for prediction, forecasting and behavioural analysis, depends largely on inductive reasoning to generate insights and predictions from past data. Does it mean, then, that what we are all doing in Big Data is in vain? What’s our take on this?

Now, consider two cricket teams, A and B, playing a match. According to past history, team A has won 60% of the total matches, team B has won 30%, and 10% have ended in a tie. Who will win the match today? Here, we have three possible answers: A, B or a tie. The most likely answer is A, as A has won 60% of the matches in the past. But that answer is only likely to be true; nothing guarantees that it is 100% true. This indicates that although inductive reasoning helps us in decision-making, the decision cannot be fully reliable. Team B may win today’s match, which will slightly increase team B’s winning percentage above 30%, or the match may end in a tie. It shows that inductive reasoning can neither be ‘accepted’ nor ‘rejected’ with complete confidence. However, though not certain, it allows us to look at all the hypotheses about the future and helps us narrow down the choices we have when making decisions. So, instead of strictly rejecting or accepting it, we can use inductive reasoning in a probabilistic manner. Besides, though only probable, data-driven decisions are more reliable and less risky than decisions made out of pure intuition, right?
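The arithmetic behind the cricket example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article gives percentages but not the number of matches played, so the record of 100 past matches below is hypothetical, as are the names `outcome_frequencies`, `history` and `updated`. Past frequencies are treated as rough estimates of probability, which is precisely the inductive step Hume questions.

```python
from fractions import Fraction


def outcome_frequencies(counts):
    """Return each outcome's share of all recorded matches."""
    total = sum(counts.values())
    return {outcome: Fraction(n, total) for outcome, n in counts.items()}


# Hypothetical record: 100 past matches, chosen only to match the 60/30/10 split.
history = {"A": 60, "B": 30, "tie": 10}

before = outcome_frequencies(history)

# The inductive guess: the outcome seen most often in the past.
most_likely = max(before, key=before.get)

# Suppose team B wins today anyway; the record gains one more observation.
updated = dict(history)
updated["B"] += 1
after = outcome_frequencies(updated)

print("Most likely outcome from past data:", most_likely)
print("Share of matches won by B before:", float(before["B"]))
print("Share of matches won by B after: ", float(after["B"]))
```

With these assumed counts, one surprise win for B only nudges its share from 30/100 to 31/101. The past record narrows the choices, but it never rules the other outcomes out.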

Disclaimer: Inductive reasoning as discussed by Hume (in the context of Philosophy) should not be confused with Mathematical Induction. The former deals with probability, while the latter is mathematically rigorous, meaning that its conclusions are logically certain. The two are different. The philosophical implications of the problem of Induction raised by David Hume do not apply to Mathematical Induction, a form of deductive reasoning used in mathematical logic and computer science.