Demystifying Machine Learning Part 1 – Data, processing power, and open source

Published February 9, 2016

This is the first of a three-post series on machine learning. 

Is machine learning just a fancy word for the same old computer programming we have employed for decades? Or is machine learning a mystical computer that can learn anything? More importantly, why does it matter to your business?

The most effective way to define machine learning is to compare it with traditional computer programming. In traditional programming, one writes specific instructions that tell the computer how to process the input it is given and produce an output. For example, the input might be a credit card application. The program follows its instructions to process the application, extract the useful pieces of information, and compare them with other data. Its output is a recommendation to accept or reject the application.
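
To make the traditional approach concrete, here is a minimal sketch in Python. The field names and thresholds (missed payments, a 40% debt-to-income limit) are hypothetical and chosen only for illustration; the point is that a programmer writes every rule by hand.

```python
from dataclasses import dataclass


@dataclass
class Application:
    # Hypothetical fields extracted from a credit card application.
    annual_income: float
    existing_debt: float
    missed_payments: int


def decide(app: Application) -> str:
    """Apply hand-written rules; every threshold is picked by the programmer."""
    if app.missed_payments > 2:
        return "reject"
    if app.annual_income <= 0:
        return "reject"
    # Illustrative rule: reject when debt exceeds 40% of annual income.
    if app.existing_debt / app.annual_income > 0.4:
        return "reject"
    return "accept"


print(decide(Application(annual_income=60000, existing_debt=10000, missed_payments=0)))
```

If the business wants different behavior, someone has to go back and edit the rules. Nothing in this program changes on its own as more applications come in.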

In contrast, a machine learning program has no specific instruction set telling it which credit card applications to accept or reject. Instead, it learns from the input data it is given and automatically improves its performance through experience. Machine learning, a subset of artificial intelligence, improves by analyzing large amounts of data: it ‘tweaks’ its parameters to fit the new data it receives. One of the simplest forms of machine learning is linear regression, where the parameters are adjusted to fit a linear equation that ‘best’ explains the observed data.
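
As a rough illustration of that ‘tweaking’, the sketch below fits a straight line to a handful of made-up points using gradient descent, one common way to adjust the parameters of a linear regression. The data, the learning rate, and the function name fit_line are assumptions made for this example, not part of any particular library.

```python
def fit_line(xs, ys, learning_rate=0.01, steps=5000):
    """Fit y = slope * x + intercept by repeatedly nudging both parameters."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # How far off is the current line on each observed point?
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to each parameter.
        grad_slope = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = (2 / n) * sum(errors)
        # Tweak the parameters a little in the direction that reduces the error.
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


# Made-up observations that roughly follow y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.1]

slope, intercept = fit_line(xs, ys)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

Nobody told the program that the slope should be about 2. It started from zero and arrived there by shrinking its error on the data. With more or different data, the same code would settle on different parameters.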

Some tasks are more easily ‘programmed’ through conventional programming, while others are more amenable to machine learning. Humans do certain things involuntarily that we cannot articulate in words. For example, I can tell you how I tie my shoe, but I cannot explain how I recognize someone’s face. I just do. As a result, it is much harder to write a traditional computer program to identify a face. However, I can blast the computer with thousands of images of faces, along with pictures of the face I want it to recognize, until it can tell the difference. Facial recognition is born.

Machine learning is possible and popular now for a number of reasons:

Volume of Data: More than ever before, we have access to vast amounts of data: not just structured data, but also unstructured text, audio, images, and video. We can use this data to build systems that learn from it.

Processing Power: Advances in computing technology, including massively parallel GPUs (Graphics Processing Units) and cloud computing, have made it cheaper and faster to process large volumes of data.

Open Source Software: Open source groups focused on developing machine learning programs, together with the availability of large open data sets for learning, are accelerating the development of machine learning. For example, one can quickly use open source machine learning packages to process large volumes of image data and recognize specific images.

Machine learning carries enormous potential for the creation of meaningful products and services. Some examples of machine learning at work include:

Hospitals creating a library of scanned images to detect and diagnose cancer
Insurance companies digitally and automatically recognizing and assessing car damage
Security companies trading clunky typed passwords for voice recognition
Government agencies predicting weather patterns

Machine learning isn’t magic, but it can help companies develop powerful revenue-generating products and solutions. It offers yet another way to understand data and make useful recommendations without necessarily knowing exactly how humans solve the same problems and make the same recommendations. It’s up to Chief Data Scientists to experiment with machine learning prototypes and present the possibilities for breakthrough innovation to their C-Suite peers.

Have you developed a machine learning algorithm to make a useful recommendation? Do you think there are classes of problems in your industry that are more amenable to this type of machine learning approach?

Check out: Demystifying machine learning part 2: Supervised, Unsupervised, and Reinforcement Learning