Why becoming a data scientist is NOT actually easier than you think

Published October 24, 2014 | Joseph Misiti

I was just doing some late-night reading and came across this article. TL;DR – you can take the ML course on Coursera and magically become a data scientist, because three really intelligent people did it. I disagree.

I’m not disputing that the people referenced in the article are data scientists who score high in Kaggle competitions. They’re probably really intelligent people who picked up a new skill and excelled at it (although one was already an actuary, so he was basically doing machine learning in some form already).
Here is my problem with it – being a data scientist usually requires a much larger skill set than a basic understanding of a few learning algorithms. I’m taking the Coursera ML course right now, and I think it is great! Here is what I didn’t learn, though:

Programming Languages and Other Technologies:

Most data scientists and the companies that employ them are not using MATLAB/Octave. They have backend web services written in Java, Python, Scala, or Ruby. These languages are not covered in the course. Python has libraries like SciPy, NumPy, and scikit-learn that are great for solving numerical problems. Java has a bunch of libraries too, like the Mahout math library [2]. R is used by most statisticians (again, not covered in the course). When your boss (or a customer) comes to you and says you need to integrate an algorithm into a pre-existing web service (for example, they need a recommendation engine), and you say “I only know MATLAB”, that is going to be a huge problem. You don’t just pick up Java/Python/C++/Scala/whatever in a few days on the job.
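
To make the recommendation-engine scenario concrete, here is a minimal sketch of what the service-side code might look like in plain Python. Everything in it is an assumption for illustration: the toy ratings, the names `cosine`, `recommend`, and `handle_request`, and the choice of user-based cosine similarity are hypothetical, not something taken from the course or from any particular library.

```python
import json
import math

# Hypothetical toy data (user -> {item: rating}); not from the article.
RATINGS = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob": {"A": 4, "B": 1, "D": 5},
    "carol": {"B": 4, "C": 5, "D": 2},
}


def cosine(u, v):
    """Cosine similarity between two sparse rating dicts."""
    shared = set(u) & set(v)
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    norm = norm_u * norm_v
    return dot / norm if norm else 0.0


def recommend(user, ratings=RATINGS, k=1):
    """Rank items the user has not rated, weighting other users'
    ratings by how similar those users are to this one."""
    mine = ratings[user]
    scores = {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = cosine(mine, theirs)
        for item, rating in theirs.items():
            if item not in mine:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:k]


def handle_request(user):
    """Roughly what a web endpoint would send back: a JSON body."""
    return json.dumps({"user": user, "recommendations": recommend(user)})
```

The math is only a few lines. Most of what surrounds it (data structures, serialization, and in a real service the routing, storage, and error handling that this sketch leaves out) is ordinary programming in the service's own language, which is the part a MATLAB/Octave-only background does not prepare you for.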
