The term “data scientist” has lately been used to describe a wide variety of skills and roles. In this post I will focus on one particular flavor: the data scientist-engineer who ships relevance products to users, and the qualities needed to be good at that job. Some examples of relevance products are:
1. Search engines like Google, Bing, Foursquare and Yelp.
2. Recommendation systems like Netflix movie recommendations, Amazon “what to buy” recommendations or Twitter “who to follow”.
3. Smart news feeds like those on Facebook or LinkedIn.
These folks need to be strong at both data science and engineering to be successful. In fact, “relevance engineer” might actually be a better term to describe these data scientists*.
Relevance engineers have a common set of skills that they draw upon to get their jobs done. The list below doesn’t include some of the known, obvious skills. You obviously need to be smart. You obviously need to have (or be able to learn quickly) the required “book” knowledge.
But beyond that, there are a bunch of not-so-obvious skills that you can’t learn from a book. Here are some of those, in no particular order:
1. You need to enjoy an iterative process of development. If you want to build a relevance-based software feature**, you need to be able to quickly build a version 0.1 using a very simple model, then make it better with each successive iteration.
2. You also need to have a good intuition for when to stop. By definition, relevance features are never done. You can always improve the accuracy a little more. But at some point, the effort you put in exceeds the value you derive from it. You need to be able to identify that point.
3. You should be comfortable with failure. A lot of your models and experiments will fail. And that’s OK.
4. You should be driven by curiosity. The best people are the ones who are genuinely curious about the world around them.
5. You need to have good data intuition. You should be good at identifying patterns in the data. Being able to create quick data visualizations (using R, Python, MATLAB, Excel, etc.) helps.
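To make the “when to stop” point a little more concrete, here is a rough sketch of a diminishing-returns check, not a recipe. Everything in it is an assumption made up for illustration: the function name `should_stop`, the accuracy numbers, the effort units and the threshold. In practice you would plug in whatever metric and cost estimate make sense for your product.

```python
def should_stop(metric_history, effort_per_iteration, min_gain_per_effort):
    """Return True once the latest gain per unit of effort drops below the threshold.

    All three arguments are hypothetical: a list of metric values (one per
    shipped version), the effort spent on each iteration, and the smallest
    gain per unit of effort that is still considered worth it.
    """
    if len(metric_history) < 2:
        return False  # need at least two versions to measure a gain
    gain = metric_history[-1] - metric_history[-2]
    return gain / effort_per_iteration < min_gain_per_effort


# Made-up offline accuracy for versions 0.1, 0.2, 0.3, 0.4 of a feature.
accuracy_by_version = [0.60, 0.70, 0.74, 0.75]

history = []
for accuracy in accuracy_by_version:
    history.append(accuracy)
    # Assume each iteration costs 2 units of effort (say, engineer-weeks).
    if should_stop(history, effort_per_iteration=2.0, min_gain_per_effort=0.01):
        break

stopped_at = len(history)  # number of versions built before stopping
print(f"Stop iterating after version number {stopped_at}")
```

With these made-up numbers the early versions bring big gains and the later ones bring small ones, which is the pattern described above. The real judgment call is picking the threshold, and that still takes intuition.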