A data scientist’s story: Research into 1,001 data scientist profiles

Published December 11, 2017

arvindl

In the age of big data and machine learning, one profession stands out: data scientist. Data scientists are the investment bankers of the 21st century, and the prestige associated with the title is sought after by most quantitatively inclined candidates.

But how does one turn the data science dream into reality and become a data scientist?

Different answers exist, and each data scientist has their own story. But a single story is not worth much. This is why 365 Data Science conducted a study that aggregated data from the LinkedIn profiles of 1,001 data scientists.

The goal was simple. What is the “average data scientist” story?

Methodology

The sample consisted of 1,001 data scientist profiles on LinkedIn. Convenience sampling was used because access to the data was limited. To limit bias, country and company quotas were assigned. Geographically, there were four groups: US (40%), UK (30%), India (15%), and other countries (15%). In addition, roughly half of the sample consisted of professionals employed by a Fortune 500 company.
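The country quotas do not divide 1,001 profiles into whole numbers (40% of 1,001 is 400.4), so some rounding rule is needed. The study does not say which rule it used; the sketch below assumes a largest-remainder allocation, and the function name `allocate_quotas` is hypothetical.

```python
def allocate_quotas(total, percent_shares):
    """Split `total` into whole-number quotas proportional to integer percentages.

    Assumption: largest-remainder rounding. The original study does not
    state how its quotas were rounded.
    """
    quotas, remainders = {}, {}
    for group, pct in percent_shares.items():
        # Integer arithmetic avoids floating-point surprises.
        quotas[group], remainders[group] = divmod(total * pct, 100)
    leftover = total - sum(quotas.values())
    # Give the unassigned profiles to the groups with the largest remainders.
    for group in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        quotas[group] += 1
    return quotas


percent_shares = {"US": 40, "UK": 30, "India": 15, "Other": 15}
quotas = allocate_quotas(1001, percent_shares)
print(quotas)
```

Under this assumption the one leftover profile goes to the US group, so the quotas sum to exactly 1,001.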

Summary of findings

The “average data scientist” turned out to be male (70%), speaks at least two languages, and holds a second-cycle academic degree or higher (48% hold a Master’s and 27% a PhD). It takes a data scientist a median of 4.5 years to earn the title. The data science tools he uses vary, with the exception of R and Python. The two programming languages were almost equally employed (53% each), with 74% of the cohort relying on at least one of them.
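The three language figures also imply how many profiles list both R and Python. As a rough check, inclusion-exclusion can be applied to the reported percentages; the results are approximate because the published figures are rounded, and the helper name `overlap_share` is only illustrative.

```python
def overlap_share(share_a, share_b, share_either):
    """Share that belongs to both groups: |A and B| = |A| + |B| - |A or B|."""
    return share_a + share_b - share_either


# Percentages as reported in the summary of findings.
r_pct, python_pct, either_pct = 53, 53, 74

both_pct = overlap_share(r_pct, python_pct, either_pct)
only_r_pct = r_pct - both_pct
only_python_pct = python_pct - both_pct
neither_pct = 100 - either_pct

print(f"both: {both_pct}%, only R: {only_r_pct}%, "
      f"only Python: {only_python_pct}%, neither: {neither_pct}%")
```

So roughly a third of the cohort lists both languages, and about a quarter lists neither.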

datascience-research1

Programming languages

Apart from R and Python, other tools are in use as well, and most of them are also programming languages. The third most common skill is SQL (40%). Unsurprisingly, MATLAB, Java, and C/C++ are losing ground. These findings confirm a trend that previous research has observed over the last few years.

datascience-research2

Programming language by country

But are these findings universal across the world? Answering that question requires segmenting the sample geographically. Python is the leading language in both the US and the UK, while R is king in India and the other countries in the sample. Still, the difference between the two preferred languages is not striking. Java is declining in all three major clusters (US, UK, and India). However, “other countries” still seem to rely on the older languages: Java and C/C++.

It is noteworthy, however, that data scientists in India lead in C/C++ adoption (23%). That’s in line with the ‘IT outsourcing’ reputation of India.
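A geographical segmentation of this kind amounts to computing, within each country group, the share of profiles that list each skill. The sketch below shows the mechanics only: the records are made-up toy data, not figures from the study, and `skill_share_by_country` is a hypothetical helper.

```python
from collections import Counter, defaultdict

# Hypothetical toy records, NOT data from the study:
# (country group, set of skills listed on the profile).
profiles = [
    ("US", {"Python", "SQL"}),
    ("US", {"Python", "R"}),
    ("UK", {"Python"}),
    ("UK", {"Python", "R"}),
    ("India", {"R", "C/C++"}),
    ("India", {"R", "Python"}),
]


def skill_share_by_country(profiles):
    """Return {country: {skill: share of that country's profiles listing it}}."""
    totals = Counter()
    counts = defaultdict(Counter)
    for country, skills in profiles:
        totals[country] += 1
        counts[country].update(skills)
    return {
        country: {skill: n / totals[country] for skill, n in skill_counts.items()}
        for country, skill_counts in counts.items()
    }


shares = skill_share_by_country(profiles)
for country, by_skill in shares.items():
    top_skill = max(by_skill, key=by_skill.get)
    print(country, top_skill, f"{by_skill[top_skill]:.0%}")
```

Because one profile can list several skills, the shares within a country can add up to more than 100%.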

datascience-research3

This insight naturally leads one to think about the work experience of data scientists.

Previous experience

It is interesting to outline the data scientist’s path from graduation to the top of the data science ladder. A large share of the cohort (36%) already worked as a data scientist in their previous position. Setting those aside, the most commonly observed routes into data science are the data analyst position (17%) and academia (12%). Given that 27% of the cohort holds a PhD, it’s no surprise that academia is a leading “producer” of data scientists. Comparing this information with the data from two jobs ago, we can highlight intern, IT, and consultant as the other big clusters.

datascience-research4

Given those main clusters, one can’t help but be curious about the education of the “average data scientist”.

Educational background

Truth is, there isn’t a single degree that’s dominant when it comes to data scientists’ educational backgrounds. What they do have in common, however, is that they mostly come from quantitative degrees. If you studied anything that touched upon programming and computer science, or dug deeper into math and statistics (whether as a degree on its own or as part of your degree), then you have as good a chance of getting your foot in the door as anyone who studied data science specifically. The research showed that 20% of data scientists have a degree in computer science, 19% have a statistics or math background, and 19% majored in economics and social sciences. Only 13% have a degree purely focused on data science and analysis, which can largely be explained by the fact that this only recently became a degree in its own right. It bears pointing out that machine learning majors were included in the data science cluster, not the computer science one.

datascience-research5

University

Given the heterogeneity of degrees, it seems logical to look for a pattern in the universities data scientists graduated from. All universities in the sample were ranked according to the ‘Times Higher Education’ world university ranking.

datascience-research6

*Interpret the chart with care. The groups are taken from the original ranking and are not of equal size.*

The data shows that 28% of the cohort graduated from a top 50 university. Interestingly enough, a comparable portion of data scientists (25%) came from institutions that were not among the 1,100 universities ranked by Times Higher Education. This shows that not only the degrees but also the university rankings are heterogeneous.

A plausible explanation could be found in self-preparation.

Self-preparation

Forty percent of the cohort reported having taken an online course. Further, there were 3.33 certificates per LinkedIn profile on average. Data scientists therefore clearly rely on self-preparation. Although these data were the least rigorously examined in the original research, the 40% estimate is probably conservative: it seems logical that many professionals at the top of the data science ladder would not report introductory courses in their area of expertise.

datascience-research7

Conclusion

The “average data scientist” story is not an average one. It’s filled with math, programming, and constant innovation. The takeaway is that a quantitative mind, an aspiration for self-improvement, and a strong focus are the main drivers of career success for contemporary data scientists.

This article originally appeared here. Republished with permission.