Statistics Denial, Best Statistical Practice

Analytics   |   Published July 15, 2015

This is the fourth installment in the series of essays on Statistics Denial by Randy Bartlett, Ph.D. To read other articles in the series, click here.
“I keep saying that the sexy job in the next 10 years will be statisticians.”
— Hal Varian
“I keep saying that analytics today is like the wild, wild west: anyone can say they are an analyst and there’s no reason to disbelieve them. When they say they’re an analytics professional with a CAP, then you know they’ve achieved the industry standard for practice.”
— Louise Wehrle, Certification Manager, INFORMS
For more than a century, applied statisticians/quants have refined their accumulated wisdom in extracting information from numbers with uncertainty and leveraging it to make smarter decisions.  We are calling this Best Statistical Practice.
Best Statistical Practice:
Best Statistical Practice, as characterized by Deming et al., requires mastering business knowledge and solving the data analysis problem within the broader considerations of the business analytics problem: Timeliness, Client Expectation, Accuracy, Reliability, and Cost.
There are three natural pillars we can leverage to improve statistical practice within an organization: Statistical Qualifications, Statistical Diagnostics, and Statistical Review (see Chapters 7-9, in my book ‘A Practitioner’s Guide To Business Analytics’).  Aggressively applying statistics means leveraging the best Qualifications; using effective Diagnostics to measure results; and Reviewing everything that could go wrong.
When we can express our business need as a mathematics problem, then we can deduce a unique answer.  Statistics problems, however, have an additional solution layer derived from uncertainty in the numbers.  Hence, we need statistical assumptions and a corresponding level of precaution. (We provide a more thorough problem-based clarification of statistics in the May/June 2015 issue of Analytics Magazine, http://goo.gl/Wod3gk.)
Acquiring this accumulated wisdom comes from working with other quants.  By themselves, statistics books and the internet can only prepare the hobbyist, or merely augment the learning process for the professional in the field.  Just as we would strongly prefer a highly specialized team of professionals to perform our heart transplant, we need the same level of professionalism to perform important data analysis.
Circumventing Statistical Qualifications
Here, I will discuss the harm that comes from circumventing just one pillar of best practice, Statistical Qualifications.  The absence of the proper training and experience leads to sloppy data analysis, a much narrower breadth of practice, and what we will call ‘data hogging.’
A number of forces are pushing statistics expertise out of data analysis.  First, promotional hype is advising employers to look for generalists or ‘unicorns,’ who are great at everything.  This is a fool’s errand.  It tries to find someone expert in both statistics and IT.  This objective might make sense for very small companies, but if you need advanced data analysis, it does not make sense to replace The Beatles with four one-man bands.  Such an approach leads to schizophrenic job descriptions; to a sacrifice in statistical prowess; and, unfortunately, to a healthy environment for hucksterism.
Second, the straddling terms “machine learning,” “data mining,” and “data science” are being used to repackage statistics/data analysis with IT/data management, as if there is some technical synergy between the two and, further, as if data analysis is merely a subfield of data management.  Such repackagings provide a way to repurpose qualifications in IT as qualifications in statistics.  It is a leap to regard all data scientists as competent in data analysis, too.  We need accreditation specific to data analysis, rather than encompassing several distinct skills.  Also, splitting these straddling terms into, say, Statistical ML, Statistical DM, and Statistical DS would improve communication.
Third, it is relatively easy to set up shop as a data scientist.  That is part of the term’s popularity.  Again, we need accreditation specific to data analysis to help consumers of advanced data analysis discern legitimate qualifications.
Fourth, claims that software is ready to make everyone a statistician can, if taken at face value, push statistical expertise out of data analysis.  Statistical software is very good at automating tasks for statisticians, yet it is not ready to replace judgment or creative thinking.  For many technical advances, such as replacing human judgment, the achievements are proclaimed before they are achieved.
Fifth, if these factors discourage specialization in data analysis, then we might be headed for another round of ‘data hogging’ (not seen since the days before Y2K).  If just anyone can perform data analysis, then there is no need to share the data, or to review the analysis, for that matter … and we are heading for an Orwellian order form.  Just fill it out; one size fits all.
Close:
We want new ideas from the ‘information rush.’  However, there is a great risk from promotional hype that is extreme enough to adulterate statistics and circumvent Best Statistical Practice, which is our accumulated experience for extracting signal from noise, i.e., information from data.  In particular, we need to raise our level of practice by embracing Statistical Qualifications, Statistical Diagnostics, and Statistical Review.
We sure could use Deming, right now.  Many of us, who consume or produce data analysis, hang out in the new LinkedIn group: About Data Analysis.  Come see us.