What profession did Harvard Business Review call the Sexiest Job of the 21st Century? That’s right… the data scientist.
Ah yes, the ever mysterious data scientist. So what exactly is the data scientist’s secret sauce, and what does this “sexy” person actually do at work every day?
This article provides a data science definition and discussion meant to help define the data scientist role and its purpose, as well as typical skills, qualifications, education, experience, and responsibilities. The definition is somewhat loose, since there is no standardized definition of the data scientist role and the ideal combination of experience and skills is rarely found in one individual.
The definition can be further confused by the fact that other roles are sometimes thought of as the same, but are often quite different. These include data analyst, data engineer, and so on. More on that later.
Here is a diagram showing some of the common disciplines that a data scientist may draw upon. A data scientist’s level of experience and knowledge in each often varies along a scale ranging from beginner to proficient to, in the ideal case, expert.
While these, and other disciplines and areas of expertise (not shown here), are all characteristics of the data scientist role, I like to think of a data scientist’s foundation as being based on four pillars. Other more specific areas of expertise can be derived from these pillars.
Let’s discuss them now.
The Pillars of Data Science Expertise
While data scientists come from many different educational and work backgrounds, most should be strong in, or ideally experts in, four fundamental areas. In no particular order of priority or importance, these are:
- Business/Domain
- Mathematics (includes statistics and probability)
- Computer science (e.g., software/data architecture and engineering)
- Communication (both written and verbal)
There are other skills and expertise that are highly desirable as well, but these are the primary four in my opinion. These will be referred to as the data scientist pillars for the rest of this article.
In reality, people are often strong in one or two of these pillars, but usually not equally strong in all four. If you do happen to meet a data scientist who is truly an expert in all four, then you’ve essentially found yourself a unicorn.
Based on these pillars, my definition of a data scientist is a person who can leverage existing data sources, and create new ones as needed, in order to extract meaningful information and actionable insights. A data scientist does this through business domain expertise, effective communication and interpretation of results, and the use of any relevant statistical techniques, programming languages, software packages and libraries, and data infrastructure. The insights that data scientists uncover should be used to drive business decisions and to take actions intended to achieve business goals.