A walkthrough of Data Scientists' challenges
Being a "Data Scientist" is widely considered to be the best job in tech and related industries. Harvard Business Review even called it the "sexiest job of the 21st century", and Glassdoor listed it as the #1 job in its "50 best jobs in America for 2019". The salary, the growing importance of the role within companies, and many other aspects make being a data scientist attractive. Still, no job is perfect, and this one has its less fun parts. This post covers the main issues that Data Scientists need to overcome during their workday.
When asked about his statement in The Data Science Handbook, John Foreman, VP of Product at MailChimp, points out that there is a skewed view of what data science is. He explains that people who are less familiar with the field often hold wrong perceptions of it. They might think that the main job of data scientists is building predictive models. However, before building a model, data scientists need to know what data sources, techniques, and technologies are available within the company. They need to define the problem appropriately and engineer the features accordingly. When grabbing data from Kaggle, all of these steps are usually already done. These other fundamentals of operating in a data science role at a company deserve more attention.
“Data as oil” is a metaphor you’ve probably heard before. Data needs processing, just as oil needs a lot of expense and attention to refine it before its true value is revealed. So, what are the stages involved in transforming raw data into insight, turning your data into fuel for the company?
In most cases, it's necessary to deal with unstructured data, which has no predefined structure at all. Moreover, it is estimated that Data Scientists spend 80% of their time collecting, cleaning, and preparing data for use in machine learning. They spend the remaining 20% mining or modeling data with machine learning algorithms. Although it’s the least delightful part of the process, this data engineering is very important and can affect model performance. Data engineering roughly consists of three main parts: wrangling, cleansing, and preparing the data.
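To make the three data engineering stages a little more concrete, here is a minimal Python sketch using only the standard library. The sample records, the field names (`name`, `age`, `income`), and the helper functions are hypothetical and chosen purely for illustration. Real projects would more likely use a library such as pandas, and the right cleansing rules always depend on the data at hand.

```python
from statistics import mean

# Hypothetical raw records, as they might arrive from different sources.
RAW = [
    {"Name": " Alice ", "AGE": "34", "income": "52000"},
    {"name": "bob", "age": "41", "Income": ""},
    {"name": "Alice", "age": "34", "income": "52000"},
    {"name": "Carol", "age": "29", "income": "61000"},
]


def wrangle(records):
    """Wrangling: map differently shaped records onto one schema."""
    return [{key.lower(): value for key, value in r.items()} for r in records]


def cleanse(records):
    """Cleansing: normalize text, parse numbers, drop duplicates, fill gaps."""
    rows = []
    for r in records:
        name = r["name"].strip().title()
        age = int(r["age"])
        income = float(r["income"]) if r["income"] else None
        rows.append((name, age, income))
    rows = list(dict.fromkeys(rows))  # remove exact duplicates, keep order
    # Assumption: a missing income is filled with the mean of the known ones.
    fill = mean(inc for _, _, inc in rows if inc is not None)
    return [(n, a, fill if inc is None else inc) for n, a, inc in rows]


def prepare(rows):
    """Preparation: min-max scale numeric columns into [0, 1] for modeling."""
    def scale(values):
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0  # avoid division by zero for constant columns
        return [(v - lo) / span for v in values]

    ages = scale([r[1] for r in rows])
    incomes = scale([r[2] for r in rows])
    return [
        {"name": r[0], "age": a, "income": i}
        for r, a, i in zip(rows, ages, incomes)
    ]


features = prepare(cleanse(wrangle(RAW)))
for row in features:
    print(row)
```

Each stage is kept as a separate function so that the rules can be changed independently. For example, dropping incomplete rows instead of imputing them only touches `cleanse`.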
“If you do not know how to ask the right question, you discover nothing.” — W. Edwards Deming
Suppose we've figured out the data processing part and now have clean, consistent, and reliable data. The next step is to identify the business problem in a given business situation and, given the available data, convert the business question into an apt data science problem. That is not easy either. There are many reasons why problem definition can be hard. Sometimes stakeholders don’t know what they want and expect data scientists to solve all their data problems. Doing data science without the domain context of the business question, or without collaborating with the problem owners, makes it hard to deliver business value for the enterprise.
Generally speaking, a Data Science project has the goal of improving some existing business process. It turns out that it’s really difficult to change a business process. Operationalizing the insights obtained from the model results depends on several factors. Data Scientists often hand over insights to other professionals who make the business decisions, so explaining the data in layman's terms is considered one of the top qualifications of a good Data Scientist. Presenting performance figures and results to stakeholders isn't enough. Data Scientists must communicate effectively and deliver their results and insights through compelling data storytelling.
As mentioned, it's not enough to build a great learning algorithm. You also need access to timely, accurate, consistent, and relevant data, as well as clear goals and priorities from the stakeholders, all while convincing them to take the leap of (data) faith. These are some of the challenges Data Scientists face, and they deserve more emphasis.
Author Note: The following articles served as inspiration and sources for writing this post:
Rohrer. (2017, September 21). I Asked Data Scientists What the Hardest Thing About Data Science is. Their Answer Might Surprise You. Retrieved from: https://www.linkedin.com/pulse/i-asked-data-scientists-what-hardest-thing-science-answer-rohrer/
Editorial Team. (2019, February 23). Infographic: The Typical Data Scientist 2019. Retrieved from: https://insidebigdata.com/2019/02/23/the-typical-data-scientist-2019/
Jones. (2018, February 1). Data, structure, and the data science pipeline. Retrieved from: https://developer.ibm.com/articles/ba-intro-data-science-1/
Seroussi. (2015, November 23). The Hardest Parts of Data Science. Retrieved from: https://yanirseroussi.com/2015/11/23/the-hardest-parts-of-data-science/