Data Scientist without the job title "Data Scientist". Does this sound crazy or sensible?
I have been reading quite a bit of literature about data science, and about Data Scientists for that matter. I realize that there is an obvious disconnect between job titles and the work some people actually do on a day-to-day basis, work which would otherwise qualify them to be called Data Scientists.
So, who is a Data Scientist, for starters? As is the case with most of these technical definitions, you will not find one common answer to such a question. In fact, to make it a little easier, one could ask: what is data science? Frankly, you will just get lost in the myriad definitions advanced from various circles, especially if you venture into YouTube videos. It's all very confusing, but the definition that strikes me most is that data science is about getting actionable insights from data - which sounds really reasonable. Therefore, for the rest of this post, we shall assume that this definition is true, even if only marginally.
So, what does getting actionable insights from data involve? What is data in the first place, and where is it located? In what form is the data originally? What must be done to the data to expose the insights? These are questions with fairly obvious answers that may not deserve more scrutiny in this post. But it helps to appreciate that in real life, we have structured and unstructured data from disparate sources all around us, and that this data comes with quality issues (or errors, for simplicity). When confronted with this, we should immediately appreciate that extracting and manipulating this data into a form that allows any sense or insight to be made out of it should not be taken for granted. This raises the question: who does all the donkey work of extracting, manipulating, wrangling, exploring and visualizing - even before we talk of modeling?
For a moment, think about the common work environment, where "economy" or "profits" rule over everything else. Does someone who does the extraction, manipulation, wrangling and exploration have the extra time, or even the skills, to make any further sense of the data, even though they are expected to do so anyway? Are there even other "Data Scientists" to collaborate with in the data pipeline prior to modeling?
This brings me to the serious question: who is this special person called a Data Scientist, who knows SQL, Python and R thoroughly? And wait a minute: add the software engineering skills, data visualization skills, data engineering skills, big data frameworks such as Hadoop, and so on. Moreover, we cannot underestimate domain expertise. And on top of all this, the person should know a lot of math and statistics? Wouldn't it be more realistic to expect such a mortal human being to know a bit of each of those skill sets? Where do we draw the line?
I think there is better separation of duties in industry, especially at the big tech giants, where a Data Scientist is a Data Scientist, concentrating on modeling. In many resource-constrained settings, where the Data Scientist is also a Data Engineer, Software Engineer, BI Engineer and everything else, it just does not seem to work out easily. However, it is still imperative that the Data Scientist knows a little bit of everything else.