Data Science is new, so new in fact that most experienced people in the field are uncredentialed and have moved into it from other fields. So it is natural to ask “What the heck is data science?” You have probably seen Venn diagrams of statistics and math overlapping with programming and domain expertise. That tells a story of what data scientists do, but what is Data Science? Unsatisfied with most one-liner explanations, I invented my own:
“Data Science is about leveraging the data assets of an organisation to help it achieve its strategic goals”
Now that’s something we can work with! This definition involves leaders in the process. OK, now we know what Data Science is, so where the heck are the Data Scientists?
Well, we partly have Data Science education to blame (mainly the MOOCs and online resources). You see, real Data Scientists know the whole analytic pipeline, from the idea to the specification, to all those tens or hundreds of decisions which need to be made for a data product to be developed and deployed into production. I feel a lot of the online resources tend to focus on “the icing on the cake” which is the cool modelling part after the bulk of the work has been done. So, plenty of people can fit a model to some data, but not everyone can help you through the entire process.
Core Skills for Data Scientists
I believe people tend to overcomplicate data science job requirements. Many people find it hard to get that first start simply because the expectations of the job are crazy. To put it in perspective, as a guy with 10+ years in the field, I probably wouldn’t meet the expectations of half the grad jobs I see, and I have some serious game! But the reality is that the skills of a data scientist, especially a new data scientist, are quite simple. So, let’s get real here with the credentials for new data scientists. When hiring a data scientist, what should you look for? What are the basic skills a data scientist needs?
To me, a good data scientist should:
- Know a bit about command line
- Know a bit about Git
- Know a bit about SQL, because even in 2018 most of the data you will see is in a database and you’ll access it via SQL
- Be able to read data into R or Python
- Know how to manipulate data in either R or Python
- Be able to do some plotting in R or Python
- Be able to fit basic models like GLMs in R or Python.
- Be able to document modelling results, ideally using reproducible research ideas, in either R or Python
- Be able to talk about and present results. Most of this is practice.
These are the technical skills needed. Here are five less obvious (but equally important) ones.
1. A Continuous Learner
Data science is an ever-changing field with new technologies and techniques every time you turn around. The only way you can possibly keep up is to always be learning. But there is no need to learn everything. Learn what you need to do your job in the most efficient manner, and then learn what you will need for the next position that you want.
You can try to learn “all the things” but you will probably fail as “all the things” are forever changing. You can go crazy!
2. Is Passionate but not Necessarily Brilliant
I am uncredentialed. I don’t have a PhD, but I am passionate as anything about data science. I have put in huge amounts of work in data science and programming over the last 10 years because I absolutely love it.
If you love what you do, you can mix it with brilliant, credentialed people. Some of the best software developers I know never went to university; they taught themselves and love it.
3. Has a Commercial Focus
The whole point of data science is to help the organisation to achieve its strategic goals through the use of its internal data assets. You need to understand business.
4. Does the Dirty Work
Has basic data manipulation skills. It is important to be across the entire process, from raw data to model development.
5. Tries (and Fails)
Has a willingness to try (and fail). What success I have had in the field has come from a willingness to try new things, including stuff I was really uncomfortable with and even things clearly outside my role, like web development.
And One Type of Person to Steer Clear of…
When hiring a data scientist, the one type to avoid is the person who only knows what I call the “icing on the cake”, which could also be described as the “tip of the iceberg”. Most programs training data scientists, including universities and bootcamps, tend to focus on the modelling aspect. In reality, this is the final stage, after hundreds of decisions and a whole heap of data munging, combining, hacking, cleaning, thinking about the sampling bias and thinking about the implementation plan. I mean it really is the icing on the cake. So if you hire someone who has focussed on the icing on the cake, they generally aren’t the best resource, because they tend to struggle with the other 95% of the work.
I have had people with PhDs who were brilliant but lacked what I’d consider basic data manipulation skills. So in that respect, I’d take a kid with some mad data hack skills over a PhD who has been marinated in academia for 20 years but hasn’t lived elbows-deep in data.
Start Simple and Build From There
What you need in terms of staff depends on how sophisticated your organisation is and where it is on its Data Science journey. You don’t need a bunch of PhDs when you are running everything manually on spreadsheets! The best way to eat an elephant is one bite at a time, so build up your sophistication and expertise as you need it. Look for people who have the skills that you require right now, and hire new people with additional skills as the organisation’s data capabilities mature.
Hear more from Nic Ryan and other industry thought leaders at Business Analytics Tech Fest.
About the Author
Nic Ryan has worked in just about every role you can think of in a data science team, from excited newbie to managing 3 teams across 2 countries. Now working as a one-man analytics team for a startup, a technical adviser to another startup and a consultant to whoever knocks on the door, Nic shares his thoughts on what makes data science teams work.