To become a data scientist, focus on coding

Author: Rachel Thomas

Published: March 23, 2017

This week’s Ask-A-Data-Scientist column answers two short questions from students. Please email your data science-related quandaries to [email protected]. Questions have been edited for clarity and brevity. Previous posts include:

Q1: I have a BS and an MS in aerospace engineering and have been accepted to a data science bootcamp for this summer. I have been spending 15 hours/week on MIT’s 6.041 probability course on edx.org, which is the hardest math course I’ve ever taken. I feel like my time could be better spent elsewhere. What about teaching myself the concepts as needed on the job? Or could you recommend certain areas of probability to focus on? I’d like to tackle a personal project (dealing with either fitness tracker data or bitcoin) and maybe put probability on the back burner for a bit.

A: It sounds like you already know the answer to this one: yes! Your time could be better spent elsewhere.

Let your coding projects drive what you study, and learn math on an as-needed basis. There are three reasons this is a good approach:

The only exceptions are if you want to be a math professor or work at a think tank. (For most of my math PhD, my goal was to become a math professor, so I see the appeal, but at the time I was totally unaware of the breadth of awesome and exciting jobs that use math.) And sometimes you need to brush up on math for whiteboard interviews.

Q2: I am currently pursuing a Master’s degree in Data Science. I am not very advanced in programming, and most machine learning and statistics concepts are new to me. Data science is such a vast field that most of my friends advise me to concentrate on a specific branch. Right now I am trying everything and becoming a jack of all trades and master of none. How can I approach this to find a specialty?

A: There is nothing wrong with being a jack of all trades in data science; in some ways, that is what it means to be a data scientist. As long as you are spending the vast majority of your time writing code for practical projects, you are on the right track.

My top priorities for aspiring data scientists to focus on:

There are also a few things you can skip, since they aren’t widely used in practice: support vector machines/kernel methods, Bayesian methods, and theoretical math (unless it’s explicitly necessary for a practical project you are working on).

Note that this answer is geared towards data scientists, not data engineers. Data engineers put algorithms into production and need a different skill set, including tools such as Spark and HDFS.