Recently I received a great email question from Henrique Cheng, one of the graduate students in EXST at LSU. He asked what an appropriate path to becoming a Big Data/Data Scientist looks like, and whether that path passes through abstract mathematics, specifically my own background in Topology. His question stemmed from a research article he found on Topological Data Analysis (TDA), a field that I am well versed in. This blog is my answer to him, coupled with some ideas on TDA and general advice based on my own experiences.
Big Data/Data Science is a very “fluid” field combining ideas from Computer Science, Statistics and Mathematics. There are multiple avenues into this area, such as the ones described below:
a) You can be a good statistician with a knack for computer algorithm development. Perhaps a CS bachelor's and a Stats master's.
b) You can be a mathematician with applied tendencies and some good coding skills. Math bachelor's, CS master's.
c) You could even be just good at statistics, perhaps because you minored in it during a master's in another field, and then learned a few tricks in a specific coding language (stats minor and any other master's program).
There is no single clear path, and the list above is definitely not exhaustive. My own path, going through topology, is something of a special case, so I would not recommend it for everybody.
I gained problem-solving skills while doing my PhD in theoretical mathematics, but nothing formal in terms of data handling. That came during my postdoc at NCSU, where I tried to learn as much as I could from Statistics and Computer Science at my own pace. Basically, I tried to get a good handle on all the classes offered in a typical Statistics master’s program, and then on basic coding and advanced databases, using Matlab and a bit of R. I was also fortunate to work for the Lab of Analytical Sciences, where we basically learned by doing (intense workshops of a couple of days on a specific topic).
This, then, is the advice I want to give to people aspiring to get involved with data analytics:
a) Finish a masters in Statistics.
b) Learn as much as you can about coding and databases (either on your own or by taking relevant classes from CS).
c) Be comfortable with learning new things (here is where math is EXTREMELY useful) since the field changes every few years and you need to stay current.
d) Do a paper, project or something similar that is deeply applied. Hopefully you get a publication out of it, but definitely focus on a real-life application.
e) Go to talks/conferences/seminars where these ideas are discussed.
Now, turning to Topological Data Analysis specifically, I must say that the field is fascinating, from both the theoretical and the applied point of view. I immensely enjoyed my time working with TDA ideas, but I soon realized that its applications are limited to certain types of data and certain very specific questions. Of course there are areas where TDA is the best technique for the job, surpassing every other one, but it is definitely not “one tool to rule them all”.
Even if you want to become an expert in TDA, you first need to know DA, so the path I describe above is definitely helpful. As a final remark, I would like to caution aspiring Data Analysts on two things:
a) Don’t get tied down to one piece of software. You are not an R analyst, Python analyst, SAS analyst or Matlab analyst. YOU ARE JUST AN ANALYST! Understand coding, don’t memorize syntax.
b) Read, learn and change your approach all the time. Data Analysis changes weekly. If you stay rigidly attached to one kind of technique, you will become obsolete within a few years.