Chapter 17
Senior Data Scientist Case Study

Senior data scientists are very difficult to reach because of the demands on their time. However, these are the people who have very useful insights about data science, and they are generally better equipped than other data scientists to offer actionable advice. In a way, they are the most mature professionals in the field and inhabit the role that most data scientists aspire to (including the author of this book).

In this chapter, we will look at a research-oriented data scientist from the Greater Atlanta area, Dr. Nikolaos Vasiloglou. We will examine his background, his views on data science in practice, how he sees the field evolving and what tips he has for new and aspiring data scientists. Finally, we’ll end with a summary of the main points from this particular case study.

17.1 Basic Professional Information and Background

Dr. Vasiloglou is a machine learning specialist, i.e., a data scientist who specializes in the machine learning aspect of the field. He works in the software development and mobile advertising industries. Although he has been working as a data scientist for about five years, he has been involved in the field much longer. His PhD was in scalable machine learning techniques, a topic that integrates seamlessly with data science.

Dr. Vasiloglou has been involved in several local groups related to the field, mainly through meetup.com. He founded Machine Learning by Example, a group for students of machine learning (no longer active), and is a member of Data Science Atlanta (the largest data science group in the state) as well as groups for Hadoop and programming languages. He also organizes MLconf, an industry-oriented conference series on machine learning.

He believes that there are two things on his resume that played an important role in jumpstarting his career in data science: internships in well-known companies such as Google and having a PhD in machine learning from a good university (Georgia Tech). For those who are unable to list either one of these credentials on their resume, he recommends getting the machine learning certificate from Stanford University (Prof. Ng’s physical class, not the MOOC on Coursera).

Dr. Vasiloglou is part of a four-member team at one of the companies for which he works and works on his own at the other. In the team, he is responsible for all of its members and manages them by creating the architectural framework in which they work and by planning the projects in which they are involved.

Dr. Vasiloglou is a very professional individual who at the same time is very down-to-earth and approachable. He can be a fine role model for those who plan to make data science their life-long career.

17.2 Views on Data Science in Practice

Dr. Vasiloglou’s views on data science are based on his experience in the field and his research interests, which revolve around scalable machine learning techniques. His everyday work includes monitoring daily reports (for jobs left to run overnight), brainstorming and mini group meetings, debugging problematic code, reading newsletters and conference proceedings, and reviewing current problems (e.g., deep learning networks) to keep himself abreast of new technologies in the field.

According to Dr. Vasiloglou, a senior data scientist differs from the other grades of data scientist in two ways. First, a senior data scientist has more knowledge, know-how and experience, which translates into more efficient work and a wider range of techniques to draw on when tackling a given problem. Second, a senior data scientist can architect a solution that involves considerable work divided among several people, and can start a new project (e.g., based on a conversation with a client and the data the client provides).

Examples of data products that he has developed (or participated in the development of) over the years include:

  • Botnet identification (finding infected machines based on network traffic data)
  • Library of machine learning methods that are fast and efficient
  • Forecasting model based on a traditional relational database

Although he has been practicing in the industry for the past few years, he values the role of researchers in the field and believes that a data scientist ought to be a bridge between academia and industry. Judging by his account of his life as a data scientist, he has accomplished this very effectively. Since information theory is universal, he believes that he could transition to another industry relatively easily. He finds the sectors of drug discovery and forensics particularly interesting for a data scientist today.

17.3 Data Science in the Future

Dr. Vasiloglou acknowledges the possibility of data science becoming more automated, perhaps even completely automated. Still, he sees a lot of merit in having state-of-the-art know-how, as the field is constantly evolving and will no doubt continue to do so. He also expects more programming languages, particularly functional ones (e.g., Scala), to become very popular in data science in the years to come.

17.4 Advice to New Data Scientists

Dr. Vasiloglou believes in the importance of well-founded (solid) knowledge, so he advises newcomers, especially younger people who are still in college or university, to study mathematics (through books, papers, courses, etc.). He also finds merit in competitions (e.g., those on Kaggle), which he recommends for people preparing to enter the field. Such competitions offer lots of useful experience with various types of datasets and give you a chance to put into practice a variety of the data analysis techniques you have learned. He also suggests that newcomers learn software development through object-oriented (OO) and functional programming languages. He doesn’t favor any particular language because programming skills are highly transferable.

Dr. Vasiloglou is a champion of equilibrium when it comes to developing your data science skills. Therefore, all of the above recommendations need to be taken into account and followed in an organic and holistic way so that you end up with a balanced skill set.

17.5 Key Points

  • Being a senior data scientist is somewhat different from being a typical data scientist because it entails more knowledge and know-how, more experience, the ability to architect a solution to a problem and the ability to start a new project.
  • In order to get a senior data scientist position, it is important to have an internship in a major company or a PhD from a good university. However, if you don’t have either one of these credentials on your resume, you can opt for a certificate in machine learning from Stanford University (Prof. Ng’s classroom course).
  • Drug discovery and forensics are interesting industries where data science can prove to be very useful.
  • Transitioning to another industry is relatively easy because information theory is universal.
  • In order to become a data scientist, you need to develop the following in a balanced way:
    • Well-founded (solid) knowledge of mathematics
    • Experience through competitions (e.g., Kaggle)
    • Software development skills in object-oriented and functional languages