Learning new things is an integral part of being a data scientist, especially since innovations are fairly common in the field. If you are new to data science, you are bound to have gaps in your knowledge, so learning the missing material is essential. Of course, continuous learning is something you would do in any profession if you wanted to evolve professionally. In data science, however, new programs come out all the time, so even if you were the best data scientist in the world right now, your skills would become somewhat obsolete within a few years if you did not keep abreast of developments in the field.
Tackling problems is similar to learning new things in that it requires the same flexibility and mental agility. Although problems are common in many IT-related professions, they are somewhat more frequent in data science, mainly because it’s an interdisciplinary field. However, by tackling the problems that arise with a positive attitude and a creative approach, you’ll also learn more than you otherwise would.
In this chapter we’ll examine various ways that you can upgrade your knowledge and, more importantly, your skill-set right now as well as while on the job. In the first four subchapters you’ll find out about how you can learn from workshops, conferences, online courses (often referred to as MOOCs) and data science groups. In the later subchapters, you’ll learn about the various problems that may arise in your work as a data scientist: namely, resource issues, requirements issues, insufficient know-how for a task you undertake, and integration issues.
9.1 Workshops
Workshops are the most efficient way to learn something new, especially when it comes to technical know-how. Fortunately, due to the increased popularity of the data science field there are numerous workshops available from which to learn any aspect of the field.
Workshops tend to be somewhat expensive (several hundred dollars each) but they are a good investment, especially if you are good at picking up new knowledge and know-how. Free alternatives for learning new things will be covered in subchapter 9.3. How to find the best workshops will be discussed later in this section.
So why bother with workshops if there are other ways to learn new things? Well, workshops provide networking opportunities, can enhance your resume (if you have no other data science related qualifications), and often provide more useful knowledge and know-how than university courses, regardless of the university. This is because university courses are often based on the available literature in scientific books, journal papers and conference proceedings and are designed to give students the foundation on which to build more advanced knowledge. Workshops, by contrast, tend to focus on practical skills that you can apply right away.
Workshops are also very time efficient, squeezing into a few hours material that would normally take days to learn on your own. They are often hard and demand all of your concentration, but they enable you to learn something you would normally not have the time or resources to learn on your own.
The key things to keep in mind when choosing to register for a workshop are what you are going to learn and how it can be useful for your job as a data scientist. This sounds obvious, but it is really easy to get sold on workshops that you don’t need since they all appear quite appealing at the sites that promote them.
To ensure that you stay focused on the appropriate workshops, make a list of the skills and knowledge that you want or need, then research workshops that are being offered. Update your list if you find workshops that offer something you haven’t thought of; if there are several workshops that offer it, it is usually something useful to know in the industry. Finally, pick the workshop that is most suitable for what you want or need, taking into account its location, the time of the year it’s offered and, of course, its price. You can’t go wrong with a strategy like that.
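The selection strategy above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the skill names, workshop titles, topics, prices and budget are all made up, and ranking by the number of matched skills (with price as a tie-breaker) is just one simple way to express "most suitable."

```python
from collections import Counter

# Hypothetical data: skills, workshop titles, topics and prices are invented.
wanted_skills = {"spark", "time series", "data visualization"}

workshops = [
    {"title": "Workshop A", "topics": {"spark", "hadoop"}, "price": 600},
    {"title": "Workshop B",
     "topics": {"spark", "time series", "graph analytics"}, "price": 800},
    {"title": "Workshop C",
     "topics": {"graph analytics", "data visualization"}, "price": 400},
]


def recurring_topics(workshops, wanted, min_count=2):
    """Return topics missing from the wish list that several workshops offer."""
    counts = Counter(t for w in workshops for t in w["topics"])
    return {t for t, n in counts.items() if n >= min_count and t not in wanted}


def rank_workshops(workshops, wanted, budget):
    """Affordable workshops, most matched skills first, cheaper first on ties."""
    affordable = [w for w in workshops if w["price"] <= budget]
    return sorted(affordable,
                  key=lambda w: (-len(w["topics"] & wanted), w["price"]))


# Step 1: add topics that keep recurring across workshops to the wish list.
wanted_skills |= recurring_topics(workshops, wanted_skills)

# Step 2: rank what fits the (assumed) budget against the updated list.
shortlist = rank_workshops(workshops, wanted_skills, budget=700)
print([w["title"] for w in shortlist])
```

Location and time of year could be handled the same way, either as extra filters before ranking or as additional elements in the sort key.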
9.2 Conferences
Conferences are like workshops but are designed for larger groups of people. They offer some innovative pieces of knowledge based on research and case studies as well as more foundational information for those who are newer to the subject of the conference. More often than not, conferences offer workshops to attract more people. Note that in this book we are referring to non-academic conferences, since the academic ones have a different mission and scope.
Conferences are a great way to learn a variety of new things in a short period of time, meet new people, exchange war stories and get acquainted with other challenges in the field. Conferences are quite interactive and provide great mental stimulation, very similar to some good university classes, but without the stress of exams and written assignments. They are usually costly, making them a viable option mainly for full-time professionals. However, given the benefits they can provide, they are a worthy alternative for anyone interested in expanding their skill-set and data science knowledge. Fortunately, companies often cover at least some (if not all) of the expenses of their employees who are participating in such conferences.
The big advantage of this option for learning new things is that it is very time efficient, especially when combined with a couple of workshops. If you can relate this new knowledge to an existing problem you are facing, that’s even better. The bottom line is that if you are open to new things, a conference can prove to be a very fruitful experience that may enrich your understanding of data science and your particular role, too. You can find out about the various conferences that are being offered by searching the web directly or through the various data science groups (see subchapter 9.4).
9.3 Online Courses
Although the world today has a lot of issues, it’s also the first time in our history that refined knowledge on a large variety of subjects is publicly available at no cost. This is through the various online courses, particularly MOOCs.
The first MOOCs appeared in 2008 and have grown in popularity and in variety since then. The largest MOOC provider, Coursera, is an initiative of two faculty members of Stanford University, Prof. Daphne Koller and Prof. Andrew Ng. The courses on this site span from calculus to philosophy to history of art. Since one of the founders, Prof. Ng, is a leading machine learning expert, there are several worthwhile courses on data science (Prof. Ng’s course “Machine Learning” is one of the best MOOCs out there, not just within the Coursera site). Coursera’s website (www.coursera.org) is user-friendly and straightforward, and so are its applications for smartphones and tablets to facilitate the use of the site’s content while you’re on the move.
There are several other places where you can find MOOCs, the most well-known of which are:
All the above alternatives are great, but it’s good to keep in mind that none of them come anywhere close to the Coursera site in terms of quality and popularity (a typical data science course at Coursera has 50,000 to 100,000 students enrolled). In addition, Coursera’s courses are quite interactive, and if you commit to them, they can be a very enjoyable experience. However, if you can’t find the course you are looking for on that particular site, it is worth taking a look at the alternative MOOC providers to supplement your learning.
The (Coursera) MOOCs on data science that are definitely worth looking into are:
Note that as more and more universities develop MOOCs, there may be new data science courses that are not on this list. So keep your eyes open and ask around. Oftentimes, the Coursera forums are a great place to get informed about courses similar to the ones you are taking, plus you can get some useful feedback on how good they are from classmates of yours who have taken them. A great place to get additional evaluations of the various courses is Coursetalk (coursetalk.org), so check this out too before enrolling in a course to make the most of your time. Finally, Coursera has recently started offering specializations, which are basically amalgamations of courses from a university with an exam or project at the end and a specialized certificate if you pass all the classes. Not all specializations are free. Currently there is one specialization for data science, offered by the Johns Hopkins University.
9.4 Data Science Groups
Data science groups are one of the most enjoyable ways to learn, especially if you like socializing and networking, and they are popping up all over the place. If you live in a large city, the chances are that you will find one in your area. Data science groups are a great place to network and make acquaintances that may lead to job opportunities. You can read more about that in Chapter 13.
Since data science is a buzzword, some data science groups use the name in order to get a lot of people involved without living up to their promise. So always check out a group’s organizer(s) before joining it since time is a very valuable resource and there may be better ways of using it for your data science endeavors. If the organizer is an actual data scientist or someone who strikes you as very knowledgeable on the subject, you can hop on board. In addition, check out the events that the group hosts. If they include a lot of talks by respectable professionals who are related to the field, it is a worthwhile choice. If most of the meetings are just conversations among the members, maybe you can skip that one. Finally, make sure that there are several members in the group (the more the better) to ensure that you will meet lots of interesting professionals, improving your chances for learning. Note that a group about machine learning or data mining is also relevant, so don’t consider only groups with “data science” in their names.
You can learn from a data science group in (at least) two ways. First of all, you can attend the events where a knowledgeable speaker presents a data science topic. This person could be a researcher in the field or an industry professional, possibly even a developer of some promising new big data program. We already talked a bit about Storm and how great an alternative it is to Hadoop. This piece of software became popular through the developers’ various presentations. Imagine attending one such presentation and being one of the first people to learn about the software. If you played your cards right and acted on the knowledge, you would have an edge when it came to this program. And if a company was looking for someone who was familiar with it, you’d be one of the people to be shortlisted.
The other way to learn with a data science group is through active conversation with the other members of the group. (This approach is useful for all kinds of professional events, by the way.) This means actively participating in a conversation, asking meaningful questions, providing brief and focused replies, etc. If you enter a conversation and let yourself vent about the problems you are facing at work or about topics that are of no interest to the others, don’t expect to learn much or keep the other participants of the conversation interested for long. Listening is the key here as well as being able to ask questions that will make the other person think and offer meaningful responses, engaging them in a creative debate.
Apart from all these sources of learning, there are also the various data science websites and blogs, which you are probably somewhat familiar with. Books may be great at providing you with some reliable fundamental knowledge about the field, but when it comes to staying updated, nothing beats the Web. It would be a futile task to try to list all the various online resources on data science, especially considering how quickly the Web is changing. However, there are a few that are definitely worth looking into (see Appendices 1 and 2). One that seems particularly easy to digest is the Data Science 101 blog – http://datascience101.wordpress.com.
9.5 Requirements Issues
Requirements issues are a type of problem you may encounter although this greatly depends on the company you are in (or your clients, if you are working as an independent consultant or freelancer). Many IT professionals encounter problems with requirements, and it is not unusual to see similar problems in a data science setting. Issues with requirements have to do with the miscommunication and misunderstanding of a project’s requirements as well as how they are implemented in a working prototype. That’s surely something that some good communication can fix, right?
Well, it’s a bit more complicated than that. When two parties of completely different backgrounds and priorities communicate, even if they communicate well, there may be subtle differences in how things settle after they are filtered by what’s feasible (taking into account the resource limitations described previously). For example, your manager may want you to create a prediction model for a particular dataset, but after examining it you realize that this may not be feasible with the data you have or the tools you can muster (let alone the timeframe in which this needs to be done). The requirements need to be reasonable, but reasonable is a relative term when it comes to something that has not been created yet. Creating a data product may not work as you imagine it to work, so you and your manager or client need to agree on a set of requirements that also outline the desired end result. You need to be able to manage your client’s expectations, help them understand the limitations of your tools and hardware and find a solution that is mutually satisfactory. That takes a lot of creativity, diplomacy and communication, not to mention patience.
9.6 Insufficient Know-How Issues
Lack of knowledge is the most commonly encountered issue for people who are new in a data science job though it can be encountered by more established data scientists as well, especially when changing industries. If you have insufficient know-how, it’s better to admit it and offer a strategy for overcoming the issue rather than hiding it and pretending you know everything. Try to augment your knowledge from one or more of the following sources:
Issues related to insufficient know-how may be challenging, but with enough humility, open-mindedness and research, it’s only a matter of time before you resolve them. No one was born knowing everything, and no one knows all there is to know in this ever-changing field. So don’t hesitate to seek the missing pieces of your know-how puzzle and evolve as a professional through this experience.
9.7 Tool Integration Issues
Tool integration issues are common in the work of a data scientist. You may have a great tool (e.g., MATLAB), but the other software you work with doesn’t integrate with its scripts. Or you may develop a great data analysis script in R (which pretty much every data science package integrates with), but your script cannot read the format of the data file or data stream you have to process.
In general, when dealing with an integration issue you need to research the technologies used and how other people have resolved (or at least tried to resolve) the problem at hand. Clever use of a search engine is a big asset here, so make use of it extensively. You will also need to employ your communication skills, your creativity and, of course, patience. Here’s another instance when the contacts you’ve made through networking can be useful. Solving tool integration issues will enable you to become more intimately acquainted with the software you use and allow you to exercise your creativity and research skills, both of which are essential to your role.
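One common workaround for the data-format mismatch mentioned above is a small conversion step placed between the tool that produces the data and the script that consumes it. The following Python sketch is only an illustration: the function name `normalize_export`, the semicolon-delimited input and the decimal-comma convention are all assumptions made for the example, not features of any particular tool.

```python
import csv
import io


def normalize_export(raw_text, delimiter=";", decimal=","):
    """Rewrite a delimited export as comma-separated text with '.' decimals.

    Assumption: every cell is either a number or a label that contains no
    decimal-separator character, so a plain replace is safe.
    """
    reader = csv.reader(io.StringIO(raw_text), delimiter=delimiter)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        # Strip stray whitespace and switch the decimal separator per cell.
        writer.writerow(cell.strip().replace(decimal, ".") for cell in row)
    return out.getvalue()


# Hypothetical export from a tool that uses ';' between fields
# and ',' as the decimal separator.
raw = "id;temp\n1;20,5\n2;19,8\n"
print(normalize_export(raw))
```

Keeping the conversion in a separate step like this means the analysis script itself does not have to change when the upstream format does.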
9.8 Key Points