Where to go from here

The goal of this book was to introduce you to the world of machine learning and prepare you to become a machine learning practitioner. Now that you are familiar with the fundamental algorithms, you might want to investigate some topics in more depth.

Although it is not necessary to understand all of the details of all of the algorithms we implemented in this book, knowing some of the theory behind them might just make you a better data scientist.

If you are looking for more advanced material, then you might want to consider some of the following classics:
  • Stephen Marsland, Machine Learning: An Algorithmic Perspective, Second Edition, Chapman and Hall/CRC, ISBN 978-1-4665-8328-3, 2014
  • Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, ISBN 978-0-387-31073-2, 2007
  • Trevor Hastie, Robert Tibshirani, and Jerome Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition, Springer, ISBN 978-0-387-84857-0, 2016

When it comes to software libraries, we have already learned about two essential ones: OpenCV and scikit-learn. Python is great for quickly trying out and evaluating models, but larger web services and applications are more commonly written in Java or C++.

One example of a C++ package is Vowpal Wabbit (VW), which comes with its own command-line interface. For running machine learning algorithms on a cluster, people often use MLlib, a Scala library built on top of Spark. If you are not married to Python, you might also consider using R, another common language among data scientists. R is a language designed specifically for statistical analysis and is famous for its visualization capabilities and the availability of many (often highly specialized) statistical modeling packages.
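
If you do stick with Python, MLlib is also accessible through the pyspark package (its newer, DataFrame-based pyspark.ml API). The following is a minimal sketch, not taken from this book, of fitting a logistic regression model on a toy dataset with MLlib from Python; it assumes a working local Spark installation with pyspark available:

    from pyspark.sql import SparkSession
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.linalg import Vectors

    # Start a local Spark session (assumes Spark is installed).
    spark = SparkSession.builder.appName("mllib-sketch").getOrCreate()

    # A toy two-class dataset: (label, feature vector) rows.
    data = spark.createDataFrame(
        [(0.0, Vectors.dense([0.0, 1.1])),
         (1.0, Vectors.dense([2.0, 1.0])),
         (0.0, Vectors.dense([0.1, 1.2])),
         (1.0, Vectors.dense([2.1, 0.9]))],
        ["label", "features"])

    # Fit a logistic regression classifier and inspect its weights.
    model = LogisticRegression(maxIter=10).fit(data)
    print(model.coefficients)

    spark.stop()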

No matter which software you choose going forward, I guess the most important advice is to keep practicing your skills. But you already knew that. There are a number of excellent datasets out there just waiting for you to analyze:

  • Throughout this book, we made great use of the example datasets that are built into scikit-learn. In addition, scikit-learn provides a way to load datasets from external services, such as mldata.org (see the sketch after this list). Refer to http://scikit-learn.org/stable/datasets/index.html for more information.
  • Kaggle is a company that hosts a wide range of datasets as well as competitions on its website, http://www.kaggle.com. Competitions are often hosted by a variety of companies, nonprofit organizations, and universities, and the winner can take home some serious monetary prizes. A disadvantage of competitions is that they already provide a particular metric to optimize, and usually a fixed, preprocessed dataset.
  • The OpenML platform (http://www.openml.org) hosts over 20,000 datasets with over 50,000 associated machine learning tasks.
  • Another popular choice is the UC Irvine Machine Learning Repository (http://archive.ics.uci.edu/ml/index.php), hosting over 370 popular and well-maintained datasets through a searchable interface.
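
As a quick illustration of the first and third options above, here is a minimal sketch of loading a built-in scikit-learn dataset and downloading a dataset from OpenML. It assumes a reasonably recent scikit-learn; fetch_openml was added in version 0.20, while older versions instead offered fetch_mldata for mldata.org:

    from sklearn.datasets import load_iris, fetch_openml

    # Built-in example dataset, shipped with scikit-learn.
    iris = load_iris()
    print(iris.data.shape, iris.target.shape)

    # Download the 'mnist_784' dataset from openml.org (cached
    # locally after the first call, so subsequent runs are fast).
    mnist = fetch_openml('mnist_784', version=1)
    print(mnist.data.shape)
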
Finally, if you are looking for more example code in Python, a number of excellent books nowadays come with their own GitHub repositories.