The amount of data collected for various purposes in society has increased enormously in the last few decades. Machine learning is a way of making sense of this data by leveraging what we already know about it. In the general picture of machine learning, the computer first learns from a given dataset (training) and builds a generalized model to represent it. With this model, it is possible to predict various outcomes, results, and groupings (classes). In this chapter, we will cover the following topics:
Before getting started, I will give you a brief introduction to machine learning and the package that we will use: Scikit-learn.
There are three main categories of machine learning: supervised, unsupervised, and reinforcement learning. Given a simple dataset with input x and output y, supervised learning is when both x and y are known for the training data. The algorithm learns a mapping from x to y and, after training, can predict y values with x as input. In contrast, unsupervised learning is when only x is known and the algorithm finds a label for y itself, for example by grouping similar inputs. Reinforcement learning is when the computer learns without an explicit mapping from input to outcome and instead learns how to respond to the input. This is how algorithms that play chess or other games work.
They try to work out how to react to input without a clearly quantifiable outcome, instead seeking reinforcement; for example, playing the game over and over until it ends without a mistake (that is, with a win). One feature-rich and popular package for machine learning in Python is Scikit-learn.
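To make the difference between supervised and unsupervised learning concrete, here is a minimal sketch using Scikit-learn. The toy data and the choice of a nearest-neighbor classifier and k-means clustering are assumptions made only for illustration; they are not necessarily the methods used later in this chapter.

```python
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

# Toy data, made up for illustration: one feature per sample.
X = [[1.0], [1.2], [0.8], [5.0], [5.3], [4.9]]
y = [0, 0, 0, 1, 1, 1]  # known labels, used only in the supervised case

# Supervised: learn the mapping from X to y, then predict y for new inputs.
classifier = KNeighborsClassifier(n_neighbors=1)
classifier.fit(X, y)
predicted = classifier.predict([[1.1], [5.1]])
print("Supervised predictions:", predicted)

# Unsupervised: no y is given; the algorithm groups the samples on its own.
clusterer = KMeans(n_clusters=2, n_init=10, random_state=0)
groups = clusterer.fit_predict(X)
print("Unsupervised cluster assignments:", groups)
```

Note that the cluster numbers returned in the unsupervised case are arbitrary: the algorithm only decides which samples belong together, not what each group should be called.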