There are many ways to segment machine learning and dive deeper. In Chapter 1, How to Sound Like a Data Scientist, I mentioned statistical and probabilistic models. These models utilize statistics and probability, which we've seen in the previous chapters, in order to find relationships between data and make predictions. In this chapter, we will implement both types of models. In the following chapter, we will see machine learning outside the rigid mathematical world of statistics/probability. You can segment machine learning models by different characteristics, including the following:
For the purpose of education, I will offer my own breakdown of machine learning models. Branching off from the top level of machine learning, there are the following three subsets:
Simply put, supervised learning finds associations between features of a dataset and a target variable. For example, supervised learning models might try to find the association between a person's health features (heart rate, obesity level, and so on) and that person's risk of having a heart attack (the target variable).
These associations allow supervised models to make predictions based on past examples. This is often the first thing that comes to people's minds when they hear the phrase machine learning, but it in no way encompasses the entire realm of machine learning. Supervised machine learning models are often called predictive analytics models, named for their ability to predict the future based on the past.
Supervised machine learning requires a certain type of data called labeled data. This means that we must teach our model by giving it historical examples that are labeled with the correct answer. Recall the facial recognition example. That is a supervised learning model because we are training our model with the previous pictures labeled as either face or not face, and then asking the model to predict whether or not a new picture has a face in it.
Specifically, supervised learning works using parts of the data to predict another part. First, we must separate data into two parts, as follows:
Supervised learning attempts to find a relationship between the predictors and the response in order to make a prediction. The idea is that, in the future, a data observation will present itself and we will only know the predictors. The model will then have to use the predictors to make an accurate prediction of the response value.
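As a minimal sketch of this separation (using pandas; the column names and values are invented for illustration), splitting a labeled dataset into predictors and a response might look like this:

```python
import pandas as pd

# Hypothetical heart-attack data; the columns and values are made up
patients = pd.DataFrame({
    "cholesterol":    [220, 180, 250, 190],
    "blood_pressure": [140, 120, 160, 130],
    "smoker":         [1, 0, 1, 0],
    "heart_attack":   [1, 0, 1, 0],  # the labeled response (it already happened!)
})

# Predictors (X): everything we know about a patient ahead of time
X = patients.drop(columns=["heart_attack"])

# Response (y): the label the model learns to predict
y = patients["heart_attack"]

print(X.shape, y.shape)
```

A model trained on `X` and `y` could then be handed only the predictors for a new patient and asked to predict the response.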
Suppose we wish to predict whether someone will have a heart attack within a year. To predict this, we are given that person's cholesterol level, blood pressure, height, smoking habits, and perhaps more. From this data, we must ascertain the likelihood of a heart attack. Suppose, to make this prediction, we look at previous patients and their medical history. As these are previous patients, we know not only their predictors (cholesterol, blood pressure, and so on), but we also know whether they actually had a heart attack (because it already happened!).
This is a supervised machine learning problem because we are doing the following:
The hope here is that a patient will walk in tomorrow and our model will be able to identify whether or not the patient is at risk for a heart attack based on her/his conditions (just like a doctor would!).
As the model sees more and more labeled data, it adjusts itself in order to match the correct labels given to us. We can use different metrics (explained later in this chapter) to pinpoint exactly how well our supervised machine learning model is doing and how it can better adjust itself.
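One of the simplest such metrics is accuracy: the fraction of labeled examples the model gets right. A quick sketch with made-up labels:

```python
actual    = [1, 0, 1, 1, 0]  # true labels (1 = had a heart attack)
predicted = [1, 0, 0, 1, 0]  # hypothetical model outputs

# Accuracy: the fraction of predictions that match the correct labels
correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)
print(accuracy)  # 4 of the 5 predictions match, so 0.8
```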
One of the biggest drawbacks of supervised machine learning is that we need this labeled data, which can be very difficult to get a hold of. Suppose we wish to predict heart attacks; we might need thousands of patients along with all of their medical information and years' worth of follow-up records for each person, which could be a nightmare to obtain.
In short, supervised models use historical labeled data in order to make predictions about the future. Some possible applications for supervised learning include the following:
Note how each of the preceding examples uses the word prediction, which makes sense, given how I emphasized supervised learning's ability to make predictions about the future. Predictions, however, are not where the story ends.
Here is a visualization of how supervised models use labeled data to fit themselves and prepare themselves to make predictions:
Note how the supervised model learns from a bunch of training data and then, when it is ready, it looks at unseen cases and outputs a prediction.
Supervised learning exploits the relationship between the predictors and the response to make predictions, but sometimes, it is enough just knowing that there even is a relationship. Suppose we are using a supervised machine learning model to predict whether or not a customer will purchase a given item. A possible dataset might look as follows:
| Person ID | Age | Gender | Employed? | Bought the product? |
|---|---|---|---|---|
| 1 | 63 | F | N | Y |
| 2 | 24 | M | Y | N |
Note that, in this case, our predictors are Age, Gender, and Employed?, while our response is Bought the product? This is because we want to see if, given someone's age, gender, and employment status, they will buy the product.
Assume that a model is trained on this data and can make accurate predictions about whether or not someone will buy something. That, in and of itself, is exciting, but there's something else that is arguably even more exciting. The fact that we can make accurate predictions implies that there is a relationship between these variables, which means that to know whether someone will buy your product, you only need to know their age, gender, and employment status! This might contradict previous market research indicating that much more must be known about a potential customer to make such a prediction.
This speaks to supervised machine learning's ability to understand which predictors affect the response and how. For example, are women more likely to buy the product? Which age groups are prone to decline the product? Is there a combination of age and gender that is a better predictor than any one column on its own? As someone's age increases, do their chances of buying the product go up, go down, or stay the same?
It is also possible that not all of the columns are necessary. A possible output of a machine learning model might suggest that only certain columns are necessary to make the prediction and that the other columns are only noise (they do not correlate with the response and therefore confuse the model).
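As a sketch of this idea (using scikit-learn's decision tree on an invented, larger version of the purchase table), we can fit a model and then ask it which columns it actually leaned on:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Toy version of the purchase table; all values are made up for illustration
df = pd.DataFrame({
    "age":      [63, 24, 45, 33, 58, 29],
    "gender":   [0, 1, 0, 1, 0, 1],   # 0 = F, 1 = M
    "employed": [0, 1, 1, 1, 0, 0],
    "bought":   [1, 0, 1, 0, 1, 0],
})

X, y = df[["age", "gender", "employed"]], df["bought"]
model = DecisionTreeClassifier(random_state=0).fit(X, y)

# Which predictors mattered, and which were mostly noise?
print(dict(zip(X.columns, model.feature_importances_)))
```

In this toy data, one column happens to separate buyers from non-buyers perfectly, so the importance scores concentrate on it and the remaining columns are treated as noise.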
There are, in general, two types of supervised learning models: regression and classification. The difference between the two is quite simple and lies in the response variable.
Regression models attempt to predict a continuous response. This means that the response can take on an infinite range of values. Consider the following examples:
Classification attempts to predict a categorical response, which means that the response only has a finite amount of choices. Here are some examples:
Example – regression
The following graphs show the relationship between three predictors (age, birth year, and education level) and a person's wage:
Note that, even though the predictors are grouped into categories, this example is a regression because the y axis, our dependent variable, our response, is continuous.
Our earlier heart attack example is classification because the response was: will this person have a heart attack within a year? This has only two possible answers: Yes or No.
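The two response types can be sketched side by side (using scikit-learn; the age, wage, and purchase values are invented):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# One hypothetical predictor: age
age = np.array([[22], [35], [47], [52], [61], [68]])

# Regression: a continuous response (say, wage in thousands of dollars)
wage = np.array([28.0, 41.5, 55.2, 58.9, 63.1, 60.4])
reg = LinearRegression().fit(age, wage)
print(reg.predict([[40]]))  # a number on a continuous scale

# Classification: a categorical response (bought the product? 0 or 1)
bought = np.array([0, 0, 1, 1, 1, 1])
clf = LogisticRegression().fit(age, bought)
print(clf.predict([[40]]))  # one of finitely many labels
```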
Sometimes, it can be tricky to decide whether you should use classification or regression. Suppose we are interested in the weather outside. We could ask the question, how hot is it outside? In this case, the answer is on a continuous scale, and some possible answers are 60.7 degrees or 98 degrees. However, as an exercise, go and ask 10 people what the temperature is outside. I guarantee you that someone (if not most people) will not answer in exact degrees but will bucket their answer and say something like, "it's in the 60s."
We might wish to treat this problem as a classification problem, where the response variable is no longer in exact degrees but in a bucket. With only a finite number of buckets, the model might learn the difference between, say, the 60s and the 70s a bit better.
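A sketch of that bucketing step, using pandas (the temperatures are made up):

```python
import pandas as pd

temps = pd.Series([60.7, 98.0, 64.2, 71.5, 68.9])

# Bucket exact degrees into coarse categories, turning a continuous
# (regression) target into a categorical (classification) target
buckets = pd.cut(temps, bins=[50, 60, 70, 80, 90, 100],
                 labels=["50s", "60s", "70s", "80s", "90s"])
print(buckets.tolist())
```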
The second type of machine learning does not deal with predictions but has a much more open objective. Unsupervised learning takes in a set of predictors and utilizes relationships between the predictors in order to accomplish tasks such as the following:
The first element on this list is called dimension reduction and the second is called clustering. Both of these are examples of unsupervised learning because they do not attempt to find a relationship between predictors and a specific response and therefore are not used to make predictions of any kind. Unsupervised models, instead, are utilized to find organizations and representations of the data that were previously unknown.
The following screenshot is a representation of a cluster analysis:
The model will recognize that the observations within each uniquely colored cluster are similar to one another but different from those in the other clusters.
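A minimal clustering sketch, using scikit-learn's k-means on two invented blobs of unlabeled points:

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated blobs of unlabeled points (values are synthetic)
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=[0, 0], scale=0.3, size=(20, 2))
blob_b = rng.normal(loc=[5, 5], scale=0.3, size=(20, 2))
points = np.vstack([blob_a, blob_b])

# Ask for two clusters; the model groups similar observations together
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(points)
print(labels)
```

Notice that no response variable appears anywhere; the model is only told how many groups to look for, and it is up to a human to interpret what the groups mean.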
A big advantage of unsupervised learning is that it does not require labeled data, which means that data suitable for unsupervised learning models is much easier to come by. Of course, a drawback is that we lose all predictive power, because the response variable holds the information needed to make predictions; without it, our model is hopeless at making any sort of prediction.
A big drawback is that it is difficult to see how well we are doing. In a regression or classification problem, we can easily tell how well our models are predicting by comparing our models' answers to the actual answers. For example, if our supervised model predicts rain and it is sunny outside, the model was incorrect. If our supervised model predicts the price will go up by 1 dollar and it goes up by 99 cents, our model was very close! In unsupervised modeling, this concept is foreign, because we have no answer to compare our models to. Unsupervised models merely suggest differences and similarities, which then require a human's interpretation:
In short, the main goal of unsupervised models is to find similarities and differences between data observations. We will discuss unsupervised models in depth in later chapters.
In reinforcement learning, algorithms get to choose an action in an environment and then are rewarded (positively or negatively) for choosing this action. The algorithm then adjusts itself and modifies its strategy in order to accomplish some goal, which is usually to get more rewards.
This type of machine learning is very popular in AI-assisted gameplay, as agents (the AI) are allowed to explore a virtual world, collect rewards, and learn the best navigation techniques. This model is also popular in robotics, especially in the field of autonomous machinery, including cars:
Self-driving cars read in sensor input, act accordingly, and are then rewarded for taking a certain action. The car then adjusts its behavior to collect more rewards. Reinforcement learning can be thought of as similar to supervised learning, in that the agent learns from its past actions to make better moves in the future; however, the main difference lies in the reward. The reward does not have to be tied in any way to a correct or incorrect decision; it simply encourages (or discourages) different actions.
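This act-reward-adjust loop can be sketched with a toy two-action agent (an epsilon-greedy bandit; the action names and reward probabilities are invented):

```python
import random

random.seed(0)

# Two hypothetical actions; the agent does not know these average rewards
true_reward = {"left": 0.2, "right": 0.8}
estimates = {"left": 0.0, "right": 0.0}
counts = {"left": 0, "right": 0}

for step in range(1000):
    # Explore occasionally; otherwise exploit the best-looking action
    if random.random() < 0.1:
        action = random.choice(["left", "right"])
    else:
        action = max(estimates, key=estimates.get)

    # The environment rewards the action (1 or 0); there is no
    # "correct answer" here, only encouragement and discouragement
    reward = 1 if random.random() < true_reward[action] else 0

    # Adjust the running estimate toward the observed reward
    counts[action] += 1
    estimates[action] += (reward - estimates[action]) / counts[action]

print(max(estimates, key=estimates.get))  # the agent comes to prefer "right"
```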
Reinforcement learning is the least explored of the three types of machine learning and therefore is not explored at great length in this text. The remainder of this chapter will focus on supervised and unsupervised learning.
Having seen the three types of machine learning—supervised, unsupervised, and reinforcement learning—we can imagine the world of machine learning as something like this:
Each of the three types of machine learning has its benefits and its drawbacks, as listed:
For example, a car might crash into a wall and not know that doing so is unacceptable until the environment rewards it negatively for that action.