In the following steps, you will learn how to create and plot a toy dataset:
- To test our perceptron classifier, we need to create some mock data. Let's keep things simple for now and generate 100 data samples (n_samples) belonging to one of two blobs (centers), again relying on scikit-learn's make_blobs function:
In [3]: from sklearn.datasets import make_blobs
...     X, y = make_blobs(n_samples=100, centers=2,
...                       cluster_std=2.2, random_state=42)
- One thing to keep in mind is that our perceptron classifier expects target labels to be either +1 or -1, whereas make_blobs returns 0 and 1. An easy way to adjust the labels is with the following equation:
In [4]: y = 2 * y - 1
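To see why this one-liner works: make_blobs returns labels 0 and 1, and because y is a NumPy array, the arithmetic applies elementwise, mapping 2 * 0 - 1 = -1 and 2 * 1 - 1 = +1. A minimal standalone snippet (not part of the book's notebook session) illustrates the remapping:

```python
import numpy as np

# make_blobs returns the target labels as a NumPy array of 0s and 1s
y = np.array([0, 1, 1, 0])

# Elementwise arithmetic remaps 0 -> -1 and 1 -> +1 in one vectorized step
y = 2 * y - 1
print(y)  # [-1  1  1 -1]
```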
- In the following code, we first import matplotlib's pyplot module, which provides the functionality to visualize the data.
- Then, we switch to the ggplot plotting style (named after R's ggplot2 visualization package) via plt.style.use.
- Next, we use the magic command %matplotlib inline, which renders plots directly inside the Jupyter Notebook.
- Then, we call plt.scatter to create a scatter plot of the data X, with one feature on the x1 axis and the other on the x2 axis.
- The x1 axis takes all the rows of the first column of X (X[:, 0]), and the x2 axis takes all the rows of the second column of X (X[:, 1]). The other two arguments to plt.scatter are s, the marker size (size of the points), and c, the color. Since y can take only two values, +1 or -1, the scatter plot will show at most two colors.
- Finally, we label the x axis and y axis as x1 and x2.
Let's have a look at the data:
In [5]: import matplotlib.pyplot as plt
... plt.style.use('ggplot')
... %matplotlib inline
... plt.scatter(X[:, 0], X[:, 1], s=100, c=y);
... plt.xlabel('x1')
... plt.ylabel('x2')
This will produce the following graph:
The preceding plot shows an example dataset for the perceptron classifier. What do you think: will our perceptron classifier have an easy time finding a decision boundary that separates these two blobs?
Chances are it will. We mentioned earlier that a perceptron is a linear classifier. This means that as long as you can draw a straight line in the preceding plot to separate the two blobs, there exists a linear decision boundary that the perceptron should be able to find, that is, if we implemented everything correctly. Let's find out.
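Before running the book's own implementation, the idea can be sketched with a minimal from-scratch perceptron (a hypothetical illustration, not the classifier built in this chapter): the classic update rule nudges the weight vector and bias toward every misclassified point, and on linearly separable data this is guaranteed to converge to a separating line.

```python
import numpy as np

def fit_perceptron(X, y, lr=0.1, epochs=100):
    """Minimal perceptron sketch: labels must be -1 or +1."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            # A point is misclassified if the label disagrees in sign
            # with the raw score w . x + b
            if yi * (np.dot(w, xi) + b) <= 0:
                w += lr * yi * xi  # pull the boundary toward the point
                b += lr * yi
    return w, b

def predict(w, b, X):
    # Sign of the raw score, mapped to the labels -1 and +1
    return np.where(X @ w + b >= 0, 1, -1)

# Two clearly separable point clouds with labels -1 and +1
X_toy = np.array([[-2.0, -2.0], [-1.5, -2.5], [2.0, 2.0], [2.5, 1.5]])
y_toy = np.array([-1, -1, 1, 1])
w, b = fit_perceptron(X_toy, y_toy)
print(predict(w, b, X_toy))  # → [-1 -1  1  1]
```

On this toy data the learned line separates the two clouds after a single pass; on the noisier blobs above, more epochs may be needed, and perfect separation is only guaranteed if the blobs do not overlap.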