In the following steps, you will learn how to create and plot a toy dataset:
- To test our perceptron classifier, we need to create some mock data. Let's keep things simple for now and generate 100 data samples (n_samples) belonging to one of two blobs (centers), again relying on scikit-learn's make_blobs function:
In [3]: from sklearn.datasets import make_blobs
...     X, y = make_blobs(n_samples=100, centers=2,
...                       cluster_std=2.2, random_state=42)
- One thing to keep in mind is that our perceptron classifier expects target labels to be either +1 or -1, whereas make_blobs returns 0 and 1. An easy way to adjust the labels is with the following equation:
In [4]: y = 2 * y - 1
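To see why this one-liner works: make_blobs returns labels 0 and 1, and because y is a NumPy array, the arithmetic applies elementwise, mapping 2 * 0 - 1 = -1 and 2 * 1 - 1 = +1. A minimal standalone snippet (not part of the book's notebook session) illustrates the remapping:

```python
import numpy as np

# make_blobs returns the target labels as a NumPy array of 0s and 1s
y = np.array([0, 1, 1, 0])

# Elementwise arithmetic remaps 0 -> -1 and 1 -> +1 in one vectorized step
y = 2 * y - 1
print(y)  # [-1  1  1 -1]
```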
- In the following code, we first import matplotlib's pyplot module, which provides the functionality to visualize the data.
- Then, we switch to the ggplot plotting style (named after R's ggplot2 visualization package) via plt.style.use.
- Next, we use the magic command %matplotlib inline, which renders plots directly inside the Jupyter Notebook.
- Then, we call plt.scatter to create a scatter plot of the data X, with one feature on the x1 axis and the other on the x2 axis.
- The x1 axis takes all the rows of the first column of X (X[:, 0]), and the x2 axis takes all the rows of the second column of X (X[:, 1]). The other two arguments to plt.scatter are s, the marker size (size of the points), and c, the color. Since y can take only two values, +1 or -1, the scatter plot will show at most two colors.
- Finally, we label the x axis and y axis as x1 and x2.
Let's have a look at the data:
In [5]: import matplotlib.pyplot as plt
... plt.style.use('ggplot')
... %matplotlib inline
... plt.scatter(X[:, 0], X[:, 1], s=100, c=y);
... plt.xlabel('x1')
... plt.ylabel('x2')
This will produce the following graph:
The preceding plot shows an example dataset for the perceptron classifier. What do you think: will our perceptron classifier have an easy time finding a decision boundary that separates these two blobs?
Chances are it will. We mentioned earlier that a perceptron is a linear classifier. This means that as long as you can draw a straight line in the preceding plot to separate the two blobs, there exists a linear decision boundary that the perceptron should be able to find, that is, if we implemented everything correctly. Let's find out.
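Before running the book's own implementation, the idea can be sketched with a minimal from-scratch perceptron (a hypothetical illustration, not the classifier built in this chapter): the classic update rule nudges the weight vector and bias toward every misclassified point, and on linearly separable data this is guaranteed to converge to a separating line.

```python
import numpy as np

def fit_perceptron(X, y, lr=0.1, epochs=100):
    """Minimal perceptron sketch: labels must be -1 or +1."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            # A point is misclassified if the label disagrees in sign
            # with the raw score w . x + b
            if yi * (np.dot(w, xi) + b) <= 0:
                w += lr * yi * xi  # pull the boundary toward the point
                b += lr * yi
    return w, b

def predict(w, b, X):
    # Sign of the raw score, mapped to the labels -1 and +1
    return np.where(X @ w + b >= 0, 1, -1)

# Two clearly separable point clouds with labels -1 and +1
X_toy = np.array([[-2.0, -2.0], [-1.5, -2.5], [2.0, 2.0], [2.5, 1.5]])
y_toy = np.array([-1, -1, 1, 1])
w, b = fit_perceptron(X_toy, y_toy)
print(predict(w, b, X_toy))  # → [-1 -1  1  1]
```

On this toy data the learned line separates the two clouds after a single pass; on the noisier blobs above, more epochs may be needed, and perfect separation is only guaranteed if the blobs do not overlap.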