Image recognition with shallow nets

Image classifiers can be created without using deep-learning algorithms and methods. To demonstrate, let's use the Fashion MNIST dataset, an alternative to the MNIST handwriting dataset. The name MNIST stands for the Modified National Institute of Standards and Technology database, and as the name suggests, it is a modified version of a dataset originally created by the National Institute of Standards and Technology. While MNIST is a collection of handwritten digits, Fashion MNIST uses small images of different types of clothing, each labeled with one of ten categories. Fashion MNIST has nothing to do with the National Institute of Standards and Technology; however, the MNIST name carried over because it is well known as a benchmark database for image recognition.

Since this dataset is not very large and each image is only 28 x 28 pixels, we can use a machine-learning algorithm, such as RandomForest, to train a classifier. We will train a very simple RandomForest model and achieve surprisingly good results; however, at the end of the chapter, we will discuss why these same results will not scale as the dataset gets larger and the individual images get larger. We will now code our image recognition model using traditional machine-learning methods:

  1. We will start by loading the tidyverse suite of packages, as shown in the following code. In this case, we only need readr for reading in the data; however, we will use other packages later. We will also load randomForest for training our model and caret for evaluating our model performance:
library(tidyverse)
library(caret)
library(randomForest)

The code here will not return any values to the console; however, within the RStudio environment, we will see a checkmark next to these packages in the Packages pane, indicating that they are ready to be used. Your Packages pane should look like the following image, which shows that the packages have been loaded:
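
If any of these packages are missing from your library, they can be installed from CRAN first. This is a one-time setup step:
install.packages(c("tidyverse", "caret", "randomForest"))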

  2. Next, we read in the train and test data for the Fashion MNIST dataset with the help of the following code:
fm <- readr::read_csv('fashionmnist/fashion-mnist_train.csv')
fm_test <- readr::read_csv('fashionmnist/fashion-mnist_test.csv')

This code will place two data objects in our environment called fm and fm_test. The Environment pane should look like the following screenshot:

We will use fm to train our model. The data in fm is used to choose the splits in this tree-based model. We will then use the fitted model, which encodes how the independent variables (the pixel values) relate to the target variable, to predict the target variable for the fm_test data from its independent variables.
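
Before training, it is worth a quick sanity check that the files loaded as expected. Assuming the standard Kaggle Fashion MNIST CSVs, each row should contain a label column followed by 784 pixel columns:
# Quick sanity check on the loaded data
dim(fm)       # expected: 60000 rows and 785 columns (label + 784 pixels)
dim(fm_test)  # expected: 10000 rows and 785 columns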

  3. Next, we will train our model. We set a seed for reproducibility so that we get the same pseudorandom numbers every time we run the model and, as such, always get the same results. We convert the label to a factor. The label, in this case, is an integer between 0 and 9; however, we do not want the model to treat these values numerically. Instead, they should be treated as distinct categories. The remaining columns, aside from the label, are all pixel values. We use ~. to denote that all the remaining columns (all the pixel values) will be used as independent variables for our model. We will grow only 10 trees, because this is simply a demonstration that image classification can be done this way. Lastly, we will sample 5 candidate variables at random at every split in each tree. We train our RandomForest model using the following code:
set.seed(0)

rf_model <- randomForest::randomForest(as.factor(label) ~ .,
                                       data = fm,
                                       ntree = 10,
                                       mtry = 5)

When we execute the code, the model will run, which can take several minutes. During this time, we will be unable to execute any code in the console. We can see that the model is now in our environment. The following screenshot shows some of the details contained in the model object:

We can use this model object to make predictions on new data.
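
As a quick check before scoring the test set, printing the model object shows the out-of-bag (OOB) error estimate and a training confusion matrix; the exact figures will vary with your setup:
# Inspect the fitted model: the call, number of trees, mtry,
# the out-of-bag (OOB) error estimate, and a training confusion matrix
print(rf_model)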

  4. We then use our model to make predictions on the test dataset and use the confusionMatrix function from caret to evaluate performance. The following code will populate the vector of predicted values and then evaluate the accuracy of the predictions:
pred <- predict(rf_model, fm_test, type="response")

# confusionMatrix() expects the predicted classes first and the true labels second
caret::confusionMatrix(pred, as.factor(fm_test$label))

# Accuracy : 0.8457

The preceding code creates one last data object, pred, a vector that holds the predicted class for each case in the test set, produced by the model we trained on the training data. It also prints performance metrics to the console. The output that you receive will look like the following screenshot:

The metrics are computed by comparing the actual labels of the test dataset with the values the model predicted for the test data.

Surprisingly, this model produced decent results. We have achieved an accuracy of 84.6%. This shows that a simple approach can work for a dataset like this; however, as the data scales up, this type of model will have worse performance.

To understand why, we should first explain how images are stored as data for modeling. When we view a grayscale image, we see lighter and darker areas. In fact, every pixel holds an integer from 0 for white to 255 for black, and anywhere in between. These numbers are converted into tones so that we can visualize the image; however, for our purposes, we use the raw pixel values. When modeling with RandomForest, each pixel value is treated as an independent feature, considered in isolation from the pixels around it; this is rarely ideal. Usually, we want to look for larger patterns among neighboring pixels within each image.
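
To make this concrete, the following sketch reshapes the 784 pixel values of a single training row back into a 28 x 28 matrix and draws it. It assumes the standard CSV layout, where the first column is the label and the pixels are stored row by row:
# Rebuild one image from its row of pixel values (assumes label in column 1,
# pixels stored row-wise) and draw it with 0 shown as white and 255 as black
first_row <- as.numeric(unlist(fm[1, -1]))
img <- matrix(first_row, nrow = 28, ncol = 28, byrow = TRUE)
image(t(img)[, 28:1], col = gray(seq(1, 0, length.out = 256)), axes = FALSE)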

Let's explore how to create a shallow neural network with just one hidden layer. The hidden layer of the neural network will perform a calculation using all of the input values, so the entire image is considered at once. We are going to make this a simple binomial classification problem for illustration purposes and create our neural network using a method similar to the one we used in the last chapter. If you completed that chapter, then this will likely look familiar; however, completing the previous chapter is not a prerequisite, as we will walk through all the steps here:

  1. Before starting, we will load two more libraries for the following code: the neuralnet package for training our model and the Metrics package for evaluation functions. In particular, we will use the AUC metric later to evaluate our model. Both of these libraries can be loaded by running the following lines of code:
library(neuralnet)
library(Metrics)

This code will not cause anything to happen in the console; however, we will see checkmarks next to these packages in the Packages pane, indicating that they are ready to use. Your Packages pane will look like the following screenshot:

  2. Next, we will change the target column so that it is a simple binary response rather than including all ten categories. This is done to keep this neural network very straightforward, as it just serves as a benchmark for comparing with our CNN later and shows how the code for the two styles of neural networks differs. This filtering is accomplished by running the following lines of code:
fm <- fm %>% dplyr::filter(label < 2)

fm_test <- fm_test %>% dplyr::filter(label < 2)

After running this code, we will see that our data objects have shrunk as a result of the filtering. You should see that they have gone from 60,000 and 10,000 observations, respectively, to 12,000 and 2,000, as shown in the following screenshot:

With the data in this format, we are now able to proceed with writing our code as a binary response task.
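
If you would like to confirm the filtering, the label counts should now contain only the two remaining classes, with about 6,000 training and 1,000 test examples of each:
# Confirm that only labels 0 and 1 remain after filtering
table(fm$label)
table(fm_test$label)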

  3. Now, using the following code, we will remove the target variable from the test set and isolate it in a separate vector for evaluation later:
test_label <- fm_test$label

fm_test <- fm_test %>% dplyr::select(-label)

After running this code, you will notice two changes: there is one fewer column in the fm_test object, and there is a new data object called test_label, a vector containing the values that were in the label column of fm_test. Your Environment pane should look like the following screenshot:

We have made this change because we do not want the label in our test object. In this object, we need to treat the data as if we do not know the true classes so that we can try to predict the classes. We then use the labels from the vector later to evaluate how well we predicted the correct values.

  4. Next, we will create the formula for our neural network. With the neuralnet function from the neuralnet package, the formula needs the target variable on one side of a tilde (~) and all of our independent variables on the other side, connected by plus (+) signs. In the following code, we collect all the column names into a vector, n, and then use paste to concatenate every term except label with a plus sign in between:
n <- names(fm)
formula <- as.formula(paste("label ~", paste(n[n != "label"], collapse = " + ")))

After running this code, we can see the changes in our Environment pane. We will see the vector n that contains all the column names and the formula object that has the dependent variable and independent variables placed together in the proper format. Your Environment pane should now look like the following screenshot:

We ran the preceding code in order to create this formula object as it is a requirement for training a neural network using the neuralnet package.
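
As an aside, base R's reformulate() helper can build the same formula object a little more concisely; this is just an equivalent alternative, not a requirement of the neuralnet package:
# Equivalent, more concise way to build the same formula with base R
formula <- reformulate(setdiff(n, "label"), response = "label")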

  5. After this, we can write the code to train our model. We will set a seed for reproducibility, as we always do with modeling. We will include one hidden layer, with the number of units set to approximately one-third of the number of predictor variables. We will set the linear.output argument to FALSE to denote that this is a classification model, and set the activation function to logistic for the same reason. We train our model as just described using the following code:
set.seed(0)

net <- neuralnet::neuralnet(formula,
                            data = fm,
                            hidden = 250,
                            linear.output = FALSE,
                            act.fct = "logistic")

After running the code, we have a new object in our Environment pane that contains all the details gathered during training; it can now be applied to make predictions on new data. Your Environment pane should contain a model object similar to the one shown in the following screenshot:

Now that we have run this code, we have a model that we can use to make predictions on our test data.
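
It is also worth noting how many parameters even this shallow network has. With 784 inputs, 250 hidden units, and a single output, there are roughly (784 + 1) x 250 + (250 + 1) = 196,501 weights, which we can confirm from the fitted object:
# Count the fitted weights: (784 + 1) * 250 hidden-layer weights
# plus (250 + 1) weights into the single output unit
length(unlist(net$weights))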

  6. Lastly, we can make our predictions and evaluate our results with the help of the following code:
prediction_list <- neuralnet::compute(net, fm_test)
predictions <- as.vector(prediction_list$net.result)

Metrics::auc(test_label, predictions)

Running this code will print the AUC (area under the ROC curve) metric to the console. Your console should contain output just like the following image:

Looking at this output, we see a significant improvement already: the AUC is now 0.97487. Considering the pixels in concert did improve results. We should remember that this model only used two of the ten target classes, and the choice of those classes could also be part of the reason for the large increase. In any case, with larger images, it is not efficient to push all pixel values into an activation function. This is where convolutional neural networks come in to solve this problem: they look at smaller groupings of pixel values to find patterns, and they also contain a means of reducing dimensionality.
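
If you also want a plain accuracy figure that is directly comparable to the random forest result, you can threshold the predicted probabilities at 0.5; the exact value will depend on your run:
# Convert predicted probabilities to 0/1 classes and compute plain accuracy
pred_class <- ifelse(predictions > 0.5, 1, 0)
mean(pred_class == test_label)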

Let's now explore what separates convolutional neural networks from traditional neural networks.
