Autoencoders and MNIST

Many examples of deep learning algorithms in research papers, blog posts, and books deal with the MNIST dataset. We will not be the exception, and will introduce a small use case for autoencoders using MNIST.

The motivation is the following: suppose you want to detect fake banknotes automatically. You would need to teach the computer what an average banknote looks like, so that it can flag those that differ significantly. Given the large volume of cash transactions happening every day worldwide, and the increasing sophistication of fraudsters, it would be unthinkable to do this manually. One way to do it is with sophisticated imaging software, which is how counterfeit banknote detectors, such as the D40 or D50, work.

Another reason for using MNIST is practical: at the time of writing, I was unable to find a nice training dataset of counterfeit banknotes, and MNIST is available directly through keras.

We start with loading the dataset:

# load keras and fetch the MNIST dataset
library(keras)
mnist <- dataset_mnist()
# training and test images and labels
X_train <- mnist$train$x
y_train <- mnist$train$y
X_test <- mnist$test$x
y_test <- mnist$test$y
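
Before plotting, it is worth confirming the dimensions of what we just loaded: X_train should be an array of 60,000 training images of 28 x 28 pixels, and X_test should hold 10,000 test images:

# sanity check: we expect 60000 x 28 x 28 and 10000 x 28 x 28
dim(X_train)
dim(X_test)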

Let's take a closer look at the dataset:

image(X_train[1,,], col=gray.colors(3))
y_train[1]

If everything works correctly, you should see the image of the number five.
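
Note that R's image() function draws matrices rotated with respect to how we normally read them, so the digit may appear sideways or mirrored. If this bothers you, a small helper (show_digit is our own hypothetical name, not part of keras) rotates the matrix before displaying it:

# hypothetical helper: rotate the matrix so the digit appears upright
show_digit <- function(m) {
  image(t(apply(m, 2, rev)), col=gray.colors(3))
}
show_digit(X_train[1,,])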

We need to do a bit of preprocessing before training our autoencoder. The X_train data is a three-dimensional array (images, width, height) of grayscale values. We first convert these arrays into matrices by reshaping each 28 x 28 image into a vector of length 28*28 = 784. Then, we convert the grayscale values from integers ranging from 0 to 255 into floating-point values ranging between 0 and 1:

# reshape
dim(X_train) <- c(nrow(X_train), 784)
dim(X_test) <- c(nrow(X_test), 784)
# rescale
X_train <- X_train / 255
X_test <- X_test / 255
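
After rescaling, a quick sanity check confirms that all values now lie between 0 and 1:

# all values should now be in [0, 1]
range(X_train)
range(X_test)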

Once the initial preprocessing is done, we define the topology of our autoencoder. Let's use an encoding layer with 32 neurons, to achieve a compression ratio of 784/32 = 24.5: 

input_dim <- 28*28 # 784
inner_layer_dim <- 32
input_layer <- layer_input(shape = c(input_dim))
encoder <- layer_dense(units = inner_layer_dim, activation = 'relu')(input_layer)
decoder <- layer_dense(units = input_dim)(encoder)
autoencoder <- keras_model(inputs = input_layer, outputs = decoder)
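
Before compiling, we can inspect the resulting architecture with summary(). The encoder has 784*32 + 32 = 25,120 parameters (weights plus biases) and the decoder 32*784 + 784 = 25,872, for a total of 50,992 trainable parameters:

summary(autoencoder)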

We are ready to compile and train the model:

autoencoder %>% compile(
  optimizer = 'adam',
  loss = 'mean_squared_error',
  metrics = c('accuracy')
)
history <- autoencoder %>% fit(
  X_train, X_train,
  epochs = 50, batch_size = 256,
  validation_split = 0.2
)
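
Fifty epochs is a somewhat arbitrary choice. If you would rather stop training once the validation loss stops improving, a sketch using keras' early stopping callback (the patience value of 5 is our own arbitrary choice) looks as follows:

history <- autoencoder %>% fit(
  X_train, X_train,
  epochs = 50, batch_size = 256,
  validation_split = 0.2,
  # stop if the validation loss has not improved for 5 consecutive epochs
  callbacks = list(callback_early_stopping(monitor = "val_loss", patience = 5))
)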

Using the plot command, we can see the performance of our model during training and validation as follows:

plot(history)

Even better, if you are using RStudio as your IDE, there is a real-time view in the Viewer panel. With this command, you should see two plots showing the accuracy and loss as a function of the epoch number:

Use the Viewer panel to see the training in real time; click on the Open in New Window button to detach it.

Real-time training: the horizontal axis shows the epoch number; loss and accuracy are shown on the vertical axes.

Using the predict method, we reconstruct the digits and compute the reconstruction errors:

# Reconstruct on the test set
preds <- autoencoder %>% predict(X_test)
error <- rowSums((preds-X_test)**2)
error
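
Printing all 10,000 errors is not very informative; a quicker way to get a feel for their distribution is to summarize them or draw a histogram:

# distribution of the reconstruction errors on the test set
summary(error)
hist(error, breaks = 50, main = "Reconstruction error on the test set")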

Intuitively, some classes might be harder to reconstruct than others, because people write some numbers in slightly different ways. Which classes have a higher reconstruction error?

# Which were more problematic to reconstruct?
eval <- data.frame(error=error, class=as.factor(y_test))
library(dplyr)
eval %>% group_by(class) %>% summarise(avg_error=mean(error))

## OUTPUT
# A tibble: 10 x 2
    class avg_error
   <fctr>     <dbl>
 1      0 14.091963
 2      1  6.126858
 3      2 17.245944
 4      3 14.138960
 5      4 13.189842
 6      5 15.170581
 7      6 14.570642
 8      7 11.778826
 9      8 16.120203
10      9 11.645038

Note that some small variations are expected, since there are random components involved, for instance in the shuffling of the data and in the weight initialization. However, the general trends should be similar.
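
If you want your runs to be reproducible, recent versions of the keras R package provide use_session_with_seed(), which fixes the R, Python, and TensorFlow random seeds (by default it also disables the GPU and CPU parallelism, since both are sources of non-determinism). Call it before defining and training the model:

library(keras)
# fix all relevant random seeds; disables GPU and parallelism by default
use_session_with_seed(42)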

An image says more than a thousand words, so rather than just summarizing our data with dplyr, we can use ggplot2 to visualize this information:

library(ggplot2)
eval %>%
  group_by(class) %>%
  summarise(avg_error = mean(error)) %>%
  ggplot(aes(x = class, fill = class, y = avg_error)) +
  geom_col()

We can see, as follows, how our reconstruction error performed per class. This is important, as it lets us know whether our autoencoder is biased in some way, or whether it finds some classes harder to reconstruct than others:

Reconstruction error in the MNIST dataset: 2 and 8 seem to be the most problematic classes, and 1 seems the most straightforward to reconstruct.

Well, that is certainly useful and interesting to see: 2 is somewhat harder to reconstruct, which might be because it sometimes looks like a 7. Intuitively, 8 could easily be confused with 9 or with 0, so the results make some sense.

An even better way to look at how our reconstruction autoencoder is performing is to look directly at the reconstructed examples. For this, we first need to reshape the original data and the reconstructions back into 28 x 28 matrices:

# Reshape original and reconstructed
dim(X_test) <- c(nrow(X_test),28,28)
dim(preds) <- c(nrow(preds),28,28)

And now let's look at the reconstructed image of a typical element of the test set:

image(255*preds[1,,], col=gray.colors(3))

Reconstructed image by our autoencoder.

How does it compare to the original image, before reconstruction? Let's take a look:

y_test[1]
image(255*X_test[1,,], col=gray.colors(3))

Original image.
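
Beyond this first example, it is instructive to look at the digit our autoencoder struggled with the most. Since we already computed the per-image errors, a short sketch to display the worst original next to its reconstruction is:

# index of the test image with the largest reconstruction error
worst <- which.max(error)
y_test[worst]
# show the original and the reconstruction side by side
par(mfrow = c(1, 2))
image(X_test[worst,,], col=gray.colors(3))
image(preds[worst,,], col=gray.colors(3))
par(mfrow = c(1, 1))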

Overall, not bad for a 24.5x compression ratio! Clearly there is a lot of room for improvement, but we can already see the potential of autoencoders to learn intrinsic features of the data.
