Outlier detection in MNIST

Let's look again at the outlier detection problem in MNIST. This time, the digit 0 will be our outlier, and we would like to be able to detect it.

We proceed as before, reading and preprocessing the data:

library(keras)
# Switch to the 0-based indexing used by Python/TensorFlow
options(tensorflow.one_based_extract = FALSE)
K <- keras::backend()
mnist <- dataset_mnist()
X_train <- mnist$train$x
y_train <- mnist$train$y
X_test <- mnist$test$x
y_test <- mnist$test$y
## Exclude "0" from the training set. "0" will be the outlier
normal_idxs <- which(y_train != 0, arr.ind = TRUE)
X_train <- X_train[normal_idxs, , ]
y_test <- sapply(y_test, function(x){ ifelse(x==0,"outlier","normal")})
# reshape
dim(X_train) <- c(nrow(X_train), 784)
dim(X_test) <- c(nrow(X_test), 784)
# rescale
X_train <- X_train / 255
X_test <- X_test / 255
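
Before building the model, it is worth a quick sanity check of shapes and class balance. The expected counts below assume the standard MNIST split:

# Optional sanity checks; counts assume the standard MNIST split
dim(X_train)    # 54077 x 784: the 5923 zeros are gone from the 60000
dim(X_test)     # 10000 x 784: the test set keeps its 980 zeros
table(y_test)   # roughly 9020 "normal" vs 980 "outlier"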

Then we define our encoder structure. Note that we will use a different structure this time, with a 2-dimensional latent space and 256 neurons in the intermediate layer:

original_dim <- 784
latent_dim <- 2
intermediate_dim <- 256
X <- layer_input(shape = c(original_dim))
hidden_state <- layer_dense(X, intermediate_dim, activation = "relu")
z_mean <- layer_dense(hidden_state, latent_dim)
z_log_sigma <- layer_dense(hidden_state, latent_dim)

And rewrite our sample_z function to make it easier to customize:

sample_z <- function(params){
  # params concatenates z_mean (columns 0:1) and z_log_sigma (columns 2:3),
  # using the 0-based extraction enabled above
  z_mean <- params[, 0:1]
  z_log_sigma <- params[, 2:3]
  epsilon <- K$random_normal(
    shape = c(K$shape(z_mean)[[1]]),
    mean = 0.,
    stddev = 1
  )
  # Reparameterization: z = mean + sigma * epsilon
  z_mean + K$exp(z_log_sigma / 2) * epsilon
}
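
This is the reparameterization trick: instead of sampling z directly from its distribution, we sample a standard normal epsilon and then shift and scale it, which keeps the whole operation differentiable with respect to z_mean and z_log_sigma. Here is a toy plain-R illustration of the same formula (with fresh variable names so we don't overwrite the Keras tensors defined earlier):

# Toy illustration of z = mu + exp(log_sigma / 2) * eps, no tensors involved
mu <- c(0.5, -1.0)
log_sigma <- c(0.2, 0.0)
eps <- rnorm(2)                  # eps ~ N(0, 1)
mu + exp(log_sigma / 2) * eps    # one sample from N(mu, exp(log_sigma))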

Then we move on to the decoder part:

z <- layer_concatenate(list(z_mean, z_log_sigma)) %>%
  layer_lambda(sample_z)
decoder_hidden_state <- layer_dense(units = intermediate_dim, activation = "relu")
decoder_mean <- layer_dense(units = original_dim, activation = "sigmoid")
hidden_state_decoded <- decoder_hidden_state(z)
X_decoded_mean <- decoder_mean(hidden_state_decoded)

And finally, the full autoencoder:

variational_autoencoder <- keras_model(X, X_decoded_mean)
encoder <- keras_model(X, z_mean)
decoder_input <- layer_input(shape = latent_dim)
decoded_hidden_state_2 <- decoder_hidden_state(decoder_input)
decoded_X_mean_2 <- decoder_mean(decoded_hidden_state_2)
generator <- keras_model(decoder_input, decoded_X_mean_2)
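
We now have three related models sharing weights: the full variational_autoencoder used for training, the encoder mapping images into the latent space, and the generator decoding latent points back into images. As an optional check of the wiring, summary() prints each model's layers and output shapes:

# Optional: inspect the layer shapes of the three models
summary(variational_autoencoder)
summary(encoder)
summary(generator)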

We define the loss function with:

loss_function <- function(X, decoded_X_mean){
  # Reconstruction term plus the KL divergence from the standard normal prior
  cross_entropy_loss <- loss_binary_crossentropy(X, decoded_X_mean)
  kl_loss <- -0.5 * K$mean(1 + z_log_sigma - K$square(z_mean) - K$exp(z_log_sigma), axis = -1L)
  cross_entropy_loss + kl_loss
}
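
The second term is the closed-form Kullback-Leibler divergence between the approximate posterior N(z_mean, exp(z_log_sigma)) and the standard normal prior. As a quick plain-R sanity check of the formula (toy numbers, deliberately using different variable names so we don't overwrite the Keras tensors), the term is exactly zero when the posterior matches the prior:

# KL term with mu = 0 and log_sigma = 0 (i.e., sigma = 1): should be 0
mu <- c(0, 0)
log_sigma <- c(0, 0)
-0.5 * mean(1 + log_sigma - mu^2 - exp(log_sigma))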

We compile and train the model as before:

variational_autoencoder %>% compile(optimizer = "rmsprop", loss = loss_function)
history <- variational_autoencoder %>% fit(
  X_train, X_train,
  shuffle = TRUE,
  epochs = 10,
  batch_size = 256,
  validation_data = list(X_test, X_test)
)
plot(history)

Once the training is done, we look at the performance:

preds <- variational_autoencoder %>% predict(X_test)
error <- rowSums((preds - X_test)^2)
eval <- data.frame(error=error, class=as.factor(y_test))
library(dplyr)
library(ggplot2)
eval %>%
  ggplot(aes(x = class, y = error, fill = class)) +
  geom_boxplot()

Let's look at the reconstruction error per class:

Reconstruction error using the VAE.
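
The boxplot gives a visual impression; as a numeric complement, we can also summarize the reconstruction error per class (a small optional snippet reusing the eval data frame built above):

# Numeric complement to the boxplot: per-class error summary
eval %>%
  group_by(class) %>%
  summarize(mean_error = mean(error),
            median_error = median(error))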

The plot suggests setting the reconstruction error threshold at 5:

threshold <- 5
y_preds <- sapply(error, function(x){ifelse(x>threshold,"outlier","normal")})

And we now look at the confusion matrix:

table(y_preds, y_test)
         y_test
y_preds    normal outlier
  outlier    9020     980
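
From this confusion matrix we can also read off precision and recall for the outlier class directly; a minimal sketch, assuming the row and column labels shown above:

cm <- table(y_preds, y_test)
# Precision: fraction of flagged observations that are true outliers
cm["outlier", "outlier"] / sum(cm["outlier", ])
# Recall: fraction of true outliers that were flagged
cm["outlier", "outlier"] / sum(cm[, "outlier"])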

This suggests that we are on the right track! But before celebrating, we should look at other classification metrics: the ROC curve and the area under this curve (AUC):

library(ROCR)
pred <- prediction(error, y_test)
perf <- performance(pred, measure = "tpr", x.measure = "fpr")
auc <- unlist(performance(pred, measure = "auc")@y.values)
auc
plot(perf, col=rainbow(10))

We get an AUC of 0.8473375 and a reasonable ROC plot, shown as follows, which tells us that our VAE did a good job of distinguishing the outlier 0.

Note that this is much better than when the digit 7 was the outlier. This tells us that we need to put in extra effort when an abnormal observation resembles the usual observations too closely:

ROC curve for our variational autoencoder.
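
Finally, rather than eyeballing the boxplot, the ROC analysis itself can suggest a threshold: each point on the curve corresponds to a cutoff on the reconstruction error, and a common heuristic is to take the cutoff that maximizes TPR minus FPR (Youden's J statistic). A minimal sketch using the perf object computed above and ROCR's standard S4 slots:

# Each ROC point corresponds to an error cutoff; pick the one that
# maximizes TPR - FPR (Youden's J statistic)
cutoffs <- unlist(perf@alpha.values)
tpr <- unlist(perf@y.values)
fpr <- unlist(perf@x.values)
cutoffs[which.max(tpr - fpr)]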