Image reconstruction using VAEs

Our first example uses the MNIST data to illustrate variational autoencoders (VAEs).

The development strategy is as follows:

  • First, an encoder network turns the input samples x into two parameters in a latent space, which we denote z_mean and z_log_sigma
  • Then, we randomly sample points z from the latent normal distribution that we assume generates the data, as z = z_mean + exp(z_log_sigma/2)*epsilon, where epsilon is a random normal tensor (the reparameterization trick, restated after this list)
  • Once this is done, a decoder network maps these latent space points z back to the original input data
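
The sampling in the second step is the reparameterization trick: rather than sampling z directly, which would block gradients from flowing through the sampling node, we draw epsilon from a standard normal and shift and scale it deterministically. Since z_log_sigma holds the log-variance (the loss function later uses exp(z_log_sigma) as the variance), exp(z_log_sigma/2) is the standard deviation:

z = z_mean + exp(z_log_sigma/2) * epsilon,    epsilon ~ N(0, I)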

We begin as usual, getting and preprocessing the data:

library(keras)
# Use 0-based (Python-style) tensor extraction instead of R's 1-based indexing
options(tensorflow.one_based_extract = FALSE)
K <- keras::backend()
mnist <- dataset_mnist()
X_train <- mnist$train$x
y_train <- mnist$train$y
X_test <- mnist$test$x
y_test <- mnist$test$y
# reshape
dim(X_train) <- c(nrow(X_train), 784)
dim(X_test) <- c(nrow(X_test), 784)
# rescale
X_train <- X_train / 255
X_test <- X_test / 255

Note the additional line:

K <- keras::backend()

This gets us a reference to the tensor backend where Keras will perform the tensor operations.
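
To get a feel for this handle, here is a tiny smoke test (purely illustrative; the constant values are made up) exercising the backend operations we will rely on below:

x <- K$constant(matrix(c(0, 1, -1, 2), nrow = 2))
K$exp(x)    # element-wise exponential, used later as exp(z_log_sigma/2)
K$shape(x)  # symbolic shape, used later to size the noise tensor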

Now we turn to the VAE itself. It will have a two-dimensional latent space and an intermediate hidden layer of 256 units:

original_dim <- 784
latent_dim <- 2
intermediate_dim <- 256
X <- layer_input(shape = c(original_dim))
hidden_state <- layer_dense(X, intermediate_dim, activation = "relu")
z_mean <- layer_dense(hidden_state, latent_dim)
z_log_sigma <- layer_dense(hidden_state, latent_dim)

Next, with the help of our Keras backend, we define the sampling function that draws latent points z from the distribution parametrized by the encoder:

sample_z <- function(params){
  # With 0-based extraction, columns 0:1 hold z_mean and columns 2:3 hold z_log_sigma
  z_mean <- params[, 0:1]
  z_log_sigma <- params[, 2:3]
  epsilon <- K$random_normal(
    shape = c(K$shape(z_mean)[[1]]),
    mean = 0.,
    stddev = 1
  )
  # exp(z_log_sigma/2) converts the log-variance into a standard deviation
  z_mean + K$exp(z_log_sigma/2)*epsilon
}

We now define the sampled points:

z <- layer_concatenate(list(z_mean, z_log_sigma)) %>%
  layer_lambda(sample_z)

Time to define the decoder. We create separate instances of these layers to be able to reuse them later:

decoder_hidden_state <- layer_dense(units = intermediate_dim, activation = "relu")
decoder_mean <- layer_dense(units = original_dim, activation = "sigmoid")
hidden_state_decoded <- decoder_hidden_state(z)
X_decoded_mean <- decoder_mean(hidden_state_decoded)

We are ready! Our VAE is specified by the following encoder and decoder components:

# end-to-end autoencoder
variational_autoencoder <- keras_model(X, X_decoded_mean)

encoder <- keras_model(X, z_mean)
decoder_input <- layer_input(shape = latent_dim)
decoded_hidden_state_2 <- decoder_hidden_state(decoder_input)
decoded_X_mean_2 <- decoder_mean(decoded_hidden_state_2)
generator <- keras_model(decoder_input, decoded_X_mean_2)
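
Before training, a quick wiring check (a routine inspection, not part of the model definition itself) is to print the summaries of the three models; the expected shapes are noted in the comments:

summary(variational_autoencoder)  # roughly 784 -> 256 -> 2 -> 256 -> 784
summary(encoder)                  # 784 -> 256 -> 2 (z_mean only)
summary(generator)                # 2 -> 256 -> 784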

It remains to specify the custom loss function, since we add a KL-divergence penalty on top of the reconstruction loss:

loss_function <- function(X, decoded_X_mean){
  # Reconstruction term: how well the decoder reproduces the input
  cross_entropy_loss <- loss_binary_crossentropy(X, decoded_X_mean)
  # Regularization term: KL divergence from the standard normal prior
  kl_loss <- -0.5*K$mean(1 + z_log_sigma - K$square(z_mean) - K$exp(z_log_sigma), axis = -1L)
  cross_entropy_loss + kl_loss
}
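
The kl_loss line is the closed-form KL divergence between the approximate posterior N(z_mean, exp(z_log_sigma)) and the standard normal prior, which per latent dimension reads:

KL( N(z_mean, exp(z_log_sigma)) || N(0, 1) ) = -0.5 * (1 + z_log_sigma - z_mean^2 - exp(z_log_sigma))

Here K$mean averages this quantity over the two latent dimensions.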

We compile and run our algorithm, as usual:

variational_autoencoder %>% compile(optimizer = "rmsprop", loss = loss_function)
history <- variational_autoencoder %>% fit(
  X_train, X_train,
  shuffle = TRUE,
  epochs = 10,
  batch_size = 256,
  validation_data = list(X_test, X_test)
)
plot(history)

After the training is done, we can look at the loss curves (or follow them in real time using the Viewer in RStudio):

Performance of our VAE for reconstruction on the MNIST data.
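
Because the latent space is two-dimensional, we can also visualize directly where the encoder places each test digit. This is a sketch (it assumes the trained encoder from above and uses ggplot2):

library(ggplot2)
# Encode the test images into their 2-D latent means
latent <- predict(encoder, X_test)
latent_df <- data.frame(z1 = latent[, 1], z2 = latent[, 2], class = as.factor(y_test))
ggplot(latent_df, aes(x = z1, y = z2, colour = class)) + geom_point(alpha = 0.5)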

We can examine the performance of our algorithm with the following snippet:

library(ggplot2)
library(dplyr)
preds <- variational_autoencoder %>% predict(X_test)
error <- rowSums((preds - X_test)^2)
eval <- data.frame(error = error, class = as.factor(y_test))
eval %>%
  group_by(class) %>%
  summarise(avg_error = mean(error)) %>%
  ggplot(aes(x = class, fill = class, y = avg_error)) + geom_col()

The results are shown here:

The reconstruction error looks rather discouraging: we were clearly better off with plain autoencoders. How could we improve this? One way is to increase the dimension of the latent space, which in our current setup is only two-dimensional. However, note that not much is lost in terms of visual quality:

Reconstructed image by our VAE on the left, and the original image on the right (on screen).

The side-by-side comparison above is produced by reshaping the flat 784-dimensional vectors back into 28x28 images:

# Reshape originals and reconstructions back into 28x28 images
dim(X_test) <- c(nrow(X_test), 28, 28)
dim(preds) <- c(nrow(preds), 28, 28)
image(255*preds[1,,], col = gray.colors(3))
y_test[1]  # true label of the first test image
image(255*X_test[1,,], col = gray.colors(3))

Moreover, we now have a generative process! That means that we can create digits ourselves. Let's walk over a grid in the latent space and use the generator to decode each point into a digit:

grid_x <- seq(-4, 4, length.out = 3)
grid_y <- seq(-4, 4, length.out = 3)

rows <- NULL
for(i in 1:length(grid_x)){
  column <- NULL
  for(j in 1:length(grid_y)){
    z_sample <- matrix(c(grid_x[i], grid_y[j]), ncol = 2)
    column <- rbind(column, predict(generator, z_sample) %>% matrix(ncol = 28))
  }
  rows <- cbind(rows, column)
}
rows %>% as.raster() %>% plot()

Let's look at a few digits generated by our VAE:

Digits generated by the VAE.
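
One refinement worth trying (a sketch, following common practice in reference VAE examples): place the grid points at quantiles of the standard normal prior rather than on a linear scale, so the samples concentrate where the prior puts its mass:

# Variant: sample grid points at quantiles of the prior
grid_x <- qnorm(seq(0.05, 0.95, length.out = 15))
grid_y <- qnorm(seq(0.05, 0.95, length.out = 15))
# The double loop above can then be reused unchanged.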