A simple 2D example

Let's develop our intuition of how the autoencoder works with a simple two-dimensional example. 

We first generate 10,000 two-dimensional points from a normal distribution with mean 0 and unit variance in each coordinate:

library(MASS)    # for mvrnorm
library(keras)
# Identity covariance matrix for a standard bivariate normal
Sigma <- matrix(c(1, 0, 0, 1), 2, 2)
n_points <- 10000
df <- mvrnorm(n = n_points, rep(0, 2), Sigma)
df <- as.data.frame(df)

The distribution of the values should look as follows:

Distribution of the variables V1 and V2 we just generated; the two look fairly similar.
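
If you want to reproduce these histograms yourself, a minimal sketch using ggplot2 (the plotting code is not part of the original example) would be:

library(ggplot2)
# Histogram of V1; replace V1 with V2 to inspect the second variable
ggplot(df, aes(V1)) + geom_histogram(bins = 50)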

Let's spice things up a bit and add some outliers to the mixture. In many fraud applications, the fraud rate is about 1–5%, so we replace 1% of our samples with points drawn from a normal distribution with mean 5 and standard deviation 1:

# Set the outliers
n_outliers <- as.integer(0.01*n_points)
idxs <- sample(n_points,size = n_outliers)
outliers <- mvrnorm(n=n_outliers, rep(5,2), Sigma)
df[idxs,] <- outliers

The new distribution of points looks like this now:

New distribution of points after adding the outliers.
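
A quick way to produce such a scatter plot, again as a sketch with ggplot2 loaded as before, is:

# Scatter plot of the data after injecting the outliers
ggplot(df, aes(V1, V2)) + geom_point(alpha = 0.3)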

We will use an autoencoder with a single neuron in the hidden layer. Why not add more? The problem is that if the hidden layer has a dimension equal to or higher than the input, we risk that our model learns the identity function, that is, that the model learns g(f(x))=x everywhere. This is clearly not a very useful outlier identification method. We need to capture the essential features of the data, so that unusual points stand out later on, which allows us to detect outliers.

With keras, it is really easy to set up the model. We need an input layer of shape 2 for our two-dimensional example. This is passed to our one-dimensional encoder, which uses a ReLU activation function, and then decoded back into a two-dimensional space:

input_layer <- layer_input(shape=c(2))
encoder <- layer_dense(units=1, activation='relu')(input_layer)
decoder <- layer_dense(units=2)(encoder)
autoencoder <- keras_model(inputs=input_layer, outputs = decoder)
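
If you want to double-check the architecture before training, you can print a summary of the model:

# Show the layers and parameter counts of the autoencoder
summary(autoencoder)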

Before using the model, we need to compile it. We have to specify a loss function, a metric to monitor, and an algorithm to perform the gradient descent updates. We will go for the Adam optimizer, minimizing the classical mean squared error (which works for this problem, but might need to be changed for a particular application), and choose accuracy as the metric to monitor:

autoencoder %>% compile(
  optimizer = 'adam',
  loss = 'mean_squared_error',
  metrics = c('accuracy')
)

Once this is set up, we are ready for training:

# Coerce the data frame to a matrix to perform the training
df <- as.matrix(df)
history <- autoencoder %>% fit(
  df, df,
  epochs = 30, batch_size = 128,
  validation_split = 0.2
)

Using the command plot(history), we can see how the training went for this example:

Training of our autoencoder.

So we see that, while the accuracy remains fairly high, there is a mysterious drop during the training. We should not worry too much about it for now; we will come back to this issue later. As for the loss, it keeps decreasing as training progresses, which is expected.

Finally, let's look at the reconstruction. We first generate the predictions from our trained autoencoder:

preds <- autoencoder %>% predict(df)
colnames(preds) <- c("V1", "V2")
preds <- as.data.frame(preds)

This is the reconstruction of the points, as per our autoencoder. We will color in red those points whose reconstruction lies at a (Euclidean) distance larger than three from the original point, and leave the others in blue. Why those points? Our autoencoder learned the intrinsic structure of the dataset (essentially, the distribution around the average point), so the points where the reconstruction error is anomalously large might be worth looking at:

# Coerce the matrix back to a data frame to use ggplot later
df <- as.data.frame(df)
# Euclidean distance larger than 3 = sum of squares larger than 9
df$color <- ifelse((df$V1 - preds$V1)**2 + (df$V2 - preds$V2)**2 > 9, "red", "blue")
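
Since we kept the indices of the injected outliers in idxs, we can cross-tabulate the flagged points against the true outliers. This quick sanity check is not part of the original workflow:

# Compare the points flagged in red with the outliers we actually injected
true_outlier <- seq_len(n_points) %in% idxs
table(flagged = df$color == "red", true_outlier)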

Finally, we can look at the results with ggplot:

library(ggplot2)
df %>% ggplot(aes(V1, V2)) + geom_point(color = df$color, position = "jitter")

The following plot shows how well we did at identifying unusual points:

Output of the autoencoder: in blue, the points that are reconstructed well; in red, the points whose reconstruction error is large. We see a cloud of red points that, thanks to our autoencoder, we can identify as unusual.

In the preceding plot, the blue points are those that the autoencoder reconstructs accurately. We see that it correctly learned that most of the points come from a normal distribution centered at (0, 0), as expected. However, there are still some points that are normal in the original dataset but were flagged as unusual. There is no need to get discouraged with autoencoders just yet; the reason is that the autoencoder we used is rather simple. We will look at more sophisticated ways to tackle the outlier detection problem with autoencoders.
