The autoencoder approach – Keras

OK, time to get into Keras. We should set aside a small fraction of the data to use as a validation or test set, and develop the model on the remainder. There is no gold standard for how this split should be done. For this example, we will use a 10% test set and a 90% training set:

# Split the data into a 90% training and 10% test set
library(dplyr)
set.seed(123)  # for a reproducible split
idxs <- sample(nrow(df), size = 0.1 * nrow(df))
train <- df[-idxs, ]
test <- df[idxs, ]
y_train <- train$Class
y_test <- test$Class
# Remove the Time and Class columns
X_train <- train %>% select(-one_of(c("Time", "Class")))
X_test <- test %>% select(-one_of(c("Time", "Class")))
# Coerce the data frames to matrices to perform the training
X_train <- as.matrix(X_train)
X_test <- as.matrix(X_test)

Notice that we also excluded the Class and Time columns. We are ignoring the label and treating our fraud detection problem as an unsupervised learning problem, hence we need to remove the label column from the training data. As for the temporal information, as we saw before, there does not seem to be an obvious time trend. Furthermore, in real-life fraud detection scenarios, we are typically more concerned with intrinsic properties of the fraudster (for instance, the device used, geolocation information, or data from the CRM system) as well as account properties (balance, average transaction volume, and so on).

For the architecture of the autoencoder, instead of using one intermediate layer as before, we will now use a stacked autoencoder. A stacked autoencoder is nothing more than several layers of encoders, followed by layers of decoders. In this case, we will use a network with outer encoder-decoder layers of 14 fully connected neurons, two inner layers of 7 neurons, and yet another innermost layer of 7 neurons. You can experiment with different architectures and compare the results with ours; there is no universally correct architecture for autoencoders. Choosing one relies on experience and on diagnosing your model via validation plots and other metrics.

Our input (and output) dimension is 29 in each case. The code to construct the autoencoder is:

library(keras)
input_dim <- 29
outer_layer_dim <- 14
inner_layer_dim <- 7
# Encoder: 29 -> 14 -> 7 -> 7
input_layer <- layer_input(shape = c(input_dim))
encoder <- layer_dense(units = outer_layer_dim, activation = 'relu')(input_layer)
encoder <- layer_dense(units = inner_layer_dim, activation = 'relu')(encoder)
encoder <- layer_dense(units = inner_layer_dim, activation = 'relu')(encoder)
# Decoder: 7 -> 14 -> 29 (linear activations)
decoder <- layer_dense(units = inner_layer_dim)(encoder)
decoder <- layer_dense(units = outer_layer_dim)(decoder)
decoder <- layer_dense(units = input_dim)(decoder)
autoencoder <- keras_model(inputs = input_layer, outputs = decoder)

We can look at our work to check everything is correct:

autoencoder
Model
___________________________________________________________________
Layer (type)                  Output Shape              Param #
===================================================================
input_5 (InputLayer)          (None, 29)                0
___________________________________________________________________
dense_17 (Dense)              (None, 14)                420
___________________________________________________________________
dense_18 (Dense)              (None, 7)                 105
___________________________________________________________________
dense_22 (Dense)              (None, 7)                 56
___________________________________________________________________
dense_23 (Dense)              (None, 7)                 56
___________________________________________________________________
dense_24 (Dense)              (None, 14)                112
___________________________________________________________________
dense_25 (Dense)              (None, 29)                435
===================================================================
Total params: 1,184
Trainable params: 1,184
Non-trainable params: 0

We are now ready to begin our training. We should first compile the model and then fit it:

autoencoder %>% compile(
  optimizer = 'adam',
  loss = 'mean_squared_error',
  metrics = c('accuracy')
)
history <- autoencoder %>% fit(
  X_train, X_train,
  epochs = 10, batch_size = 32,
  validation_split = 0.2
)
plot(history)

Our results are shown as follows. You can see that there is an increase in accuracy as the number of epochs increases:

Diagnostic plots for our 14-7-7-7-14 architecture.
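Beyond the plot, the history object returned by fit() also exposes the raw per-epoch numbers, which is handy when you want to diagnose convergence programmatically rather than visually. A small sketch (it assumes the history object from the fit() call above):

```r
# history$metrics is a named list of per-epoch numeric vectors
# (loss, acc, val_loss and val_acc with the settings above)
str(history$metrics)

# For instance, the validation loss at the last epoch:
tail(history$metrics$val_loss, 1)

# A quick convergence check: did the validation loss improve overall?
head(history$metrics$val_loss, 1) > tail(history$metrics$val_loss, 1)
```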

Once we have the autoencoder ready, we use it to reconstruct the test set:

# Reconstruct on the test set
preds <- autoencoder %>% predict(X_test)
preds <- as.data.frame(preds)

We will look for anomalously large reconstruction errors, as before, to be labelled as unusual. For instance, we can take 30 as a reference value for the reconstruction error and map each point to a score between 0 and 1, so that points whose error reaches or exceeds 30 get the maximum score of 1:

errors <- rowSums((preds - X_test)**2)
y_preds <- pmin(errors / 30, 1)  # cap the score at 1

Again, this threshold is not set in stone; using a held-out validation set in your particular application, you can fine-tune it and find the most suitable threshold for your problem.
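If you do have some labelled examples, this tuning can be automated: scan a grid of candidate thresholds on the reconstruction error and keep the one that maximizes a metric you care about, such as F1. The sketch below is self-contained, with simulated scores and labels standing in for rowSums((preds - X_test)^2) and y_test; the names f1_at, grid, and best are ours, not from any library:

```r
# Simulated stand-ins: 5% "fraud", and frauds tend to have
# larger reconstruction errors than normal transactions
set.seed(42)
labels <- rbinom(1000, 1, 0.05)
scores <- rnorm(1000, mean = 5 + 20 * labels, sd = 5)

# F1 score obtained when flagging every point with score > t
f1_at <- function(t) {
  pred <- as.integer(scores > t)
  tp <- sum(pred == 1 & labels == 1)
  fp <- sum(pred == 1 & labels == 0)
  fn <- sum(pred == 0 & labels == 1)
  if (tp == 0) return(0)
  prec <- tp / (tp + fp)
  rec <- tp / (tp + fn)
  2 * prec * rec / (prec + rec)
}

# Scan a grid of candidate thresholds and keep the best one
grid <- seq(min(scores), max(scores), length.out = 100)
best <- grid[which.max(sapply(grid, f1_at))]
best
```

In practice you would run this scan on a validation set and report the final numbers on the untouched test set.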

Finally, let's generate the ROC curve to see if our model is performing correctly using:

library(ROCR)
pred <- prediction(y_preds, y_test)
perf <- performance(pred, measure = "tpr", x.measure = "fpr")
plot(perf, col=rainbow(10))

We see that the results are satisfactory. Our curve looks quite straight, and the reason for that is that the output of our model is a crude, capped score rather than a proper class probability. When your model outputs class probabilities, or a proxy for them, the curve will be smoother:

ROC curve: It looks quite straight since the outputs of the model are capped scores, not class probabilities.
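A single-number summary of the ROC curve is the area under it (AUC), which ROCR can compute from the same prediction object via performance(pred, measure = "auc"). A minimal, self-contained sketch, with simulated scores and labels standing in for y_preds and y_test:

```r
library(ROCR)

# Simulated stand-ins: positives tend to score higher than negatives
set.seed(1)
labels <- rbinom(500, 1, 0.1)
scores <- runif(500) + 0.5 * labels

# AUC from the same kind of prediction object used for the ROC curve
pred <- prediction(scores, labels)
auc <- performance(pred, measure = "auc")@y.values[[1]]
auc  # 0.5 means random ranking; values near 1 mean good separation
```

The AUC is useful for comparing architectures or thresholding schemes without having to eyeball several curves.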