Chapter 6. Using Generative Models

Generative models generate new data. In a sense, they are the exact opposite of the models we've dealt with in prior chapters. While an image classifier takes a high-dimensional input, the image, and produces a low-dimensional output, such as the content of the image, a generative model works the other way around. It might, for example, draw images from a description of what's in them.

Generative models are still in the experimental phase of their development and are currently used mostly in image applications. However, they are an important class of model: several applications built on generative models have already caused an uproar within the industry.

In 2017, so-called DeepFakes began to appear on the internet. Generative Adversarial Networks (GANs), which we will cover later in this chapter, were used to generate pornographic videos featuring famous celebrities. The year before, in 2016, researchers showcased a system in which they could generate videos of politicians saying anything the researcher wanted them to say, complete with realistic mouth movements and facial expressions. An example of this can be seen in a fake speech made by former US president Barack Obama that news site BuzzFeed produced in 2018: https://youtu.be/cQ54GDm1eL0.

This technology is not all negative; there are positive applications as well, especially when training data is sparse. In that case, generative models can generate realistic data that other models can then train on. Generative models are also able to "translate" images, a prime example being taking satellite images and turning them into street maps. Another example is that generative models can generate code from website screenshots. They can even be used to combat unfairness and discrimination in machine learning models, as we will see in Chapter 9, Fighting Bias.

In the field of finance, data is frequently sparse. Think back to the fraud case from Chapter 2, Applying Machine Learning to Structured Data, in which we were classifying fraudulent transactions from transaction metadata. We found that there was not much fraud taking place in the dataset that we used, so the model had a hard time detecting when fraud was taking place. Usually, when this occurs, engineers make assumptions and create synthetic data. Machine learning models, however, can do this themselves, and in the process, they might even discover some useful features that can help with fraud detection.

In algorithmic trading, data is frequently generated in simulators. Want to know how your algorithm would do in a global selloff? Luckily, there are not that many global selloffs, so engineers at quantitative analysis firms spend a lot of their time creating simulations of selloffs. These simulators are often biased by the engineer's experience and their feelings about what a selloff should look like. However, what if the models could learn what a selloff fundamentally looks like, and then create data describing an infinite number of selloffs?

In this chapter, we'll be focusing on two families of generative models: autoencoders and GANs. The first is the family of autoencoders, which aim to compress data into a lower-dimensional representation and then reconstruct it faithfully. The second is the family of GANs, which aim to train a generator so that a separate discriminator cannot tell its fake images from real ones.

Understanding autoencoders

Technically, autoencoders are not generative models since they cannot create completely new kinds of data. Yet, variational autoencoders, a minor tweak to vanilla autoencoders, can. So, it makes sense to first understand autoencoders by themselves, before adding the generative element.

Autoencoders by themselves have some interesting properties that can be exploited for applications such as detecting credit card fraud, which is useful in our focus on finance.

Given an input, x, an autoencoder learns how to output x. It aims to find a function, f, so that the following is true:

f(x) = x

This might sound trivial at first, but the trick here is that autoencoders have a bottleneck. The middle hidden layer's size is smaller than the size of the input, x. Therefore, the model has to learn a compressed representation that captures all of the important elements of x in a smaller vector.

This is best shown in the following diagram of the autoencoder scheme, where we can see the compressed representation in the middle:


Autoencoder scheme

This compressed representation aims to capture the essence of the input, which turns out to be useful for us. We might, for example, want to capture what essentially distinguishes a fraudulent transaction from a genuine one. Vanilla autoencoders accomplish something similar to standard principal component analysis (PCA): they allow us to reduce the dimensionality of our data and focus on what matters. But in contrast to PCA, autoencoders can be extended to generate more data of a certain type. They also deal better with image or video data, since they can exploit the spatial structure of the data using convolutional layers.
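To make the point about spatiality concrete, here is a minimal, hypothetical sketch of what a convolutional encoder could look like in Keras. The layer sizes are illustrative only; the examples in this section will stick to fully connected layers:

from keras.layers import Input, Conv2D, MaxPooling2D

# A convolutional encoder keeps the 2D structure of the image instead of flattening it
img_in = Input(shape=(28, 28, 1))
x = Conv2D(16, (3, 3), activation='relu', padding='same')(img_in)
x = MaxPooling2D((2, 2))(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded_img = MaxPooling2D((2, 2))(x)  # a 7x7x8 spatial code rather than a flat vector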

In this section, we will build two autoencoders. The first will be used for handwritten digits from the MNIST dataset. Generative models are easier to debug and understand for visual data, because humans are intuitively good at judging whether two pictures show something similar, but less good at judging abstract data. The second autoencoder is for a fraud detection task, using methods similar to those we use for the MNIST dataset.

Autoencoder for MNIST

Let's start with a simple autoencoder for the MNIST dataset of handwritten digits. An MNIST image is 28x28 pixels and can be flattened into a vector of 784 elements. We will compress this data into a vector with only 32 elements by using an autoencoder.

Before diving into the code described here, make sure you have saved the MNIST dataset on the right path, successfully imported both the NumPy and Matplotlib libraries, and set a random seed to ensure that your experiments are reproducible.
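If you need a template for that preparation step, the following is a minimal sketch. It assumes the dataset is loaded via keras.datasets rather than from a saved file, so adjust the loading line to match your own path and setup:

import numpy as np
import matplotlib.pyplot as plt
from keras.datasets import mnist

np.random.seed(42)  # fix the random seed for reproducibility

# Load MNIST and scale the pixel values to the range [0, 1]
(X_train, _), (X_test, _) = mnist.load_data()
X_train = X_train.astype('float32') / 255.
X_test = X_test.astype('float32') / 255.

# Flatten each 28x28 image into a vector of 784 elements
X_train_flat = X_train.reshape((len(X_train), 784))
X_test_flat = X_test.reshape((len(X_test), 784))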

Note

Note: You can find the code for the MNIST autoencoder and variational autoencoder at the following URL: https://www.kaggle.com/jannesklaas/mnist-autoencoder-vae.

We're going to set the encoding dimensionality hyperparameter now so that we can use it later:

encoding_dim = 32

Then, we construct the autoencoder using the Keras functional API. While a simple autoencoder could be constructed using the sequential API, this is a good refresher for us on how the functional API works.

First, we import the Model class, which allows us to create functional API models. We also need to import both the Input and Dense layers. You'll remember from previous chapters how the functional API needs a separate input layer, while the sequential API does not need one. To import both layers, we need to run the following:

from keras.models import Model
from keras.layers import Input, Dense

Now we are chaining up the autoencoder's layers: an Input layer followed by a Dense layer that encodes the image to a smaller representation.

This is followed by a Dense decoding layer that aims to reconstruct the original image:

input_img = Input(shape=(784,))

encoded = Dense(encoding_dim, activation='relu')(input_img)

decoded = Dense(784, activation='sigmoid')(encoded)

After we have created and chained up the layers, we are then able to create a model that maps from the input to the decoded image:

autoencoder = Model(input_img, decoded)

To get a better idea of what is going on, we can plot a visualization of the resulting autoencoder model with the following code:

from keras.utils import plot_model
plot_model(autoencoder, to_file='model.png', show_shapes=True)
plt.figure(figsize=(10,10))
plt.imshow(plt.imread('model.png'))

You can see our autoencoder as follows:


Autoencoder model

We can then compile the model with the following:

autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')

To train this autoencoder, we use the X values as both the input and output:

autoencoder.fit(X_train_flat, X_train_flat, epochs=50, batch_size=256, shuffle=True, validation_data=(X_test_flat, X_test_flat))

After we train this autoencoder, which will take between one and two minutes, we can visually inspect how well it is doing. To do this, we first extract a single image from the test set, before adding a batch dimension to the image in order to run it through the model, which is what we use np.expand_dims for:

original = np.expand_dims(X_test_flat[0],0)

Now we're going to run the original image through the autoencoder. You'll remember that the original MNIST image showed us a number seven, so we're hoping that the output of our autoencoder shows a seven as well:

seven = autoencoder.predict(original)

Next, we're going to reshape both the autoencoder output as well as the original image back into 28x28-pixel images:

seven = seven.reshape(1,28,28)
original = original.reshape(1,28,28)

We then plot the original and reconstructed image next to each other. matplotlib's imshow cannot handle a batch dimension, so we need to pass an array without it. By indexing the images with [0,:,:], we pass only the first item in the batch, with all of its pixels.

This first item now doesn't have a batch dimension anymore:

fig = plt.figure(figsize=(7, 10))
a=fig.add_subplot(1,2,1)
a.set_title('Original')
imgplot = plt.imshow(original[0,:,:])

b=fig.add_subplot(1,2,2)
b.set_title('Autoencoder')
imgplot = plt.imshow(seven[0,:,:])

After running that code, you'll see that our hope has been fulfilled: compared to the original image (left), the autoencoder's output (right) also shows a seven:


Autoencoder result

As you can see in the preceding figure, the reconstructed seven is still a seven, so the autoencoder was able to capture the general idea of what a seven is. It's not perfect, though; as you can see, it's a bit blurry around the edges, especially in the top left. It seems that while the autoencoder is unsure about the length of the lines, it does have a good idea that there are two lines in a seven, and it is aware of the general direction they follow.

An autoencoder such as this one performs nonlinear PCA. It learns which components matter the most for a seven to be a seven. The usefulness of being able to learn this representation goes beyond images. Within credit card fraud detection, such principal components would make for good features that another classifier would be able to work with.
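Because we built the model with the functional API, we can define a second model that reuses the trained encoding layer and outputs the 32-dimensional code directly. The following is only a sketch of the idea; the downstream classifier itself is not shown:

# The encoder shares its layers (and trained weights) with the autoencoder
encoder = Model(input_img, encoded)

# 32-dimensional codes that could serve as features for another classifier
codes = encoder.predict(X_test_flat)
print(codes.shape)  # (number of test images, 32)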

In the next section, we will apply an autoencoder to the credit card fraud problem.

Autoencoder for credit cards

Throughout this section, we will once again be dealing with the problem of credit card fraud. This time, we will be using a slightly different dataset from that in Chapter 2, Applying Machine Learning to Structured Data.

This new dataset contains records of actual credit card transactions with anonymized features; however, it does not lend itself much to feature engineering. Therefore, we will have to rely on end-to-end learning methods in order to build a good fraud detector.

Note

Note: You can find the dataset at: https://www.kaggle.com/mlg-ulb/creditcardfraud and the notebook with an implementation of an autoencoder and variational autoencoder at: https://www.kaggle.com/jannesklaas/credit-vae.

As usual, we first load the data. The Time feature shows the absolute time of the transaction, which makes the data a bit hard to deal with here. Therefore, we will just drop it, which we can do by running:

import pandas as pd

df = pd.read_csv('../input/creditcard.csv')
df = df.drop('Time', axis=1)

We then separate the X data on the transaction from the classification of the transaction and extract the NumPy array that underlies the pandas DataFrame:

X = df.drop('Class',axis=1).values
y = df['Class'].values

Now we need to scale the features. Feature scaling makes it easier for our model to learn a good representation of the data. This time around, we're going to employ a slightly different method of feature scaling than we did before. We'll scale all features to be between zero and one, as opposed to having a mean of zero and a standard deviation of one. By doing this, we ensure that there are neither any very high nor very low values in the dataset.

We must be aware that this method is susceptible to outliers influencing the result. For each column, we first subtract the minimum value, so that the new minimum value becomes zero. Next, we divide by the maximum value so that the new maximum value becomes one.

By specifying axis=0, we perform the scaling column-wise:

X -= X.min(axis=0)
X /= X.max(axis=0)

Then, finally, we split our data:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1)

We then create the exact same autoencoder as we did before; however, this time, we do it with different dimensions. Our input now has 29 dimensions, which we compress down to 12 dimensions before aiming to restore the original 29-dimensional output.

While 12 dimensions is a somewhat arbitrary choice here, it allows for enough capacity to capture all the relevant information while still significantly compressing the data:

from keras.models import Model
from keras.layers import Input, Dense

We are going to use the sigmoid activation function for the decoded data. This is only possible because we've scaled the data to have values between zero and one. We are also using a tanh activation within the encoded layer. This is just a style choice that worked well in experiments and ensures that encoded values are all between minus one and one. With that being said, you may use different activation functions depending on your individual needs.

If you are working with images or deeper networks, a ReLU activation is usually a good choice. However, if you are working with a shallower network, as we are doing here, then a tanh activation often works well:

data_in = Input(shape=(29,))
encoded = Dense(12,activation='tanh')(data_in)
decoded = Dense(29,activation='sigmoid')(encoded)
autoencoder = Model(data_in,decoded)

In this example, we've used a mean squared error loss. Using a sigmoid activation together with a mean squared error loss seems an unusual choice at first, yet it makes sense. Most people assume that sigmoid activations have to be paired with a cross-entropy loss, but cross-entropy loss encourages values to be either zero or one, which works well for classification tasks where that is actually the case.

In our credit card example, most values will be around 0.5. Mean squared error, which we can see being implemented in the code below, is better at dealing with values where the target is not binary, but on a spectrum. Binary cross entropy forces values to be close to zero and one, which is not what we always want:

autoencoder.compile(optimizer='adam', loss='mean_squared_error')

We now train the autoencoder, again using the transaction data as both input and target. Training takes around two minutes, after which the autoencoder converges to a low loss:

autoencoder.fit(X_train, X_train, epochs=20, batch_size=128, validation_data=(X_test, X_test))

The reconstruction loss is low, but how do we know whether our autoencoder is working well? Once again, a visual inspection will come to the rescue. As we've explained before, humans are very good at judging things visually, but not very good at judging abstract numbers.

To run a visual inspection, first we must make some predictions, in which we'll run a subset of our test set through the autoencoder:

pred = autoencoder.predict(X_test[0:10])

We can then plot individual samples. The following code produces an overlaid bar chart comparing the original transaction data with the reconstructed transaction data:

import matplotlib.pyplot as plt
import numpy as np

width = 0.8

prediction   = pred[9]
true_value    = X_test[9]

indices = np.arange(len(prediction))

fig = plt.figure(figsize=(10,7))

plt.bar(indices, prediction, width=width, color='b', label='Predicted Value')

plt.bar([i+0.25*width for i in indices], true_value, width=0.5*width, color='r', alpha=0.5, label='True Value')

plt.xticks(indices+width/2., ['V{}'.format(i) for i in range(len(prediction))] )

plt.legend()

plt.show()

This code will then give us the following chart:


Autoencoder reconstruction versus original data

As you can see, our model does a fine job of reconstructing the original values. The reconstructed values often match the true values, and when they don't, they deviate only by a small margin. Once again, visual inspection gives more insight than looking at abstract numbers.
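If you would like a quantitative complement to the visual check, the following sketch (an illustration, not part of the original notebook) computes the reconstruction error for each test transaction and compares genuine and fraudulent transactions. Because fraud is rare in the training data, fraudulent transactions will often reconstruct noticeably worse:

# Mean squared reconstruction error per transaction
reconstructions = autoencoder.predict(X_test)
mse = np.mean(np.power(X_test - reconstructions, 2), axis=1)

# Compare the average reconstruction error of genuine (Class 0) and fraudulent (Class 1) transactions
print('Genuine transactions:   ', mse[y_test == 0].mean())
print('Fraudulent transactions:', mse[y_test == 1].mean())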
