Chapter 3. Your first GAN: Generating handwritten digits

This chapter covers

  • Exploring the theory behind GANs and adversarial training
  • Understanding how GANs differ from conventional neural networks
  • Implementing a GAN in Keras, and training it to generate handwritten digits

In this chapter, we explore the foundational theory behind GANs. We introduce the commonly used mathematical notation you may encounter if you choose to dive deeper into this field, perhaps by reading a more theoretically focused publication or even one of the many academic papers on this topic. This chapter also provides background knowledge for the more advanced chapters, particularly chapter 5.

From a strictly practical standpoint, however, you don’t have to worry about many of these formalisms—much as you don’t need to know how an internal combustion engine works to drive a car. Machine learning libraries such as Keras and TensorFlow abstract the underlying mathematics away from us and neatly package them into importable lines of code.

This will be a recurring theme throughout this book; it is also true for machine learning and deep learning in general. So, if you are someone who prefers to dive straight into practice, feel free to skim through the theory section and skip ahead to the coding tutorial.

3.1. Foundations of GANs: Adversarial training

Formally, the Generator and the Discriminator are represented by differentiable functions, such as neural networks, each with its own cost function. The two networks are trained via backpropagation using the Discriminator’s loss: the Discriminator strives to minimize the loss for both the real and the fake examples, while the Generator tries to maximize the Discriminator’s loss for the fake examples it produces.

This dynamic is summarized in figure 3.1. It is a more general version of the diagram from chapter 1, where we first explained what GANs are and how they work. Instead of the concrete example of handwritten digits, in this diagram, we have a general training dataset which, in theory, could be anything.

Figure 3.1. In this GAN architecture diagram, both the Generator and the Discriminator are trained using the Discriminator’s loss. The Discriminator strives to minimize the loss; the Generator seeks to maximize the loss for the fake examples it produces.

Importantly, the training dataset determines the kind of examples the Generator will learn to emulate. If, for instance, our goal is to produce realistic-looking images of cats, we would supply our GAN with a dataset of cat images.

In more technical terms, the Generator’s goal is to produce examples that capture the data distribution of the training dataset.[1] Recall that to a computer, an image is just a matrix of values: two-dimensional for grayscale and three-dimensional for color (RGB) images. When rendered onscreen, the pixel values within these matrices manifest all the visual elements of an image—lines, edges, contours, and so forth. These values follow a complex distribution over the images in a dataset; after all, if the values followed no distribution, an image would be nothing more than random noise. Object recognition models learn the patterns in images to discern an image’s content. The Generator can be thought of as the reverse of this process: rather than recognizing these patterns, it learns to synthesize them.

[1] See “Generative Adversarial Networks,” by Ian J. Goodfellow et al., 2014, https://arxiv.org/abs/1406.2661.

3.1.1. Cost functions

Following the standard notation, let J(G) denote the Generator’s cost function and J(D) the Discriminator’s cost function. The trainable parameters (weights and biases) of the two networks are represented by the Greek letter theta: θ(G) for the Generator and θ(D) for the Discriminator.

GANs differ from conventional neural networks in two key respects. First, the cost function, J, of a traditional neural network is defined exclusively in terms of its own trainable parameters, θ. Mathematically, this is expressed as J(θ). In contrast, GANs consist of two networks whose cost functions are dependent on both of the networks’ parameters. That is, the Generator’s cost function is J(G)(θ(G), θ(D)), and the Discriminator’s cost function is J(D)(θ(G), θ(D)).[2]

[2] See “NIPS 2016 Tutorial: Generative Adversarial Networks,” by Ian Goodfellow, 2016, https://arxiv.org/abs/1701.00160.
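To make this notation concrete, the cost functions from the original GAN paper (see footnote 1) can be written out explicitly; you will not need these formulas to follow the tutorial later in this chapter. The Discriminator’s cost is the cross-entropy loss over the real and the fake examples, and in the zero-sum formulation the Generator’s cost is simply its negative:

J^{(D)} = -\mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] - \mathbb{E}_{z}[\log(1 - D(G(z)))]

J^{(G)} = -J^{(D)}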

The second (related) difference is that a traditional neural network can tune all its parameters, θ, during the training process. In a GAN, each network can tune only its own weights and biases. The Generator can tune only θ(G), and the Discriminator can tune only θ(D) during training. Accordingly, each network has control over only a part of what determines its loss.

To make this a little less abstract, consider the following analogy. Imagine we are choosing which route to drive home from work. If there is no traffic, the fastest option is the highway. During rush hour, however, we may be better off taking one of the side roads. Despite being longer and more winding, the side roads might get us home faster when the highway is clogged up with traffic.

Let’s phrase it as a math problem. Let J be our cost function, defined as the amount of time it takes us to get home. Our goal is to minimize J. For simplicity, let’s assume we have a set time to leave the office, so we cannot leave early to get ahead of rush hour or stay late to avoid it. The only parameter, θ, we can change is our route.

If ours were the only car on the road, our cost would be similar to a regular neural network’s: it would depend only on the route, and it would be entirely within our power to optimize, J(θ). However, as soon as we introduce other drivers into the equation, the situation gets more complicated. Suddenly, the time it will take us to get home depends not only on our decisions but also on other drivers’ course of action, J(θ(us),θ(other drivers)). Much like the Generator and Discriminator networks, our “cost function” will depend on an interplay of factors, some of which are under our control and others of which are not.

3.1.2. Training process

The two differences we’ve described have far-reaching implications for the GAN training process. The training of a traditional neural network is an optimization problem: we seek to minimize the cost function by finding a set of parameters such that moving to any neighboring point in the parameter space would increase the cost. This could be either a local or a global minimum, as determined by the cost function we are seeking to minimize. Figure 3.2 illustrates the optimization process of minimizing a cost function.

Figure 3.2. The bowl-shaped mesh represents the loss J in the parameter space θ1 and θ2. The black dotted line illustrates the minimization of the loss in the parameter space through optimization.

(Source: “Adversarial Machine Learning” by Ian Goodfellow, ICLR Keynote, 2019, www.iangoodfellow.com/slides/2019-05-07.pdf.)

Because the Generator and Discriminator can tune only their own parameters and not each other’s, GAN training can be better described as a game, rather than optimization.[3] The players in this game are the two networks that the GAN comprises.

[3] Ibid.

Recall from chapter 1 that GAN training ends when the two networks reach Nash equilibrium, a point in a game at which neither player can improve their situation by changing their strategy. Mathematically, this occurs when the Generator cost J(G)(θ(G), θ(D)) is minimized with respect to the Generator’s trainable parameters θ(G) and, simultaneously, the Discriminator cost J(D)(θ(G), θ(D)) is minimized with respect to the parameters under this network’s control, θ(D).[4] Figure 3.3 illustrates the setup of a two-player zero-sum game and the process of reaching Nash equilibrium.

[4] Ibid.

Figure 3.3. Player 1 (left) seeks to minimize V by tuning θ1. Player 2 (middle) seeks to minimize –V (maximize V) by tuning θ2. The saddle-shaped mesh (right) shows the combined loss in the parameter space V(θ1, θ2). The dotted line shows the convergence to Nash equilibrium at the center of the saddle. (Source: Goodfellow, 2019, www.iangoodfellow.com/slides/2019-05-07.pdf.)
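In figure 3.3’s notation, the two costs fold into a single value function, V = –J(D), and the original GAN paper expresses the game as a minimax problem (again, background only; the tutorial does not require it):

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]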

Coming back to our analogy, Nash equilibrium would occur when every route home takes exactly the same amount of time—for us and all other drivers we may encounter on the way. Any faster route would be offset by a proportional increase in traffic, slowing everyone down just the right amount. As you may imagine, this state is virtually unattainable in real life. Even with tools like Google Maps that provide real-time traffic updates, it is often impossible to perfectly evaluate the optimal path home.

The same is true in the high-dimensional, nonconvex world of GAN training. Even small 28 × 28-pixel grayscale images like the ones in the MNIST dataset have 28 × 28 = 784 dimensions. If they were colored (RGB), their dimensionality would increase threefold, to 2,352. Capturing the data distribution across all images in the training dataset is extremely difficult, especially when the only feedback available comes from an adversary (the Discriminator).

Training GANs successfully requires trial and error, and although there are best practices, it remains as much an art as it is a science. Chapter 5 revisits the topic of GAN convergence in more detail. For now, you can rest assured that the situation is not as bad as it may sound. As we previewed in chapter 1, and as you will see throughout this book, neither the enormous complexities in approximating the generative distribution nor our lack of complete understanding of what conditions make GANs converge has impeded GANs’ practical usability and their ability to generate realistic data samples.

3.2. The Generator and the Discriminator

Let’s recap what you’ve learned by introducing more notation. The Generator (G) takes in a random noise vector z and produces a fake example x*. Mathematically, G(z) = x*. The Discriminator (D) is presented either with a real example x or with a fake example x*; for each input, it outputs a value between 0 and 1 indicating the probability that the input is real. Figure 3.4 depicts the GAN architecture by using the terminology and notation we just presented.

Figure 3.4. The Generator network G transforms the random vector z into a fake example x*: G(z) = x*. The Discriminator network D outputs a classification of whether the input example is real. For the real examples x, the Discriminator strives to output values as close to 1 as possible. For the fake examples x*, the Discriminator strives to output values as close to 0 as possible. In contrast, the Generator wants D(x*) to be as close as possible to 1, indicating that the Discriminator was fooled into classifying a fake example as real.

3.2.1. Conflicting objectives

The Discriminator’s goal is to be as accurate as possible. For the real examples x, D(x) seeks to be as close as possible to 1 (label for the positive class). For fake examples x*, D(x*) strives to be as close as possible to 0 (label for the negative class).

The Generator’s goal is the opposite. It seeks to fool the Discriminator by producing fake examples x* that are indistinguishable from the real data in the training dataset. Mathematically, the Generator strives to produce fake examples x* such that D(x*) is as close to 1 as possible.

3.2.2. Confusion matrix

The Discriminator’s classifications can be expressed in terms of a confusion matrix, a tabular representation of all the possible outcomes in binary classification. In the case of the Discriminator, these are as follows:

  • True positive—Real example correctly classified as real; D(x) ≈ 1
  • False negative—Real example incorrectly classified as fake; D(x) ≈ 0
  • True negative—Fake example correctly classified as fake; D(x*) ≈ 0
  • False positive—Fake example incorrectly classified as real; D(x*) ≈ 1

Table 3.1 presents these outcomes.

Table 3.1. Confusion matrix of Discriminator outcomes

                        Discriminator output
Input           Close to 1 (real)       Close to 0 (fake)
Real (x)        True positive           False negative
Fake (x*)       False positive          True negative

Using the confusion matrix terminology, the Discriminator is trying to maximize true positive and true negative classifications or, equivalently, minimize false positive and false negative classifications. In contrast, the Generator’s goal is to maximize the Discriminator’s false positive classifications—these are the instances in which the Generator successfully fools the Discriminator into believing a fake example is real. The Generator is not concerned with how well the Discriminator classifies the real examples; it cares only about the Discriminator’s classifications of the fake data samples.
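To make the bookkeeping concrete, here is a minimal sketch (our own illustration, not one of the book’s listings) that tallies the four outcomes from the Discriminator’s raw outputs, using a 0.5 decision threshold; the function name confusion_counts is hypothetical:

import numpy as np

def confusion_counts(d_real, d_fake, threshold=0.5):
    """d_real holds D(x) for real examples; d_fake holds D(x*) for fakes."""
    tp = np.sum(d_real >= threshold)    # real correctly classified as real
    fn = np.sum(d_real < threshold)     # real incorrectly classified as fake
    fp = np.sum(d_fake >= threshold)    # fake incorrectly classified as real
    tn = np.sum(d_fake < threshold)     # fake correctly classified as fake
    return tp, fn, fp, tn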

3.3. GAN training algorithm

Let’s revisit the GAN training algorithm from chapter 1 and formalize it by using the notation introduced in this chapter. Unlike the algorithm in chapter 1, this one uses mini-batches rather than one example at a time.

GAN training algorithm

For each training iteration do

  1. Train the Discriminator:

    1. Take a random mini-batch of real examples: x.
    2. Take a mini-batch of random noise vectors z and generate a mini-batch of fake examples: G(z) = x*.
    3. Compute the classification losses for D(x) and D(x*), and backpropagate the total error to update θ(D) to minimize the classification loss.
  2. Train the Generator:

    1. Take a mini-batch of random noise vectors z and generate a mini-batch of fake examples: G(z) = x*.
    2. Compute the classification loss for D(x*), and backpropagate the loss to update θ(G) to maximize the classification loss.

End for

Notice that in step 1, the Generator’s parameters are kept intact while we train the Discriminator. Similarly, in step 2, we keep the Discriminator’s parameters fixed while the Generator is trained. We allow updates only to the weights and biases of the network being trained so that all changes are isolated to the parameters under that network’s control. Each network thereby receives a relevant signal about which updates to make, without interference from the other’s updates. You can almost think of it as two players taking turns.

Of course, you can imagine a scenario in which each player merely undoes the other’s progress, so not even a turn-based game is guaranteed to yield a useful outcome. (Have we said yet that GANs are notoriously tricky to train?) More on this in chapter 5, where we also discuss techniques to maximize our chances of success.

That’s it for theory, for the time being. Let’s now put what we learned into practice and implement our first GAN.

3.4. Tutorial: Generating handwritten digits

In this tutorial, we will implement a GAN that learns to produce realistic-looking handwritten digits. We will use the Python neural network library Keras with a TensorFlow backend. Figure 3.5 shows a high-level architecture of the GAN we will implement.

Figure 3.5. Over the course of the training iterations, the Generator learns to turn random noise input into images that look like members of the training data: the MNIST dataset of handwritten digits. Simultaneously, the Discriminator learns to distinguish the fake images produced by the Generator from the genuine ones coming from the training dataset.

Much of the code used in this tutorial—especially the boilerplate code used in the training loop—was adapted from the open source GitHub repository of GAN implementations in Keras, Keras-GAN, created by Erik Linder-Norén (https://github.com/eriklindernoren/Keras-GAN). The repository also includes several advanced GAN variants, some of which will be covered later in this book. We revised and simplified the implementation considerably, in terms of both code and network architecture, and we renamed variables so that they are consistent with the notation used in this book.

A Jupyter notebook with the full implementation, including added visualizations of the training progress, is available on the book’s website at www.manning.com/books/gans-in-action and in the GitHub repository for this book at https://github.com/GANs-in-Action/gans-in-action under the chapter-3 folder. The code was tested with Python 3.6.0, Keras 2.1.6, and TensorFlow 1.8.0.

3.4.1. Importing modules and specifying model input dimensions

First, we import all the packages and libraries needed to run the model. Notice we also import the MNIST dataset of handwritten digits directly from keras.datasets.

Listing 3.1. Import statements
%matplotlib inline

import matplotlib.pyplot as plt
import numpy as np

from keras.datasets import mnist
from keras.layers import Dense, Flatten, Reshape
from keras.layers.advanced_activations import LeakyReLU
from keras.models import Sequential
from keras.optimizers import Adam

Second, we specify the input dimensions of our model and dataset. Each image in MNIST is 28 × 28 pixels with a single channel (because the images are grayscale). The variable z_dim sets the size of the noise vector, z.

Listing 3.2. Model input dimensions
img_rows = 28
img_cols = 28
channels = 1

img_shape = (img_rows, img_cols, channels)    1

z_dim = 100                                   2

  • 1 Input image dimensions
  • 2 Size of the noise vector, used as input to the Generator

Next, we implement the Generator and the Discriminator networks.

3.4.2. Implementing the Generator

For simplicity, the Generator is a neural network with only a single hidden layer. It takes in z as input and produces a 28 × 28 × 1 image. In the hidden layer, we use the Leaky ReLU activation function. Unlike a regular ReLU function, which maps any negative input to 0, Leaky ReLU allows a small, nonzero gradient for negative inputs. This prevents gradients from dying out during training, which tends to yield better training outcomes.
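Concretely, with slope coefficient α (set via alpha=0.01 in listing 3.3), Leaky ReLU is defined as

f(x) = \begin{cases} x & \text{if } x \ge 0 \\ \alpha x & \text{if } x < 0 \end{cases}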

At the output layer, we employ the tanh activation function, which scales the output values to the range [–1, 1]. The reason for using tanh (as opposed to, say, sigmoid, which would output values in the more typical 0 to 1 range) is that tanh tends to produce crisper images.

The following listing implements the Generator.

Listing 3.3. Generator
def build_generator(img_shape, z_dim):
    model = Sequential()

    model.add(Dense(128, input_dim=z_dim))              1

    model.add(LeakyReLU(alpha=0.01))                    2

    model.add(Dense(28 * 28 * 1, activation='tanh'))    3

    model.add(Reshape(img_shape))                       4

    return model

  • 1 Fully connected layer
  • 2 Leaky ReLU activation
  • 3 Output layer with tanh activation
  • 4 Reshapes the Generator output to image dimensions
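As a quick sanity check (our own addition, not one of the book’s listings), you can confirm that the Generator maps a noise vector to an image of the expected shape:

g = build_generator(img_shape, z_dim)                        # throwaway instance
print(g.predict(np.random.normal(0, 1, (1, z_dim))).shape)   # expect (1, 28, 28, 1)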

3.4.3. Implementing the Discriminator

The Discriminator takes in a 28 × 28 × 1 image and outputs a probability indicating whether the input is deemed real rather than fake. The Discriminator is represented by a two-layer neural network, with 128 hidden units and a Leaky ReLU activation function at the hidden layer.

For simplicity, our Discriminator network looks almost identical to the Generator. This does not have to be the case; indeed, in most GAN implementations, the Generator and Discriminator network architectures vary greatly in both size and complexity.

Notice that unlike for the Generator, in the following listing we apply the sigmoid activation function at the Discriminator’s output layer. This ensures that the output value will be between 0 and 1, so it can be interpreted as the probability the Discriminator assigns to the input being real.

Listing 3.4. Discriminator
def build_discriminator(img_shape):

    model = Sequential()

    model.add(Flatten(input_shape=img_shape))      1

    model.add(Dense(128))                          2

    model.add(LeakyReLU(alpha=0.01))               3

    model.add(Dense(1, activation='sigmoid'))      4

    return model

  • 1 Flattens the input image
  • 2 Fully connected layer
  • 3 Leaky ReLU activation
  • 4 Output layer with sigmoid activation

3.4.4. Building the model

In listing 3.5, we build and compile the Generator and Discriminator models implemented previously. Notice that in the combined model used to train the Generator, we keep the Discriminator parameters fixed by setting discriminator.trainable to False. Also note that the combined model, in which the Discriminator’s weights are frozen, is used to train the Generator only; the Discriminator is trained as an independently compiled model. (This will become apparent when we review the training loop. It works because in Keras, the trainable attribute takes effect only when a model is compiled: the independently compiled Discriminator keeps learning, while the frozen copy inside the combined model does not.)

We use binary cross-entropy as the loss function we are seeking to minimize during training. Binary cross-entropy measures the difference between the predicted probabilities and the true labels in a problem with only two possible classes. The greater the cross-entropy loss, the further our predictions are from the true labels.
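For reference, for a true label y ∈ {0, 1} and a predicted probability ŷ, binary cross-entropy is

L(y, \hat{y}) = -\big[\, y \log \hat{y} + (1 - y) \log(1 - \hat{y}) \,\big]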

To optimize each network, we use the Adam optimization algorithm. This algorithm, whose name is derived from adaptive moment estimation, is an advanced gradient-descent-based optimizer. The inner workings of this algorithm are beyond the scope of this book, but it suffices to say that Adam has become the go-to optimizer for most GAN implementations thanks to its often superior performance.

Listing 3.5. Building and compiling the GAN
def build_gan(generator, discriminator):

    model = Sequential()

    model.add(generator)                                    1
    model.add(discriminator)

    return model


discriminator = build_discriminator(img_shape)              2
discriminator.compile(loss='binary_crossentropy',
                      optimizer=Adam(),
                      metrics=['accuracy'])

generator = build_generator(img_shape, z_dim)               3

discriminator.trainable = False                             4

gan = build_gan(generator, discriminator)                   5
gan.compile(loss='binary_crossentropy', optimizer=Adam())

  • 1 Combined Generator + Discriminator model
  • 2 Builds and compiles the Discriminator
  • 3 Builds the Generator
  • 4 Keeps Discriminator’s parameters constant for Generator training
  • 5 Builds and compiles GAN model with fixed Discriminator to train the Generator
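Before training, it can be worth pushing a single noise vector through the stacked model as a sanity check. This snippet is our own addition, not one of the book’s listings:

z = np.random.normal(0, 1, (1, z_dim))    # a single random noise vector
x_star = generator.predict(z)             # fake image; shape (1, 28, 28, 1)
p_real = gan.predict(z)                   # Discriminator's probability that x* is real
print(x_star.shape, p_real.shape)         # expect (1, 28, 28, 1) and (1, 1)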

3.4.5. Training

The training code in listing 3.6 implements the GAN training algorithm. We get a random mini-batch of MNIST images as real examples and generate a mini-batch of fake images from random noise vectors z. We then use those to train the Discriminator network while keeping the Generator’s parameters constant. Next, we generate a mini-batch of fake images and use those to train the Generator network while keeping the Discriminator’s parameters fixed. We repeat this for each iteration.

We use binary labels: 1 for real images and 0 for fake ones. To generate z, we sample from the standard normal distribution (a bell curve with a mean of 0 and a standard deviation of 1). The Discriminator is trained to assign fake labels to the fake images and real labels to the real images. The Generator is trained such that the Discriminator assigns real labels to the fake examples it produces.

Notice that we are rescaling the real images in the training dataset to the range –1 to 1: dividing the [0, 255] grayscale pixel values by 127.5 and subtracting 1 maps 0 to –1 and 255 to 1. As you saw in listing 3.3, the Generator uses the tanh activation function at the output layer, so the fake images will fall in the range (–1, 1). Accordingly, we have to rescale all the Discriminator’s inputs to the same range.

Listing 3.6. GAN training loop
losses = []
accuracies = []
iteration_checkpoints = []

def train(iterations, batch_size, sample_interval):

    (X_train, _), (_, _) = mnist.load_data()                        1

    X_train = X_train / 127.5 - 1.0                                 2
    X_train = np.expand_dims(X_train, axis=3)

    real = np.ones((batch_size, 1))                                 3

    fake = np.zeros((batch_size, 1))                                4

    for iteration in range(iterations):



        idx = np.random.randint(0, X_train.shape[0], batch_size)    5
        imgs = X_train[idx]

        z = np.random.normal(0, 1, (batch_size, z_dim))             6
        gen_imgs = generator.predict(z)

        d_loss_real = discriminator.train_on_batch(imgs, real)      7
        d_loss_fake = discriminator.train_on_batch(gen_imgs, fake)
        d_loss, accuracy = 0.5 * np.add(d_loss_real, d_loss_fake)



        z = np.random.normal(0, 1, (batch_size, z_dim))             8

        g_loss = gan.train_on_batch(z, real)                        9

        if (iteration + 1) % sample_interval == 0:

            losses.append((d_loss, g_loss))                         10
            accuracies.append(100.0 * accuracy)
            iteration_checkpoints.append(iteration + 1)

            print("%d [D loss: %f, acc.: %.2f%%] [G loss: %f]" %    11
                  (iteration + 1, d_loss, 100.0 * accuracy, g_loss))

            sample_images(generator)                                12

  • 1 Loads the MNIST dataset
  • 2 Rescales [0, 255] grayscale pixel values to [–1, 1]
  • 3 Labels for real images: all 1s
  • 4 Labels for fake images: all 0s
  • 5 Gets a random batch of real images
  • 6 Generates a batch of fake images
  • 7 Trains the Discriminator
  • 8 Samples a fresh batch of noise vectors for Generator training
  • 9 Trains the Generator
  • 10 Saves losses and accuracies so they can be plotted after training
  • 11 Outputs training progress
  • 12 Outputs a sample of generated images

3.4.6. Outputting sample images

In the Generator training code, you may notice an invocation of the sample_images() function. This function gets called every sample_interval iterations and outputs a 4 × 4 grid of images synthesized by the Generator at the given iteration. After we run our model, we will use these images to inspect interim and final outputs.

Listing 3.7. Displaying generated images
def sample_images(generator, image_grid_rows=4, image_grid_columns=4):

    z = np.random.normal(0, 1, (image_grid_rows * image_grid_columns, z_dim)) 1

    gen_imgs = generator.predict(z)                                           2

    gen_imgs = 0.5 * gen_imgs + 0.5                                           3

    fig, axs = plt.subplots(image_grid_rows,                                  4
                            image_grid_columns,
                            figsize=(4, 4),
                            sharey=True,
                            sharex=True)

    cnt = 0
    for i in range(image_grid_rows):
        for j in range(image_grid_columns):
            axs[i, j].imshow(gen_imgs[cnt, :, :, 0], cmap='gray')             5
            axs[i, j].axis('off')
            cnt += 1

  • 1 Samples random noise
  • 2 Generates images from the random noise
  • 3 Rescales image pixel values to [0, 1]
  • 4 Sets up the image grid
  • 5 Outputs a grid of images
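One practical note: the grid renders automatically thanks to the %matplotlib inline magic in listing 3.1, which assumes a Jupyter notebook. If you instead run the code as a standalone script, a simple workaround (our suggestion, not part of the original listing) is to finish sample_images() with an explicit display or save:

    plt.show()                          # displays the grid outside a notebook
    # fig.savefig('gan_samples.png')    # alternatively, writes the grid to disk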

3.4.7. Running the model

That brings us to the final step, shown in listing 3.8. We set the training hyperparameters—the number of iterations and the batch size—and train the model. There is no tried-and-true method to determine the right number of iterations or the right batch size; we determine them experimentally through trial and error as we observe the training progress.

That said, there are important practical constraints on these numbers: each mini-batch must be small enough to fit in the processing memory (typical batch sizes are powers of 2: 32, 64, 128, 256, and 512). The number of iterations also has a practical constraint: the more iterations we run, the longer the training process takes. With complex deep learning models like GANs, this can get out of hand quickly, even with significant computing power.

To determine the right number of iterations, we monitor the training loss and set the iteration number around the point when the loss plateaus, indicating that we are getting little to no incremental improvement from further training. (Because this is a generative model, overfitting is as much a concern as it is for supervised learning algorithms.)

Listing 3.8. Running the model
iterations = 20000                                1
batch_size = 128
sample_interval = 1000

train(iterations, batch_size, sample_interval)    2

  • 1 Sets hyperparameters
  • 2 Trains the GAN for the specified number of iterations
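Once training completes, the losses and iteration_checkpoints lists populated by train() can be used to inspect the loss curves. The following is a minimal sketch of such a plot; the notebook accompanying this chapter contains similar, more polished visualizations.

losses_arr = np.array(losses)                  # shape: (num_checkpoints, 2)

plt.figure(figsize=(10, 4))
plt.plot(iteration_checkpoints, losses_arr[:, 0], label='Discriminator loss')
plt.plot(iteration_checkpoints, losses_arr[:, 1], label='Generator loss')
plt.xlabel('Iteration')
plt.ylabel('Loss')
plt.legend()
plt.show()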

3.4.8. Inspecting the results

Figure 3.6 shows example images produced by the Generator over the course of training iterations, from earliest to latest.

Figure 3.6. Starting from what looks to be no more than random noise, the Generator gradually learns to emulate the features of the training dataset: in our case, images of handwritten digits.

As you can see, the Generator starts out by producing little more than random noise. Over the course of the training iterations, it gets better and better at emulating the features of the training data. Each time the Discriminator rejects a generated image as fake or accepts one as real, the Generator receives feedback it can use to improve a little. Figure 3.7 shows examples of images the Generator can synthesize after it is fully trained.

Figure 3.7. Although far from perfect, our simple two-layer Generator learned to produce realistic-looking numerals, such as 9 and 1.

For comparison, figure 3.8 shows a randomly selected sample of real images from the MNIST dataset.

Figure 3.8. Example of real handwritten digits from the MNIST dataset used to train our GAN. Although the Generator made impressive progress toward emulating the training data, the difference between the numerals it produces and the real, human-written numerals remains clear.

3.5. Conclusion

Although the images our GAN generated are far from perfect, many of them are easily recognizable as real numerals—an impressive achievement, given that we used only a simple two-layer network architecture for both the Generator and the Discriminator. In the following chapter, you will learn how to improve the quality of the generated images by using a more complex and powerful neural network architecture for the Generator and Discriminator: convolutional neural networks.

Summary

  • GANs consist of two networks: the Generator (G) and the Discriminator (D), each with its own loss function: J(G)(θ(G), θ(D)) and J(D)(θ(G), θ(D)), respectively.
  • During training, the Generator and the Discriminator can tune only their own parameters: θ(G) and θ(D), respectively.
  • The two GAN networks are trained simultaneously via a game-like dynamic. The Generator seeks to maximize the Discriminator’s false-positive classifications (classifying a generated image as real), while the Discriminator seeks to minimize its false-positive and false-negative classifications.