Chapter 4. Deep Convolutional GAN

This chapter covers

Understanding key concepts behind convolutional neural networks
Using batch normalization
Implementing Deep Convolutional GAN, an advanced GAN architecture

In the previous chapter, we implemented a GAN whose Generator and Discriminator were simple feed-forward neural networks with a single hidden layer. Despite this simplicity, many of the images of handwritten digits that the GAN’s Generator produced after being fully trained were remarkably convincing. Even the ones that were not recognizable as human-written numerals had many of the hallmarks of handwritten symbols, such as discernible line edges and shapes—especially when compared to the random noise used as the Generator’s raw input.

Imagine what we could accomplish with a more powerful network architecture. In this chapter, we will do just that: instead of simple two-layer feed-forward networks, both our Generator and Discriminator will be implemented as convolutional neural networks (CNNs, or ConvNets). The resulting GAN architecture is known as Deep Convolutional GAN, or DCGAN for short.

Before delving into the nitty-gritty of the DCGAN implementation, we will review the key concepts underlying ConvNets, trace the history behind the development of the DCGAN, and cover one of the key breakthroughs that made complex architectures like DCGAN possible in practice: batch normalization.

4.1. Convolutional neural networks

We expect that you’ve already been exposed to convolutional networks; that said, if this technique is new to you, don’t worry. In this section, we review all the key concepts you need for this chapter and the rest of this book.

4.1.1. Convolutional filters

Unlike a regular feed-forward neural network whose neurons are arranged in flat, fully connected layers, layers in a ConvNet are arranged in three dimensions (width × height × depth). Convolutions are performed by sliding one or more filters over the input layer. Each filter has a relatively small receptive field (width × height) but always extends through the entire depth of the input volume.

At every step as it slides across the input, each filter outputs a single activation value: the dot product between the input values and the filter entries. This process results in a two-dimensional activation map for each filter. The activation maps produced by each filter are then stacked on top of one another to produce a three-dimensional output layer; the output depth is equal to the number of filters used.
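
As a minimal sketch of the sliding dot product described above, the following pure-Python function convolves one two-dimensional filter over a two-dimensional input without padding. The function name convolve2d, the sample image, and the filter values are illustrative assumptions, not part of Keras or of this chapter's tutorial code.

```python
def convolve2d(inputs, kernel, stride=1):
    """Slide `kernel` over `inputs` (no padding) and return the activation map."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = (len(inputs) - k_h) // stride + 1
    out_w = (len(inputs[0]) - k_w) // stride + 1

    activation_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            top, left = i * stride, j * stride
            # One activation value: dot product of the filter and the patch under it
            value = sum(inputs[top + a][left + b] * kernel[a][b]
                        for a in range(k_h) for b in range(k_w))
            row.append(value)
        activation_map.append(row)
    return activation_map


# Hypothetical 5 x 5 input whose values increase from left to right and top to bottom
image = [[r * 5 + c for c in range(5)] for r in range(5)]

# Hypothetical 3 x 3 filter that responds to horizontal changes in intensity
vertical_edge = [[1, 0, -1],
                 [1, 0, -1],
                 [1, 0, -1]]

feature_map = convolve2d(image, vertical_edge, stride=2)
print(feature_map)
```

Because the filter takes four positions on a 5 × 5 input with a stride of 2, the result is a 2 × 2 activation map with one value per position.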

4.1.2. Parameter sharing

Importantly, a filter’s parameters are shared across every position of the input that the filter visits. This has both intuitive and practical advantages. Intuitively, parameter sharing allows us to efficiently learn visual features and shapes (such as lines and edges) regardless of where they are located in the input image. From a practical perspective, parameter sharing drastically reduces the number of trainable parameters. This decreases the risk of overfitting and allows this technique to scale up to higher-resolution images without the corresponding explosion in trainable parameters that a traditional, fully connected network would incur.
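
To make the practical advantage concrete, here is a rough back-of-the-envelope comparison. The helper names and the layer sizes are assumptions chosen for illustration: a layer of 32 filters of size 3 × 3 is compared with a fully connected layer that maps the same input to an output of the same size.

```python
def conv_layer_params(kernel_h, kernel_w, in_depth, num_filters):
    """Each filter has kernel_h * kernel_w * in_depth weights plus one bias."""
    return (kernel_h * kernel_w * in_depth + 1) * num_filters


def dense_layer_params(num_inputs, num_outputs):
    """Each output unit has one weight per input plus one bias."""
    return (num_inputs + 1) * num_outputs


# 28 x 28 x 1 input mapped to a 14 x 14 x 32 output
conv_small = conv_layer_params(3, 3, 1, 32)
dense_small = dense_layer_params(28 * 28 * 1, 14 * 14 * 32)

# Doubling the resolution: 56 x 56 x 1 input mapped to a 28 x 28 x 32 output
conv_large = conv_layer_params(3, 3, 1, 32)
dense_large = dense_layer_params(56 * 56 * 1, 28 * 28 * 32)

print("conv: ", conv_small, "->", conv_large)
print("dense:", dense_small, "->", dense_large)
```

The convolutional layer's parameter count does not depend on the image resolution at all, whereas the fully connected layer grows roughly sixteenfold when the width and height are doubled.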

4.1.3. ConvNets visualized

If all this sounds confusing, let’s make these concepts a little less abstract by visualizing them. Diagrams make everything easier to understand for most people (us included!). Figure 4.1 shows a single convolution operation; figure 4.2 illustrates the convolution operation in the context of the input and output layers in a ConvNet.

Figure 4.1. A 3 × 3 convolutional filter as it slides over a 5 × 5 input, left to right, top to bottom. The filter uses a stride of 2, moving two positions at each step; accordingly, it visits a total of four positions, resulting in a 2 × 2 activation map. Notice how at each step, the entire filter produces a single activation value.

(Source: “A Guide to Convolution Arithmetic for Deep Learning,” by Vincent Dumoulin and Francesco Visin, 2016, https://arxiv.org/abs/1603.07285.)

Figure 4.1 depicts the convolution operation for a single filter over a two-dimensional input. In practice, the input volume is usually three-dimensional, and we use several stacked filters. The underlying mechanics, however, remain the same: each filter produces a single value per step, regardless of the depth of the input volume. The number of filters we use determines the depth of the output volume, as their resulting activation maps are stacked on top of one another. All this is illustrated in figure 4.2.
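
The same mechanics can be sketched for a three-dimensional input and several filters. This is an illustrative pure-Python version, not how Keras implements convolution; the function name convolve_volume and the toy values are assumptions.

```python
def convolve_volume(volume, filters, stride=1):
    """volume is indexed [height][width][depth]; each filter is [k_h][k_w][depth].

    Returns an output volume indexed [height][width][filter index].
    """
    k_h, k_w = len(filters[0]), len(filters[0][0])
    depth = len(volume[0][0])
    out_h = (len(volume) - k_h) // stride + 1
    out_w = (len(volume[0]) - k_w) // stride + 1

    output = [[[0.0] * len(filters) for _ in range(out_w)] for _ in range(out_h)]
    for f, kernel in enumerate(filters):
        for i in range(out_h):
            for j in range(out_w):
                # Each filter spans the full input depth but yields a single value
                output[i][j][f] = sum(
                    volume[i * stride + a][j * stride + b][d] * kernel[a][b][d]
                    for a in range(k_h) for b in range(k_w) for d in range(depth))
    return output


# Hypothetical 5 x 5 x 3 input in which every position holds the values (1, 2, 3)
volume = [[[1.0, 2.0, 3.0] for _ in range(5)] for _ in range(5)]

# Four hypothetical 3 x 3 x 3 filters
ones = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]
zeros = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]

output = convolve_volume(volume, [ones, zeros, ones, zeros], stride=2)
shape = (len(output), len(output[0]), len(output[0][0]))
print(shape)
```

The output depth equals the number of filters (four here), regardless of the input depth of three.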

Figure 4.2. An activation value for a single convolutional step within the context of the activation map (feature map) and the input and output volumes. Notice that the ConvNet filter extends through the full depth of the input volume and that the depth of the output volume is determined by stacking together activation maps.

(Source: “Convolutional Neural Network,” by Nameer Hirschkind et al., Brilliant.org, retrieved November 1, 2018, http://mng.bz/8zJK.)

Note

If you would like to dive deeper into convolutional networks and the underlying concepts, we recommend reading the relevant chapters in François Chollet’s Deep Learning with Python (Manning, 2017), which provides an outstanding, hands-on introduction to all the key concepts and techniques in deep learning, including ConvNets. For those with a more academic bent, a great resource is Andrej Karpathy’s excellent lecture notes from his Stanford University class on Convolutional Neural Networks for Visual Recognition (http://cs231n.github.io/convolutional-networks/).

4.2. Brief history of the DCGAN

Introduced in 2016 by Alec Radford, Luke Metz, and Soumith Chintala, DCGAN marked one of the most important early innovations in GANs since the technique’s inception two years earlier.^[1] This was not the first time a group of researchers tried harnessing ConvNets for use in GANs, but it was the first time they succeeded at incorporating ConvNets directly into a full-scale GAN model.

¹ See “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks,” by Alec Radford et al., 2015, https://arxiv.org/abs/1511.06434.

The use of ConvNets exacerbates many of the difficulties plaguing GAN training, including instability and gradient saturation. Indeed, these challenges proved so daunting that some researchers resorted to alternative approaches, such as the LAPGAN, which uses a cascade of convolutional networks within a Laplacian pyramid, with a separate ConvNet being trained at each level using the GAN framework.^[2] If none of this makes sense to you, don’t worry. Superseded by superior methods, LAPGAN has been largely relegated to the dustbin of history, so it is not important to understand its internals.

² See “Deep Generative Image Models Using a Laplacian Pyramid of Adversarial Networks,” by Emily Denton et al., 2015, https://arxiv.org/abs/1506.05751.

Although inelegant, complex, and computationally taxing, LAPGAN yielded the highest-quality images produced up to the time of its publication, a fourfold improvement over the original GAN (40% versus 10% of generated images mistaken for real by human evaluators). As such, LAPGAN demonstrated the enormous potential of marrying GANs with ConvNets.

With DCGAN, Radford and his collaborators introduced techniques and optimizations that allowed ConvNets to scale up to the full GAN framework without the need to modify the underlying GAN architecture and without reducing GAN to a subroutine of a more complex model framework, like LAPGAN. One of the key techniques Radford et al. used is batch normalization, which helps stabilize the training process by normalizing inputs at each layer where it is applied. Let’s take a closer look at what batch normalization is and how it works.

4.3. Batch normalization

Batch normalization was introduced by Google scientists Sergey Ioffe and Christian Szegedy in 2015.^[3] Their insight was as simple as it was groundbreaking. Just as we normalize network inputs, they proposed to normalize the inputs to each layer, for each training mini-batch as it flows through the network.

³ See “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift,” by Sergey Ioffe and Christian Szegedy, 2015, https://arxiv.org/abs/1502.03167.

4.3.1. Understanding normalization

It helps to remind ourselves what normalization is and why we bother normalizing the input feature values in the first place. Normalization is the scaling of data so that it has zero mean and unit variance. This is accomplished by taking each data point x, subtracting the mean μ, and dividing the result by the standard deviation, σ, as shown in equation 4.1:

equation 4.1.

$$\hat{x} = \frac{x - \mu}{\sigma}$$

Normalization has several advantages. Perhaps most important, it makes comparisons between features with vastly different scales easier and, by extension, makes the training process less sensitive to the scale of the features. Consider the following (rather contrived) example. Imagine we are trying to predict the monthly expenditures of a family based on two features: the family’s annual income and the family size. We would expect that, in general, the more a family earns, the more they spend; and the bigger a family is, the more they spend.

However, the scales of these features are vastly different—an extra $10 in annual income probably wouldn’t influence how much a family spends, but an additional 10 members would likely wreak havoc on any family’s budget. Normalization solves this problem by scaling each feature value onto a standardized scale, such that each data point is expressed not as its face value but as a relative “score” indicating how many standard deviations the given data point is from the mean.
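
The income and family-size example can be sketched in a few lines of Python using only the standard library. The five data points and the helper name standardize are made up for illustration.

```python
from statistics import mean, pstdev


def standardize(values):
    """Express each value as the number of standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    return [(v - mu) / sigma for v in values]


# Hypothetical data for five families
annual_income = [40_000, 55_000, 70_000, 85_000, 100_000]
family_size = [1, 2, 3, 4, 5]

income_scores = standardize(annual_income)
size_scores = standardize(family_size)

print([round(s, 3) for s in income_scores])
print([round(s, 3) for s in size_scores])
```

Although the raw features differ by several orders of magnitude, both end up on the same standardized scale, so a one-unit change means the same thing for each: one standard deviation.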

The insight behind batch normalization is that normalizing inputs alone may not go far enough when dealing with deep neural networks with many layers. As the input values flow through the network, from one layer to the next, they are scaled by the trainable parameters in each of those layers. And as the parameters get tuned by backpropagation, the distribution of each layer’s inputs is prone to change in subsequent training iterations, which destabilizes the learning process. In academia, this problem is known as internal covariate shift, often shortened to covariate shift. Batch normalization addresses it by normalizing the values in each mini-batch using the mean and variance of that mini-batch.

4.3.2. Computing batch normalization

The way batch normalization is computed differs in several respects from the simple normalization equation we presented earlier. This section walks through it step by step.

Let μ_B be the mean of the mini-batch B, and σ_B² be the variance (mean squared deviation) of the mini-batch B. The normalized value is computed as shown in equation 4.2:

equation 4.2.

$$\hat{x} = \frac{x - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}$$

The term ϵ (epsilon) is added for numerical stability, primarily to avoid division by zero. It is set to a small positive constant value, such as 0.001.

In batch normalization, we do not use these normalized values directly. Instead, we multiply them by γ (gamma) and add β (beta) before passing them as inputs to the next layer; see equation 4.3.

equation 4.3.

$$y = \gamma \hat{x} + \beta$$

Importantly, the terms γ and β are trainable parameters, which—just like weights and biases—are tuned during network training. The reason for this is that it may be beneficial for the intermediate input values to be standardized around a mean other than 0 and have a variance other than 1. Because γ and β are trainable, the network can learn what values work best.
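
The computation in equations 4.2 and 4.3 can be sketched as follows for a single feature across one mini-batch. This is a simplified illustration of the training-time arithmetic only; the function name batch_norm and the sample values are assumptions, and in a real network γ and β would be learned rather than passed in by hand.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, epsilon=0.001):
    """Normalize one feature over a mini-batch, then scale by gamma and shift by beta."""
    mu_b = sum(batch) / len(batch)                             # mini-batch mean
    var_b = sum((x - mu_b) ** 2 for x in batch) / len(batch)   # mini-batch variance
    x_hat = [(x - mu_b) / math.sqrt(var_b + epsilon) for x in batch]
    return [gamma * x + beta for x in x_hat]


# Hypothetical activations of one neuron for a mini-batch of four examples
mini_batch = [2.0, 4.0, 6.0, 8.0]

normalized = batch_norm(mini_batch)                      # gamma = 1, beta = 0
shifted = batch_norm(mini_batch, gamma=2.0, beta=5.0)    # a different learned scale and shift

print([round(v, 3) for v in normalized])
print([round(v, 3) for v in shifted])
```

With γ = 1 and β = 0, the output is centered on zero with a variance just under 1 (slightly less because of ϵ); other values of γ and β move the output to whatever mean and spread they specify.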

Fortunately for us, we don’t have to implement any of this by hand. The Keras layer keras.layers.BatchNormalization handles all the mini-batch computations and updates behind the scenes for us.

Batch normalization limits the amount by which updating the parameters in the previous layers can affect the distribution of inputs received by the current layer. This decreases any unwanted interdependence between parameters across layers, which helps speed up the network training process and increase its robustness, especially when it comes to network parameter initialization.

Batch normalization has proven essential to the viability of many deep learning architectures, including the DCGAN, which you will see in action in the following tutorial.

4.4. Tutorial: Generating handwritten digits with DCGAN

In this tutorial, we will revisit the MNIST dataset of handwritten digits from chapter 3. This time, however, we will use the DCGAN architecture and represent both the Generator and the Discriminator as convolutional networks, as shown in figure 4.3. Besides this change, the rest of the network architecture remains unchanged. At the end of the tutorial, we will compare the quality of the handwritten numerals produced by the two GANs (traditional versus DCGAN) so you can see the improvement made possible by the use of a more advanced network architecture.

Figure 4.3. The overall model architecture for this chapter’s tutorial is the same as the GAN we implemented in chapter 3. The only differences (not visible on this high-level diagram) are the internal representations of the Generator and Discriminator networks (the insides of the Generator and Discriminator boxes). These networks are covered in detail later in this tutorial.

As in chapter 3, much of the code in this tutorial was adapted from Erik Linder-Norén’s open source GitHub repository of GAN models in Keras (https://github.com/eriklindernoren/Keras-GAN), with numerous modifications and improvements spanning both the implementation details and network architectures. A Jupyter notebook with the full implementation, including added visualizations of the training progress, is available in the GitHub repository for this book at https://github.com/GANs-in-Action/gans-in-action, under the chapter-4 folder. The code was tested with Python 3.6.0, Keras 2.1.6, and TensorFlow 1.8.0. To speed up the training time, it is recommended to run the model on a GPU.

4.4.1. Importing modules and specifying model input dimensions

First, we import all the packages, modules, and libraries we need to train and run the model. Just as in chapter 3, the MNIST dataset of handwritten digits is imported directly from keras.datasets.

Listing 4.1. Import statements

%matplotlib inline

import matplotlib.pyplot as plt
import numpy as np

from keras.datasets import mnist
from keras.layers import (
    Activation, BatchNormalization, Dense, Dropout, Flatten, Reshape)
from keras.layers.advanced_activations import LeakyReLU
from keras.layers.convolutional import Conv2D, Conv2DTranspose
from keras.models import Sequential
from keras.optimizers import Adam

We also specify the model input dimensions: the image shape and the length of the noise vector z.

Listing 4.2. Model input dimensions

img_rows = 28
img_cols = 28
channels = 1

# Input image dimensions
img_shape = (img_rows, img_cols, channels)

# Size of the noise vector, used as input to the Generator
z_dim = 100

4.4.2. Implementing the Generator

ConvNets have traditionally been used for image classification tasks, in which the network takes in an image with the dimensions height × width × number of color channels as input and—through a series of convolutional layers—outputs a single vector of class scores, with the dimensions 1 × n, where n is the number of class labels. To generate an image by using the ConvNet architecture, we reverse the process: instead of taking an image and processing it into a vector, we take a vector and up-size it to an image.

Key to this process is the transposed convolution. Recall that regular convolution is typically used to reduce input width and height while increasing its depth. Transposed convolution goes in the reverse direction: it is used to increase the width and height while reducing depth, as you can see in the Generator network diagram in figure 4.4.

Figure 4.4. The Generator takes in a random noise vector as input and produces a 28 × 28 × 1 image. It does so through multiple layers of transposed convolutions. Between the convolutional layers, we apply batch normalization to stabilize the training process. (Image is not to scale.)

The Generator starts with a noise vector z. Using a fully connected layer, we reshape the vector into a three-dimensional hidden layer with a small base (width × height) and large depth. Using transposed convolutions, the input is progressively reshaped such that its base grows while its depth decreases until we reach the final layer with the shape of the image we are seeking to synthesize, 28 × 28 × 1. After each transposed convolution layer except the last, we apply batch normalization and the Leaky ReLU activation function. At the final layer, we do not apply batch normalization and, instead of Leaky ReLU, we use the tanh activation function.

Putting all the steps together, we do the following:

Take a random noise vector and reshape it into a 7 × 7 × 256 tensor through a fully connected layer.
Use transposed convolution, transforming the 7 × 7 × 256 tensor into a 14 × 14 × 128 tensor.
Apply batch normalization and the Leaky ReLU activation function.
Use transposed convolution, transforming the 14 × 14 × 128 tensor into a 14 × 14 × 64 tensor. Notice that the width and height dimensions remain unchanged; this is accomplished by setting the strides parameter in Conv2DTranspose to 1.
Apply batch normalization and the Leaky ReLU activation function.
Use transposed convolution, transforming the 14 × 14 × 64 tensor into the output image size, 28 × 28 × 1.
Apply the tanh activation function.
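
The tensor shapes in the steps above can be checked with a short calculation. The sketch below assumes the rule Keras applies to a transposed convolution with padding='same' and no explicit output padding: width and height are multiplied by the stride, and the depth becomes the number of filters. The helper name transposed_conv_same is hypothetical.

```python
def transposed_conv_same(shape, filters, strides):
    """Output shape of a transposed convolution with 'same' padding (assumed rule)."""
    height, width, _ = shape
    return (height * strides, width * strides, filters)


# Shape after the fully connected layer and the reshape
shape = (7, 7, 256)
generator_shapes = [shape]

# (filters, strides) for the three transposed convolution layers
for filters, strides in [(128, 2), (64, 1), (1, 2)]:
    shape = transposed_conv_same(shape, filters, strides)
    generator_shapes.append(shape)

for s in generator_shapes:
    print(s)
```

The stride of 1 in the middle layer is what keeps the base at 14 × 14 while the depth drops from 128 to 64.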

The following listing shows what the Generator network looks like when implemented in Keras.

Listing 4.3. DCGAN Generator

def build_generator(z_dim):

    model = Sequential()

    # Reshapes input into 7 × 7 × 256 tensor via a fully connected layer
    model.add(Dense(256 * 7 * 7, input_dim=z_dim))
    model.add(Reshape((7, 7, 256)))

    # Transposed convolution layer, from 7 × 7 × 256 into 14 × 14 × 128 tensor
    model.add(Conv2DTranspose(128, kernel_size=3, strides=2, padding='same'))

    # Batch normalization
    model.add(BatchNormalization())

    # Leaky ReLU activation
    model.add(LeakyReLU(alpha=0.01))

    # Transposed convolution layer, from 14 × 14 × 128 to 14 × 14 × 64 tensor
    model.add(Conv2DTranspose(64, kernel_size=3, strides=1, padding='same'))

    # Batch normalization
    model.add(BatchNormalization())

    # Leaky ReLU activation
    model.add(LeakyReLU(alpha=0.01))

    # Transposed convolution layer, from 14 × 14 × 64 to 28 × 28 × 1 tensor
    model.add(Conv2DTranspose(1, kernel_size=3, strides=2, padding='same'))

    # Output layer with tanh activation
    model.add(Activation('tanh'))

    return model
4.4.3. Implementing the Discriminator

The Discriminator is a ConvNet of the familiar kind, one that takes in an image and outputs a prediction vector: in this case, a binary classification indicating whether the input image was deemed to be real rather than fake. Figure 4.5 depicts the Discriminator network we will implement.

Figure 4.5. The Discriminator takes in a 28 × 28 × 1 image as input, applies several convolutional layers, and—using the sigmoid activation function σ—outputs a probability that the input image is real rather than fake. Between the convolutional layers, we apply batch normalization to stabilize the training process. (Image is not to scale.)

The input to the Discriminator is a 28 × 28 × 1 image. By applying convolutions, the image is transformed such that its base (width × height) gets progressively smaller and its depth gets progressively deeper. On all convolutional layers, we apply the Leaky ReLU activation function. Batch normalization is used on all convolutional layers except the first. For output, we use a fully connected layer and the sigmoid activation function.

Putting all the steps together, we do the following:

Use a convolutional layer to transform a 28 × 28 × 1 input image into a 14 × 14 × 32 tensor.
Apply the Leaky ReLU activation function.
Use a convolutional layer, transforming the 14 × 14 × 32 tensor into a 7 × 7 × 64 tensor.
Apply batch normalization and the Leaky ReLU activation function.
Use a convolutional layer, transforming the 7 × 7 × 64 tensor into a 4 × 4 × 128 tensor. (With padding='same' and a stride of 2, the output size is rounded up, so 7 becomes 4.)
Apply batch normalization and the Leaky ReLU activation function.
Flatten the 4 × 4 × 128 tensor into a vector of size 4 × 4 × 128 = 2048.
Use a fully connected layer feeding into the sigmoid activation function to compute the probability of whether the input image is real.
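
A quick way to trace the Discriminator's tensor shapes is to apply the output-size rule for a strided convolution with padding='same', under which width and height become the input size divided by the stride, rounded up. The helper name conv_same is hypothetical; the layer settings mirror the steps above.

```python
import math


def conv_same(shape, filters, strides):
    """Output shape of a convolution with 'same' padding: ceil(size / stride)."""
    height, width, _ = shape
    return (math.ceil(height / strides), math.ceil(width / strides), filters)


shape = (28, 28, 1)
discriminator_shapes = [shape]

# Three convolutional layers, each with a stride of 2
for filters in (32, 64, 128):
    shape = conv_same(shape, filters, strides=2)
    discriminator_shapes.append(shape)

flattened_size = shape[0] * shape[1] * shape[2]

for s in discriminator_shapes:
    print(s)
print(flattened_size)
```

Note the rounding in the last layer: a 7 × 7 base with a stride of 2 yields 4 × 4 rather than 3 × 3, so the flattened vector holds 2048 values.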

The following listing is a Keras implementation of the Discriminator model.

Listing 4.4. DCGAN Discriminator

def build_discriminator(img_shape):

    model = Sequential()

    # Convolutional layer, from 28 × 28 × 1 into 14 × 14 × 32 tensor
    model.add(
        Conv2D(32,
               kernel_size=3,
               strides=2,
               input_shape=img_shape,
               padding='same'))

    # Leaky ReLU activation
    model.add(LeakyReLU(alpha=0.01))

    # Convolutional layer, from 14 × 14 × 32 into 7 × 7 × 64 tensor
    # (input_shape is needed only on the first layer of a Sequential model)
    model.add(
        Conv2D(64,
               kernel_size=3,
               strides=2,
               padding='same'))

    # Batch normalization
    model.add(BatchNormalization())

    # Leaky ReLU activation
    model.add(LeakyReLU(alpha=0.01))

    # Convolutional layer, from 7 × 7 × 64 tensor into 4 × 4 × 128 tensor
    model.add(
        Conv2D(128,
               kernel_size=3,
               strides=2,
               padding='same'))

    # Batch normalization
    model.add(BatchNormalization())

    # Leaky ReLU activation
    model.add(LeakyReLU(alpha=0.01))

    # Output layer with sigmoid activation
    model.add(Flatten())
    model.add(Dense(1, activation='sigmoid'))

    return model

4.4.4. Building and running the DCGAN

Aside from the network architectures used for the Generator and the Discriminator, the rest of the DCGAN network setup and implementation is the same as the one we used for the simple GAN in chapter 3. This underscores the versatility of the GAN architecture. Listing 4.5 builds the model, and listing 4.6 trains it.

Listing 4.5. Building and compiling the DCGAN

def build_gan(generator, discriminator):

    model = Sequential()

    # Combined Generator + Discriminator model
    model.add(generator)
    model.add(discriminator)

    return model

# Builds and compiles the Discriminator
discriminator = build_discriminator(img_shape)
discriminator.compile(loss='binary_crossentropy',
                      optimizer=Adam(),
                      metrics=['accuracy'])

# Builds the Generator
generator = build_generator(z_dim)

# Keeps Discriminator’s parameters constant for Generator training
discriminator.trainable = False

# Builds and compiles GAN model with fixed Discriminator to train the Generator
gan = build_gan(generator, discriminator)
gan.compile(loss='binary_crossentropy', optimizer=Adam())

Listing 4.6. DCGAN training loop

losses = []
accuracies = []
iteration_checkpoints = []


def train(iterations, batch_size, sample_interval):

    # Loads the MNIST dataset
    (X_train, _), (_, _) = mnist.load_data()

    # Rescales [0, 255] grayscale pixel values to [-1, 1]
    X_train = X_train / 127.5 - 1.0
    X_train = np.expand_dims(X_train, axis=3)

    # Labels for real images: all 1s
    real = np.ones((batch_size, 1))

    # Labels for fake images: all 0s
    fake = np.zeros((batch_size, 1))

    for iteration in range(iterations):

        # Gets a random batch of real images
        idx = np.random.randint(0, X_train.shape[0], batch_size)
        imgs = X_train[idx]

        # Generates a batch of fake images
        z = np.random.normal(0, 1, (batch_size, z_dim))
        gen_imgs = generator.predict(z)

        # Trains the Discriminator
        d_loss_real = discriminator.train_on_batch(imgs, real)
        d_loss_fake = discriminator.train_on_batch(gen_imgs, fake)
        d_loss, accuracy = 0.5 * np.add(d_loss_real, d_loss_fake)

        # Samples a fresh batch of noise vectors; the combined model
        # generates the fake images from them internally
        z = np.random.normal(0, 1, (batch_size, z_dim))

        # Trains the Generator
        g_loss = gan.train_on_batch(z, real)

        if (iteration + 1) % sample_interval == 0:

            # Saves losses and accuracies so they can be plotted after training
            losses.append((d_loss, g_loss))
            accuracies.append(100.0 * accuracy)
            iteration_checkpoints.append(iteration + 1)

            # Outputs training progress
            print("%d [D loss: %f, acc.: %.2f%%] [G loss: %f]" %
                  (iteration + 1, d_loss, 100.0 * accuracy, g_loss))

            # Outputs a sample generated image
            sample_images(generator)
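
One detail of the training loop that is easy to miss: because the Discriminator is compiled with metrics=['accuracy'], each call to train_on_batch returns a pair of numbers (loss and accuracy), and the loop averages the pairs for the real and fake batches element by element. The following sketch reproduces that averaging in plain Python; the helper name average_metrics and the numbers are made up for illustration.

```python
def average_metrics(real_metrics, fake_metrics):
    """Element-wise mean of two [loss, accuracy] pairs."""
    return [0.5 * (r + f) for r, f in zip(real_metrics, fake_metrics)]


# Hypothetical [loss, accuracy] results for one iteration
d_loss_real = [0.5, 0.75]   # on the batch of real images
d_loss_fake = [1.0, 0.25]   # on the batch of generated images

d_loss, accuracy = average_metrics(d_loss_real, d_loss_fake)
print(d_loss, accuracy)
```

An averaged accuracy near 0.5 means the Discriminator is doing no better than chance across the real and fake batches combined.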

For completeness, we are also including the sample_images() function in the following listing. Recall from chapter 3 that this function outputs a 4 × 4 grid of images synthesized by the Generator in a given training iteration.

Listing 4.7. Displaying generated images

def sample_images(generator, image_grid_rows=4, image_grid_columns=4):

    # Samples random noise
    z = np.random.normal(0, 1, (image_grid_rows * image_grid_columns, z_dim))

    # Generates images from random noise
    gen_imgs = generator.predict(z)

    # Rescales image pixel values to [0, 1]
    gen_imgs = 0.5 * gen_imgs + 0.5

    # Sets image grid
    fig, axs = plt.subplots(image_grid_rows,
                            image_grid_columns,
                            figsize=(4, 4),
                            sharey=True,
                            sharex=True)

    cnt = 0
    for i in range(image_grid_rows):
        for j in range(image_grid_columns):
            # Outputs a grid of images
            axs[i, j].imshow(gen_imgs[cnt, :, :, 0], cmap='gray')
            axs[i, j].axis('off')
            cnt += 1
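
The two rescaling steps in the tutorial are inverses of each other up to a constant factor: training images are mapped from [0, 255] to [-1, 1] to match the range of the Generator's tanh output, and generated images are mapped from [-1, 1] to [0, 1] for display. A small sketch with hypothetical helper names and sample pixel values:

```python
def to_tanh_range(pixel):
    """Map a grayscale value in [0, 255] to [-1, 1], as done for the training images."""
    return pixel / 127.5 - 1.0


def to_display_range(value):
    """Map a value in [-1, 1] to [0, 1], as done before plotting generated images."""
    return 0.5 * value + 0.5


pixels = [0, 51, 255]
scaled = [to_tanh_range(p) for p in pixels]
restored = [to_display_range(s) for s in scaled]

print(scaled)
print(restored)
```

Black (0) maps to -1 and white (255) maps to 1, so the real images and the tanh output of the Generator share the same value range when they reach the Discriminator.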

Next, the following code is used to run the model.

Listing 4.8. Running the model

# Sets hyperparameters
iterations = 20000
batch_size = 128
sample_interval = 1000

# Trains the DCGAN for the specified number of iterations
train(iterations, batch_size, sample_interval)
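
To put these hyperparameters in perspective, the following rough calculation estimates how much data the Discriminator sees. It assumes the standard MNIST training set of 60,000 images; because each batch is drawn at random with replacement, the epoch figure is only an approximation.

```python
iterations = 20000
batch_size = 128
sample_interval = 1000
training_set_size = 60000  # size of the standard MNIST training set

# Real images shown to the Discriminator over the whole run
# (it also sees the same number of generated images)
real_images_seen = iterations * batch_size

# Approximate number of passes over the training set
approx_epochs = real_images_seen / training_set_size

# Number of times progress is printed and a sample grid is drawn
num_checkpoints = iterations // sample_interval

print(real_images_seen, round(approx_epochs, 1), num_checkpoints)
```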

4.4.5. Model output

Figure 4.6 shows a sample of handwritten digits produced by the Generator after the DCGAN is fully trained. For a side-by-side comparison, figure 4.7 shows a sample of digits produced by the GAN from chapter 3, and figure 4.8 shows a sample of real handwritten numerals from the MNIST dataset.

Figure 4.6. A sample of handwritten digits generated by a fully trained DCGAN

Figure 4.7. A sample of handwritten digits generated by the GAN implemented in chapter 3

Figure 4.8. A randomly selected grid of real handwritten digits from the MNIST dataset used to train our DCGAN. Unlike the images produced by the simple GAN we implemented in chapter 3, many of the handwritten digits produced by the fully trained DCGAN are essentially indistinguishable from the training data.

As evidenced by the preceding figures, all the extra work we put into implementing DCGAN paid off handsomely. Many of the images of handwritten digits that the network produces after being fully trained are virtually indistinguishable from the ones written by a human hand.

4.5. Conclusion

DCGAN demonstrates the versatility of the GAN framework. In theory, the Discriminator and Generator can be represented by any differentiable function, even one as complex as a multilayer convolutional network. However, DCGAN also demonstrates that there are significant hurdles to making more complex implementations work in practice. Without breakthroughs such as batch normalization, DCGAN would fail to train properly.

In the following chapter, we will explore some of the theoretical and practical limitations that make GAN training so challenging as well as the approaches to overcome them.

Summary

Convolutional neural networks (ConvNets) use one or more convolutional filters that slide over the input volume. At each step as it slides over the input, a filter uses a single set of parameters to produce a single activation value. Together, all the activation values from all the filters produce the output layer.
Batch normalization is a method that reduces the covariate shift (variations in input value distributions between layers during training) in neural networks by normalizing the output of each layer before it is passed as input to the next layer.
Deep Convolutional GAN (DCGAN) is a Generative Adversarial Network with convolutional neural networks as its Generator and Discriminator. This architecture achieves superior performance in image-processing tasks, including handwritten digit generation, which we implemented in a code tutorial.
