Image generation with adversarial networks

Generative Adversarial Networks (GANs) are a relatively recent and very popular type of network. Their main attraction is the generative side: we can train a network to generate new samples of data that are similar to a reference dataset.

A few years ago, researchers used Deep Belief Networks (DBNs) for this task, consisting of a visible layer followed by a stack of internal layers whose top layers are connected recurrently. Training such networks was quite difficult, so people looked for new architectures.

Enter the GAN. How can we train a network to generate samples that are similar to a reference? First, we need to design a generator network. Typically, it takes a set of random variables that are fed through a stack of dense and conv2d_transpose layers. The latter does the opposite of a conv2d layer: it goes from an input that looks like the output of a convolution to an output that looks like a convolution's input, upsampling the spatial dimensions along the way.
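
To make the shape relationship concrete, here is a minimal check (TensorFlow 1.x; the sizes are just an example, chosen to match the 7x7 to 14x14 step we will use later):

import tensorflow as tf

# A stride-2 conv2d halves the spatial size; conv2d_transpose doubles it back.
x = tf.placeholder(tf.float32, [None, 7, 7, 128])
y = tf.layers.conv2d_transpose(x, filters=64, kernel_size=5,
                               strides=[2, 2], padding="SAME")
print(y.get_shape())  # (?, 14, 14, 64)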

Now, to train this network, we use the adversarial part. The trick is to train another network, a discriminator, to detect whether a sample is real or generated. In each iteration, we first train the discriminator to improve its discrimination power, and then we train the generator to produce an image that looks closer to a real one.
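
Before building the full model, here is a small, self-contained sketch of this alternation on toy 1-D data (every name in it is illustrative and independent of the code we develop below): the generator learns to mimic samples drawn from a normal distribution while the discriminator learns to tell real samples from generated ones.

import numpy as np
import tensorflow as tf

def toy_generator(z):
    with tf.variable_scope('gen'):
        h = tf.layers.dense(z, 16, activation=tf.nn.relu)
        return tf.layers.dense(h, 1)

def toy_discriminator(x, reuse=False):
    with tf.variable_scope('disc', reuse=reuse):
        h = tf.layers.dense(x, 16, activation=tf.nn.relu)
        return tf.layers.dense(h, 1)  # one logit: real vs. generated

Z = tf.placeholder(tf.float32, [None, 1])
X = tf.placeholder(tf.float32, [None, 1])
G = toy_generator(Z)
D_real = toy_discriminator(X)
D_fake = toy_discriminator(G, reuse=True)  # same weights, applied to fakes

bce = tf.nn.sigmoid_cross_entropy_with_logits
d_cost = tf.reduce_mean(bce(logits=D_real, labels=tf.ones_like(D_real))) + \
         tf.reduce_mean(bce(logits=D_fake, labels=tf.zeros_like(D_fake)))
g_cost = tf.reduce_mean(bce(logits=D_fake, labels=tf.ones_like(D_fake)))

d_vars = [v for v in tf.trainable_variables() if v.name.startswith('disc')]
g_vars = [v for v in tf.trainable_variables() if v.name.startswith('gen')]
d_step = tf.train.AdamOptimizer(1e-3).minimize(d_cost, var_list=d_vars)
g_step = tf.train.AdamOptimizer(1e-3).minimize(g_cost, var_list=g_vars)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(2000):
        xs = np.random.normal(3., 0.5, size=[64, 1])    # "real" samples
        zs = np.random.uniform(-1., 1., size=[64, 1])   # generator input
        sess.run(d_step, feed_dict={X: xs, Z: zs})  # 1) sharpen the discriminator
        sess.run(g_step, feed_dict={Z: zs})         # 2) improve the generator
    print(sess.run(G, {Z: np.random.uniform(-1, 1, [5, 1])}).ravel())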

Let's try to generate realistic handwritten digits; we will reuse parts of our previous CNN classifier. We need to add a generator to it and adapt the layers to take into account the additional random input from which our images will be generated.

Let's start with some helper functions: a match helper for our cost function, as well as a function to write our newly generated samples to disk and display them during the training session. We also create our own layer for batch normalization to simplify the underlying computation:

import tensorflow as tf
import numpy as np

def match(logits, labels):
    logits = tf.clip_by_value(logits, 1e-7, 1. - 1e-7)
    return tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(
        logits=logits, labels=labels))

We use the non-sparse version of the cost function helper we used before, softmax_cross_entropy_with_logits_v2. We could have used the sparse version as well, but at the cost of additional code. As there is only one output value, this is simple to handle.
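
To make the difference between the two variants concrete, here is a small illustration with three dummy classes (purely for the sake of the example); both calls compute the same cross-entropy, one from dense labels, the other from an integer class index:

logits = tf.constant([[2.0, 0.5, 0.1]])
dense = tf.nn.softmax_cross_entropy_with_logits_v2(
    logits=logits, labels=tf.constant([[1.0, 0.0, 0.0]]))
sparse = tf.nn.sparse_softmax_cross_entropy_with_logits(
    logits=logits, labels=tf.constant([0]))
with tf.Session() as sess:
    print(sess.run([dense, sparse]))  # the two values are identical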

Let's explain the batch-normalization layer. We compute the mean and the variance of the input tensor, and then we normalize it (this way, the result has a mean of 0 and a standard deviation of 1). We handle 2D and 4D tensors explicitly because the axes to reduce over differ between the two cases:

def batchnormalize(X, eps=1e-8, g=None, b=None):
    if X.get_shape().ndims == 4:
        # 4D tensor: reduce over batch, height and width, keep the channels
        mean = tf.reduce_mean(X, [0,1,2])
        std = tf.reduce_mean(tf.square(X-mean), [0,1,2])
        X = (X-mean) / tf.sqrt(std+eps)

        if g is not None and b is not None:
            g = tf.reshape(g, [1,1,1,-1])
            b = tf.reshape(b, [1,1,1,-1])
            X = X*g + b

    elif X.get_shape().ndims == 2:
        # 2D tensor: reduce over the batch dimension only
        mean = tf.reduce_mean(X, 0)
        std = tf.reduce_mean(tf.square(X-mean), 0)
        X = (X-mean) / tf.sqrt(std+eps)

        if g is not None and b is not None:
            g = tf.reshape(g, [1,-1])
            b = tf.reshape(b, [1,-1])
            X = X*g + b

    else:
        raise NotImplementedError

    return X

def save_visualization(X, nh_nw, save_path='./sample.jpg'):
    from imageio import imwrite
    from matplotlib import pyplot as plt
    h, w = X.shape[1], X.shape[2]
    img = np.zeros((h * nh_nw[0], w * nh_nw[1], 3))

    # Lay out the samples on an nh_nw[0] x nh_nw[1] grid
    for n, x in enumerate(X):
        j = n // nh_nw[1]
        i = n % nh_nw[1]
        img[j*h:j*h+h, i*w:i*w+w, :] = x

    img = img.astype(np.uint8)
    imwrite(save_path, img)
    plt.imshow(img)
    plt.show()

As we are using a sigmoid-style mapping for probabilities, we need to keep the values 0 and 1 out of the mapping (they correspond to infinite logits). We do that by clipping the values to the interval [1e-7, 1 - 1e-7].
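
A quick illustration of what the clipping does (the printed values are approximate):

p = tf.constant([0.0, 0.5, 1.0])
with tf.Session() as sess:
    # 0 and 1 are pushed just inside the open interval (0, 1)
    print(sess.run(tf.clip_by_value(p, 1e-7, 1. - 1e-7)))
    # [1e-07  0.5  0.9999999]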

We can now create the class for our model. The constructor takes additional new parameters: the number of classes, dim_y, as well as the size of the random state, dim_z:

class DCGAN():
    def __init__(
            self,
            image_shape=[28,28,1],
            dim_z=100,
            dim_y=10,
            dim_W1=1024,
            dim_W2=128,
            dim_W3=64,
            dim_channel=1,
            ):

        self.image_shape = image_shape
        self.dim_z = dim_z
        self.dim_y = dim_y

        self.dim_W1 = dim_W1
        self.dim_W2 = dim_W2
        self.dim_W3 = dim_W3
        self.dim_channel = dim_channel

Here is the new special layer for the generator that will create good images, the transposed convolution layer we talked about earlier:

    def create_conv2d_transpose(self, input, filters, kernel_size,
                                name, with_batch_norm):
        layer = tf.layers.conv2d_transpose(
            inputs=input,
            filters=filters,
            kernel_size=kernel_size,
            strides=[2,2],
            name="Conv2d_transpose_" + name,
            padding="SAME")
        if with_batch_norm:
            layer = batchnormalize(layer)
        layer = tf.nn.relu(layer, name="RELU_" + name)
        return layer

Our discriminator should return a probability between 0 and 1 of an image being real. To accomplish this, and to allow the generator to create specific types of images, we feed the discriminator not only the image, but also its class, concatenated at each layer. Each stride-2 convolution halves the spatial size, from 28x28 to 14x14 and then to 7x7, which is why the flattened layer has 7*7*128 values:

    def discriminate(self, image, Y, reuse=False):
        with tf.variable_scope('discriminate', reuse=reuse):
            Y = tf.one_hot(Y, self.dim_y)
            yb = tf.reshape(Y, tf.stack([-1, 1, 1, self.dim_y]))
            image = tf.concat(axis=3, values=
                [image, yb*tf.ones([1, 28, 28, self.dim_y])])

            h1 = self.create_conv2d(image, self.dim_W3, 5, "Layer1", True)
            h1 = tf.concat(axis=3, values=
                [h1, yb*tf.ones([1, 14, 14, self.dim_y])])

            h2 = self.create_conv2d(h1, self.dim_W2, 5, "Layer2", True)
            h2 = tf.reshape(h2, tf.stack([-1, 7*7*128]))
            h2 = tf.concat(axis=1, values=[h2, Y])

            h3 = self.create_dense(h2, self.dim_W1, "Layer3", True)
            h3 = tf.concat(axis=1, values=[h3, Y])

            h4 = self.create_dense(h3, 1, "Layer4", True)
            return h4

As we said before, the generator does the opposite, going from our class variables and random state to the final generated image with values between 0 and 1:

    def generate(self, Z, Y, reuse=False):
        with tf.variable_scope('generate', reuse=reuse):

            Y = tf.one_hot(Y, self.dim_y)
            yb = tf.reshape(Y, tf.stack([-1, 1, 1, self.dim_y]))
            Z = tf.concat(axis=1, values=[Z, Y])
            h1 = self.create_dense(Z, self.dim_W1, "Layer1", False)
            h1 = tf.concat(axis=1, values=[h1, Y])
            h2 = self.create_dense(h1, self.dim_W2*7*7, "Layer2", False)
            h2 = tf.reshape(h2, tf.stack([-1, 7, 7, self.dim_W2]))
            h2 = tf.concat(axis=3, values=
                [h2, yb*tf.ones([1, 7, 7, self.dim_y])])

            h3 = self.create_conv2d_transpose(h2, self.dim_W3, 5, "Layer3", True)
            h3 = tf.concat(axis=3, values=
                [h3, yb*tf.ones([1, 14, 14, self.dim_y])])

            h4 = self.create_conv2d_transpose(
                h3, self.dim_channel, 7, "Layer4", False)
            x = tf.nn.sigmoid(h4)
            return x

It is now time to assemble the pieces. We create placeholders for the generator's inputs, as well as for the real images. We then create our image generator (which we will also use to display the generated images) and then our discriminators. Here lies the trick: we create two of them at the same time. One is used for the real images and should return 1; the other is fed the generated images and should return 0. As the two share the same weights, we pass the reuse flag for the second discriminator.

The cost we optimize for the discriminator step is the sum of the discrepancies of the two discriminators. For the generator step, as we said, we optimize the generator so that the discriminator outputs 1 on the generated images:

    def build_model(self):
        Z = tf.placeholder(tf.float32, [None, self.dim_z])
        Y = tf.placeholder(tf.int64, [None])

        image_real = tf.placeholder(tf.float32, [None]+self.image_shape)
        image_gen = self.generate(Z, Y)

        raw_real = self.discriminate(image_real, Y, False)
        raw_gen = self.discriminate(image_gen, Y, True)

        discrim_cost_real = match(raw_real, tf.ones_like(raw_real))
        discrim_cost_gen = match(raw_gen, tf.zeros_like(raw_gen))
        discrim_cost = discrim_cost_real + discrim_cost_gen

        gen_cost = match(raw_gen, tf.ones_like(raw_gen))

        return Z, Y, image_real, image_gen, discrim_cost, gen_cost

We can now start setting up the graph with the hyperparameters:

n_epochs = 10
learning_rate = 0.0002
batch_size = 1024
image_shape = [28,28,1]
dim_z = 10
dim_y = 10
dim_W1 = 1024
dim_W2 = 128
dim_W3 = 64
dim_channel = 1

visualize_dim=196
step = 200

As before, we read the MNIST dataset:

from sklearn.datasets import fetch_mldata
mnist = fetch_mldata('MNIST original')
mnist.data.shape = (-1, 28, 28)
mnist.data = mnist.data.astype(np.float32).reshape( [-1, 28, 28, 1]) / 255.
mnist.num_examples = len(mnist.data)
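
Note that fetch_mldata has been removed from recent scikit-learn releases. If the call above fails, a roughly equivalent loading path (assuming a recent scikit-learn that provides fetch_openml, and network access) could look like this:

from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784', as_frame=False)
mnist.data = mnist.data.astype(np.float32).reshape([-1, 28, 28, 1]) / 255.
mnist.target = mnist.target.astype(np.int64)   # labels come back as strings
mnist.num_examples = len(mnist.data)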

Now we create our graph. We split the trainable variables into two lists (based on their name prefix), with an Adam optimizer for each of them. We also create fixed sample inputs to check whether our generator has started generating recognizable images:

tf.reset_default_graph()
dcgan_model = DCGAN(
    image_shape=image_shape,
    dim_z=dim_z,
    dim_W1=dim_W1,
    dim_W2=dim_W2,
    dim_W3=dim_W3,
)
Z_tf, Y_tf, image_tf, image_tf_sample, d_cost_tf, g_cost_tf = \
    dcgan_model.build_model()
discrim_vars = list(filter(lambda x: x.name.startswith('discriminate'),
                           tf.trainable_variables()))
gen_vars = list(filter(lambda x: x.name.startswith('generate'),
                       tf.trainable_variables()))

train_op_discrim = tf.train.AdamOptimizer(learning_rate, beta1=0.5).minimize(
    d_cost_tf, var_list=discrim_vars)
train_op_gen = tf.train.AdamOptimizer(learning_rate, beta1=0.5).minimize(
    g_cost_tf, var_list=gen_vars)

Z_np_sample = np.random.uniform(-1, 1, size=(visualize_dim, dim_z))
Y_np_sample = np.random.randint(10, size=[visualize_dim])

In our session, we feed the integer labels directly; they are transformed to one-hot encodings by TensorFlow inside the model (with tf.one_hot). We then use the same pattern as for our previous networks: for each batch, we generate the random numbers for the generated images, then we optimize the discriminator and then the generator, as planned:

with tf.Session() as sess:
    # Labels are fed as integers; the model one-hot encodes them itself
    mnist.target = mnist.target.astype(np.int64)

    sess.run(tf.global_variables_initializer())
    for epoch in range(n_epochs):
        permut = np.random.permutation(mnist.num_examples)
        trX = mnist.data[permut]
        trY = mnist.target[permut]
        Z = np.random.uniform(-1, 1,
            size=[mnist.num_examples, dim_z]).astype(np.float32)

        print("epoch: %i" % epoch)
        for j in range(0, mnist.num_examples, batch_size):
            if j % step == 0:
                print("  batch: %i" % j)

            Xs = trX[j:j+batch_size]
            Ys = trY[j:j+batch_size]
            Zs = Z[j:j+batch_size]

            sess.run(train_op_discrim,
                feed_dict={
                    Z_tf: Zs,
                    Y_tf: Ys,
                    image_tf: Xs,
                })

            sess.run(train_op_gen,
                feed_dict={
                    Z_tf: Zs,
                    Y_tf: Ys,
                })

            if j % step == 0:
                generated_samples = sess.run(
                    image_tf_sample,
                    feed_dict={
                        Z_tf: Z_np_sample,
                        Y_tf: Y_np_sample,
                    })
                generated_samples = generated_samples * 255
                save_visualization(generated_samples, (7, 28),
                    save_path='./sample_%03d_%04d.jpg' %
                        (epoch, j / step))
epoch: 0
  batch: 0
...
epoch: 3
  batch: 0
...
epoch: 9
  batch: 64000

(Generated sample images were saved and displayed at each of these checkpoints.)

We can see shapes that resemble digits quite early. The way they evolve is very interesting: they go from smooth to crisp to noisy (for a high number of epochs). This is understandable because there is no convergence for these networks. As they are adversarial, each time one learns a trick that works on the other, the other will counter it. For instance, if the generated images are not smooth enough, the discriminator may discriminate on this difference; once the generator produces smooth images, the discriminator moves on to other differences. The problem is that the generator will forget about previously learned tricks, so there is no way to stop at a meaningful point!
