Convolutional neural networks

Sight is hands-down the most-used sense. You are using it right now! Naturally, researchers tried to mimic it with neural networks early on, but nothing worked well until convolution was applied to image classification. Convolution is the idea of detecting, sometimes grouping, and isolating common features in an image. For instance, if you cover up three-quarters of a picture of a familiar object and show it to someone, they will almost certainly still recognize the object from the partial features alone. Convolution works in a similar way: it breaks an image apart and isolates its features for later recognition.
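
To make the idea of isolating a feature concrete, here is a minimal sketch of a single convolution pass written in plain NumPy. The 6 x 6 image, the hand-picked vertical-edge kernel, and the helper name convolve2d_valid are all assumptions made for illustration; a trained network learns its own kernels rather than using a fixed one like this.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` without padding and return the feature map.

    Like Keras' Conv2D, this is technically cross-correlation:
    the kernel is not flipped before it is applied.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.float32)
    for r in range(out_h):
        for c in range(out_w):
            # Multiply the patch under the kernel element-wise and sum it up.
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

# Hypothetical 6x6 image: dark left half (0), bright right half (1).
image = np.zeros((6, 6), dtype=np.float32)
image[:, 3:] = 1.0

# Hand-picked kernel that responds to dark-to-bright vertical edges.
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=np.float32)

feature_map = convolve2d_valid(image, kernel)
print(feature_map)
```

Every row of the resulting 4 x 4 feature map is [0, 3, 3, 0]: the kernel stays silent over the flat regions and responds only where the edge falls inside its window.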

Convolution works by dissecting an image into its feature parts, which makes it easier to train a network. Let's jump into a code sample that extends from where we left off in the previous chapter but that now introduces convolution. Open up the listing and follow these steps:

  1. Take a look at the first few lines, which handle the imports:
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Conv2D, MaxPooling2D, UpSampling2D
from tensorflow.keras.models import Model
from tensorflow.keras import backend as K
  2. In this example, we import new layer types: Conv2D, MaxPooling2D, and UpSampling2D.
  3. Then we set the Input and build the encoder and decoder sections of the network using the following code:
input_img = Input(shape=(28, 28, 1)) # adapt this if using `channels_first` image data format

x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)

x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(16, (3, 3), activation='relu')(x) # default 'valid' padding trims 16 x 16 to 14 x 14, so the last up-sampling restores 28 x 28
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
  4. The first thing to note is that we now preserve the dimensions of the image: 28 x 28 pixels with 1 channel. The images are grayscale, so there is only a single color channel. This is very different from before, when we unraveled each image into a single 784-dimension vector.

The second thing to note is the use of the Conv2D (two-dimensional convolutional) layer, each followed by a MaxPooling2D or UpSampling2D layer. Pooling layers condense features by down-sampling the feature maps, and up-sampling layers expand them again. Note how we use pooling layers after convolution when encoding the image and up-sampling layers when decoding it.
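
The effect of the pooling and up-sampling layers is easiest to see by following the width and height of the data through the network. The sketch below does that arithmetic in plain Python, with no Keras required. The helper names are hypothetical, and the sketch assumes stride-1 convolutions, 2 x 2 pooling with padding='same' (which rounds up), and Keras' default 'valid' padding wherever padding is not specified.

```python
import math

def conv_size(n, kernel=3, padding="same"):
    """Output width/height of a stride-1 convolution."""
    return n if padding == "same" else n - kernel + 1

def pool_size(n, pool=2):
    """Output width/height of max pooling with padding='same' (rounds up)."""
    return math.ceil(n / pool)

def upsample_size(n, factor=2):
    """Output width/height of up-sampling by an integer factor."""
    return n * factor

size = 28
trace = [size]

# Encoder: three Conv2D ('same') + MaxPooling2D pairs.
for _ in range(3):
    size = pool_size(conv_size(size))
    trace.append(size)

# Decoder: two Conv2D ('same') + UpSampling2D pairs...
for _ in range(2):
    size = upsample_size(conv_size(size))
    trace.append(size)

# ...then a Conv2D without padding='same', which trims 16 to 14
# so that the final up-sampling lands back on 28.
size = upsample_size(conv_size(size, padding="valid"))
trace.append(size)

# Final Conv2D ('same') producing the single output channel.
size = conv_size(size)
trace.append(size)

print(trace)  # [28, 14, 7, 4, 8, 16, 28, 28]
```

The encoded representation is therefore 4 x 4 with 8 channels, or 128 values in total. The trace also shows why one decoder convolution omits padding='same': with it, the output would end up 32 x 32 instead of 28 x 28.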

  5. Next, we build and train the model with the following block of code:
autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')

from tensorflow.keras.datasets import mnist
import numpy as np

(x_train, _), (x_test, _) = mnist.load_data()

x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = np.reshape(x_train, (len(x_train), 28, 28, 1))
x_test = np.reshape(x_test, (len(x_test), 28, 28, 1))

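from tensorflow.keras.callbacks import TensorBoard

# Train the autoencoder to reproduce its own input, so the target equals the input.
# Further arguments such as epochs, batch_size, and callbacks can be passed here as well.
autoencoder.fit(x_train, x_train,
                validation_data=(x_test, x_test))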

decoded_imgs = autoencoder.predict(x_test)
  6. The training of the model in the preceding code mirrors what we did at the end of the previous chapter, but note how the training and testing sets are prepared now. We no longer flatten the image but rather preserve its spatial properties as input to the convolutional layers.
  7. Finally, we output the results with the following code:
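import matplotlib.pyplot as plt

n = 10  # number of digits to display
plt.figure(figsize=(20, 4))
for i in range(n):
    # top row: original test images (subplot indices start at 1)
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test[i].reshape(28, 28))
    # bottom row: the autoencoder's reconstructions
    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded_imgs[i].reshape(28, 28))
plt.show()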
  8. Run the code, as you have before, and you'll immediately notice that it is about 100 times slower to train. This may or may not require you to wait, depending on your machine; if it does, go get a beverage or three and perhaps a meal.
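
The earlier point about preserving spatial properties comes down to how the arrays are reshaped before training. The following sketch contrasts the two layouts using a small synthetic array as a stand-in for MNIST, so it runs without downloading anything; the random images and the variable names are assumptions made for illustration only.

```python
import numpy as np

# Synthetic stand-in for MNIST: 5 random 28x28 grayscale images (uint8, 0-255).
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Scale pixel values into the range [0, 1], as the listing does.
scaled = images.astype("float32") / 255.0

# Dense-style input: each image flattened into a 784-element vector.
flat = scaled.reshape(len(scaled), 28 * 28)

# Conv2D-style input: height and width are kept, plus a single channel axis.
spatial = scaled.reshape(len(scaled), 28, 28, 1)

print(flat.shape)     # (5, 784)
print(spatial.shape)  # (5, 28, 28, 1)

# The values are identical either way; only the layout differs.
print(np.array_equal(flat, spatial.reshape(len(spatial), -1)))  # True
```

Both arrays hold the same numbers, but only the second keeps neighboring pixels next to each other, which is what a convolutional layer relies on.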

Training our simple sample now takes a large amount of time, which may be quite noticeable on older hardware. In the next section, we look at how we can start to monitor the training sessions in great detail.

