Working with pretrained models

Training large computer vision models is not only hard, but computationally expensive. Therefore, it's common to use models that were originally trained for another purpose and fine-tune them for a new purpose. This is an example of transfer learning.

Transfer learning aims to transfer the learning from one task to another task. As humans, we are very good at transferring what we have learned. When you see a dog that you have not seen before, you don't need to relearn everything about dogs for this particular dog; instead, you just transfer new learning to what you already knew about dogs. It's not economical to retrain a big network every time, as you'll often find that there are parts of the model that we can reuse.

In this section, we will fine-tune VGG-16, originally trained on the ImageNet dataset. The ImageNet competition is an annual computer vision competition, and the ImageNet dataset consists of millions of images of real-world objects, from dogs to planes.

In the ImageNet competition, researchers compete to build the most accurate models. In fact, ImageNet has driven much of the progress in computer vision over the recent years, and the models built for ImageNet competitions are a popular basis to fine-tune models from.

VGG-16 is a model architecture developed by the visual geometry group at Oxford University. The model consists of a convolutional part and a classification part. We will only be using the convolutional part. In addition, we will be adding our own classification part that can classify plants.

VGG-16 can be downloaded via Keras by using the following code:

from keras.applications.vgg16 import VGG16
vgg_model = VGG16(include_top=False, input_shape=(150, 150, 3))
out: 
Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5
58892288/58889256 [==============================] - 5s 0us/step

When downloading the model, we let Keras know that we don't want to include the top part (the classification part), and we also specify the desired input shape. If we do not specify the input shape, the model will accept any image size, and it will not be possible to add Dense layers on top:

vgg_model.summary()
out: 
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 150, 150, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 150, 150, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 150, 150, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 75, 75, 64)        0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 75, 75, 128)       73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 75, 75, 128)       147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 37, 37, 128)       0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 37, 37, 256)       295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 37, 37, 256)       590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 37, 37, 256)       590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 18, 18, 256)       0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 18, 18, 512)       1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 18, 18, 512)       2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 18, 18, 512)       2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 9, 9, 512)         0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 9, 9, 512)         2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 9, 9, 512)         2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 9, 9, 512)         2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 4, 4, 512)         0         
=================================================================
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
_________________________________________________________________

As you can see, the VGG model is very large, with over 14.7 million trainable parameters. It also consists of both Conv2D and MaxPooling2D layers, both of which we've already learned about when working on the MNIST dataset.
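
To see why specifying the input shape matters, here is a minimal sketch (assuming a TensorFlow backend with channels-last ordering) comparing the output shapes with and without a fixed input shape:

from keras.applications.vgg16 import VGG16

# Without a fixed input shape, the spatial dimensions of the output are
# undefined, so a Flatten/Dense head could not be attached on top
flexible_vgg = VGG16(include_top=False)
print(flexible_vgg.output_shape)   # (None, None, None, 512)

# With the fixed 150x150 input shape, the output shape is fully defined
print(vgg_model.output_shape)      # (None, 4, 4, 512)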

From this point, there are two different ways we can proceed:

  • Add layers and build a new model.
  • Preprocess all the images through the pretrained model and then train a new model on the extracted features (see the sketch after this list).
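
As an illustration of the second option, the following is a minimal sketch only. It assumes the train_generator defined in the previous section and the vgg_model loaded above, and the number of batches is arbitrary:

import numpy as np
from keras.models import Sequential
from keras.layers import Flatten, Dense, Activation

# Run a number of batches through the VGG-16 convolutional base once and
# store the resulting features and labels
features, labels = [], []
for _ in range(20):   # arbitrary number of batches, for illustration only
    imgs, lbls = next(train_generator)
    features.append(vgg_model.predict(imgs))
    labels.append(lbls)
features = np.concatenate(features)   # shape: (n, 4, 4, 512)
labels = np.concatenate(labels)

# Train a small classifier on the precomputed features
clf = Sequential()
clf.add(Flatten(input_shape=(4, 4, 512)))
clf.add(Dense(12))
clf.add(Activation('softmax'))
clf.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])
clf.fit(features, labels, epochs=8, batch_size=32)

The rest of this section follows the first option.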

Modifying VGG-16

In this section, we will be adding layers on top of the VGG-16 model, and then from there, we will train the new, big model.

We do not want to retrain all those convolutional layers that have been trained already, however. So, we must first "freeze" all the layers in VGG-16, which we can do by running the following:

# Freeze the VGG-16 layers so that their weights are not updated during training
for layer in vgg_model.layers:
  layer.trainable = False

Keras downloads VGG as a functional API model. We will learn more about the functional API in Chapter 6, Using Generative Models, but for now, we just want to use the Sequential API, which allows us to stack layers through model.add(). We can convert the functional API model into a Sequential model with the following code:

from keras.models import Sequential
finetune = Sequential(layers=vgg_model.layers)

As a result of running the code, we have now created a new model called finetune that works just like a normal Sequential model. We need to remember that converting models with the Sequential API only works if the model can actually be expressed in the Sequential API. Some more complex models cannot be converted.
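
If a model cannot be expressed in the Sequential API, the same stacking can still be done with the functional API itself. The following is only a minimal sketch of this alternative; we will cover the functional API properly in Chapter 6, Using Generative Models:

from keras.models import Model
from keras.layers import Flatten, Dense, Activation

# Attach a new classification head directly to the VGG-16 output tensor
x = Flatten()(vgg_model.output)
x = Dense(12)(x)
predictions = Activation('softmax')(x)
finetune_functional = Model(inputs=vgg_model.input, outputs=predictions)

For the rest of this section, however, we will stick with the Sequential finetune model.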

As a result of everything we've just done, adding layers to our model is now simple:

from keras.layers import Flatten, Dense, Activation

finetune.add(Flatten())
finetune.add(Dense(12))               # one output per plant class
finetune.add(Activation('softmax'))
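
We can quickly verify which layers will actually be updated during training; the following check is optional:

for layer in finetune.layers:
    print(layer.name, layer.trainable)

finetune.summary()   # Non-trainable params should now be 14,714,688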

The newly added layers are trainable by default, while the reused VGG-16 layers are not. We can train this stacked model just as we would train any other model, using the data generator we defined in the previous section. This can be executed by running the following code:

finetune.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])

finetune.fit_generator(train_generator, epochs=8, steps_per_epoch=4606 // 32, validation_data=validation_generator, validation_steps=144 // 32)

After running this, the model achieves a validation accuracy of about 75%.
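
If you want to check the validation accuracy directly after training, a minimal sketch (reusing the validation_generator from the previous section) looks like this:

loss, acc = finetune.evaluate_generator(validation_generator, steps=144 // 32)
print('Validation accuracy: {:.2%}'.format(acc))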

Random image augmentation

A general problem in machine learning is that no matter how much data we have, more data would always be better: it improves the quality of our output, helps prevent overfitting, and allows our model to deal with a larger variety of inputs. It's therefore common to apply random augmentation to images, for example, a rotation or a random crop.

The idea is to get a large number of different images out of one image, therefore reducing the chance that the model will overfit. For most image augmentation purposes, we can just use Keras' ImageDataGenerator.

More advanced augmentations can be done with the OpenCV library. However, focusing on this is outside the scope of this chapter.

Augmentation with ImageDataGenerator

When using an augmenting data generator, we usually use it only for training. The validation generator should not use the augmentation features, because when we validate our model, we want to estimate how well it is doing on unseen, actual data, and not on augmented data.

This is different from rule-based augmentation, where we try to create images that are easier to classify. For this reason, we need to create two ImageDataGenerator instances, one for training and one for validation. The training generator can be created by running the following code:

from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
  rescale=1/255,
  rotation_range=90,
  width_shift_range=0.2,
  height_shift_range=0.2,
  shear_range=0.2,
  zoom_range=0.1,
  horizontal_flip=True,
  fill_mode='nearest')

This training data generator makes use of a few built-in augmentation techniques.
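
The matching validation generator should only rescale the images and apply no random augmentation, so that validation measures performance on unmodified data. A minimal sketch (the name validation_datagen is just an example) looks like this:

validation_datagen = ImageDataGenerator(rescale=1/255)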

Note: There are more arguments available in Keras. For a full list, you should refer to the Keras documentation at https://keras.io/.

In the following list, we've highlighted several commonly used arguments:

  • rescale scales the values in the image. We used it before and will also use it for validation.
  • rotation_range is a range (0 to 180 degrees) in which to randomly rotate the image.
  • width_shift_range and height_shift_range are ranges (relative to the image size, so here 20%) in which to randomly stretch images horizontally or vertically.
  • shear_range is a range (again, relative to the image) in which to randomly apply shear.
  • zoom_range is the range in which to randomly zoom into a picture.
  • horizontal_flip specifies whether to randomly flip the image.
  • fill_mode specifies how to fill empty spaces created by, for example, rotation.

We can check out what the generator does by running one image through it multiple times.

First, we need to import the Keras image tools and specify an image path (this one was chosen at random). This can be done by running the following:

from keras.preprocessing import image
fname = 'train/Charlock/270209308.png'

We then need to load the image and convert it to a NumPy array, which is achieved with the following code:

img = image.load_img(fname, target_size=(150, 150))
img = image.img_to_array(img)

As before, we have to add a batch size dimension to the image:

import numpy as np
img = np.expand_dims(img, axis=0)

We then use the ImageDataGenerator instance we just created, but instead of using flow_from_directory, we'll use flow, which allows us to pass the data directly into the generator. We then pass that one image we want to use, which we can do by running this:

gen = train_datagen.flow(img, batch_size=1)

In a loop, we then call next on our generator four times:

import matplotlib.pyplot as plt

for i in range(4):
    plt.figure(i)
    batch = next(gen)
    imgplot = plt.imshow(image.array_to_img(batch[0]))

plt.show()

This will produce the following output:

A few samples of the randomly modified image
