Chapter 7. TensorFlow Abstractions and Simplifications

The aim of this chapter is to get you familiarized with important practical extensions to TensorFlow. We start by describing what abstractions are and why they are useful to us, followed by a brief review of some of the popular TensorFlow abstraction libraries. We then go into two of these libraries in more depth, demonstrating some of their core functionalities along with some examples.

Chapter Overview

As most readers probably know, the term abstraction in the context of programming refers to a layer of code “on top” of existing code that performs purpose-driven generalizations of the original code. Abstractions are formed by grouping and wrapping pieces of code that are related to some higher-order functionality in a way that conveniently reframes them together. The result is simplified code that is easier to write, read, and debug, and generally easier and faster to work with. In many cases TensorFlow abstractions not only make the code cleaner, but can also drastically reduce code length and as a result significantly cut development time.

To get us going, let’s illustrate this basic notion in the context of TensorFlow, and take another look at some code for building a CNN like we did in Chapter 4:

def weight_variable(shape): 
   initial = tf.truncated_normal(shape, stddev=0.1) 
   return tf.Variable(initial)

def bias_variable(shape):
   initial = tf.constant(0.1, shape=shape)
   return tf.Variable(initial) 

def conv2d(x, W):
   return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], 
                       padding='SAME')

def conv_layer(input, shape): 
   W = weight_variable(shape) 
   b = bias_variable([shape[3]]) 
   h = tf.nn.relu(conv2d(input, W) + b) 
   hp = max_pool_2x2(h) 
   return hp


def max_pool_2x2(x): 
   return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], 
                         strides=[1, 2, 2, 1], padding='SAME')

x = tf.placeholder(tf.float32, shape=[None, 784])
x_image = tf.reshape(x, [-1, 28, 28, 1])

h1 = conv_layer(x_image, shape=[5, 5, 1, 32]) 
h2 = conv_layer(h1, shape=[5, 5, 32, 64])
h3 = conv_layer(h2, shape=[5, 5, 64, 32])

In native TensorFlow, in order to create a convolutional layer, we have to define and initialize its weights and biases according to the shapes of the input and the desired output, apply the convolution operation with defined strides and padding, and finally add the activation function operation. It’s easy to either accidentally forget one of these fundamental components or get it wrong. Also, repeating this process multiple times can be somewhat laborious and feels as if it could be done more efficiently.

In the preceding code example we created our own little abstraction by using functions that eliminate some of the redundancies in this process. Let’s compare the readability of that code with another version of it that does exactly the same, but without using any of the functions:

x = tf.placeholder(tf.float32, shape=[None, 784])
x_image = tf.reshape(x, [-1, 28, 28, 1])

W1 = tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=0.1))
b1 = tf.Variable(tf.constant(0.1, shape=[32]))
h1 = tf.nn.relu(tf.nn.conv2d(x_image, W1, 
                strides=[1, 1, 1, 1], padding='SAME') + b1)
hp1 = tf.nn.max_pool(h1, ksize=[1, 2, 2, 1], 
                     strides=[1, 2, 2, 1], padding='SAME')
W2 = tf.Variable(tf.truncated_normal([5, 5, 32, 64], stddev=0.1))
b2 = tf.Variable(tf.constant(0.1, shape=[64]))
h2 = tf.nn.relu(tf.nn.conv2d(hp1, W2, 
                strides=[1, 1, 1, 1], padding='SAME') + b2)
hp2 = tf.nn.max_pool(h2, ksize=[1, 2, 2, 1], 
                     strides=[1, 2, 2, 1], padding='SAME')
W3 = tf.Variable(tf.truncated_normal([5, 5, 64, 32], stddev=0.1))
b3 = tf.Variable(tf.constant(0.1, shape=[32]))
h3 = tf.nn.relu(tf.nn.conv2d(hp2, W3, 
                strides=[1, 1, 1, 1], padding='SAME') + b3)
hp3 = tf.nn.max_pool(h3, ksize=[1, 2, 2, 1], 
                     strides=[1, 2, 2, 1], padding='SAME')

Even with just three layers, the resulting code looks pretty messy and confusing. Clearly, as we progress to larger and more advanced networks, code such as this would be hard to manage and pass around.

Beyond the kind of medium-sized code batching we just did ourselves, long and complex code is often “wrapped up” for us in abstraction libraries. This is particularly effective in relatively simple models where very little customization is required. As a preview of what follows in the next section, you can already see how in contrib.learn, one of the abstractions available for TensorFlow, the core of defining and training a linear regression model similar to the one at the end of Chapter 3 comes down to just two lines:

regressor = learn.LinearRegressor(feature_columns=feature_columns,
                                  optimizer=optimizer)
regressor.fit(X, Y, steps=200, batch_size=506)

High-Level Survey

More than a few great TensorFlow open source extensions are available at the time of writing this book. Among the popular ones are:

  • tf.contrib.learn
  • TFLearn
  • TF-Slim
  • Keras

While TFLearn needs to be installed, contrib.learn and TF-Slim (now tf.contrib.slim) have been merged into TensorFlow and therefore require no installation. In 2017 Keras gained official Google support, and as of version 1.1 it too has been moved into tf.contrib (tf.contrib.keras). The name contrib refers to the fact that code in this library is “contributed” and still requires testing to see whether it receives broad acceptance; it could therefore still change, and is not yet part of core TensorFlow.

contrib.learn started out as an independent simplified interface for TensorFlow, initially called Scikit Flow, with the intention of making the creation of complex networks with TensorFlow more accessible to those transitioning from the scikit-learn world of “one-liner” machine learning. As is often the case, it was later merged into TensorFlow and is now regarded as its Learn module, with extensive documentation and examples available on the official TensorFlow website.

Like other libraries, the main goal of contrib.learn is to make it easy to configure, train, and evaluate our learning models. For very simple models, you can use out-of-the-box implementations to train with just a few lines of code. Another great advantage of contrib.learn, as we will see shortly, is functionality with which data features can be handled very conveniently.

While contrib.learn is more transparent and low-level, the other three extensions are a bit cleaner and more abstract, and each has its own specialties and little advantages that might come in handy depending on the needs of the user.

TFLearn and Keras are full of functionality and have many of the elements needed for various types of state-of-the-art modeling. Unlike all the other libraries, which were created to communicate solely with TensorFlow, Keras supports both TensorFlow and Theano (a popular library for deep learning).

TF-Slim was created mainly for designing complex convolutional nets with ease and has a wide variety of pretrained models available, relieving us from the expensive process of having to train them ourselves.

These libraries are very dynamic and are constantly changing, with the developers adding new models and functionalities, and occasionally modifying their syntax.

Theano

Theano is a Python library that allows you to manipulate symbolic mathematical expressions involving tensor arrays in an efficient way, and as such it can serve as a deep learning framework, competing with TensorFlow. Theano has been around longer, and is therefore a bit more mature than TensorFlow, which is still changing and evolving but is rapidly becoming the leader of the pack (many already consider it the leading library, with numerous advantages over other frameworks).

In the following sections we demonstrate how to use these extensions, alongside some examples. We begin by focusing on contrib.learn, demonstrating how easily it lets us train and run simple regression and classification models. Next we introduce TFLearn and revisit the more advanced models introduced in the previous chapters—CNN and RNN. We then give a short introduction to autoencoders and demonstrate how to create one with Keras. Finally, we close this chapter with brief coverage of TF-Slim and show how to classify images using a loaded pretrained state-of-the-art CNN model.

contrib.learn

Using contrib.learn doesn’t require any installation since it’s been merged with TensorFlow:

import tensorflow as tf
from tensorflow.contrib import learn

We start with contrib.learn’s out-of-the-box estimators (a fancy name for models), which we can train in a quick and efficient manner. These predefined estimators include simple linear and logistic regression models, a simple linear classifier, and a basic deep neural network. Table 7-1 lists some of the popular estimators we can use.

Table 7-1. Popular built-in contrib.learn estimators
Estimator Description
LinearRegressor() Linear regression model to predict label value given observation of feature values.
LogisticRegressor() Logistic regression estimator for binary classification.
LinearClassifier() Linear model to classify instances into one of multiple possible classes. When the number of possible classes is 2, this is binary classification.
DNNRegressor() A regressor for TensorFlow deep neural network (DNN) models.
DNNClassifier() A classifier for TensorFlow DNN models.

Of course, we would also like to use more-advanced and customized models, and for that contrib.learn lets us conveniently wrap our own homemade estimators, a feature that will be covered as we go along. Once we have an estimator ready for deployment, whether it was made for us or we made it ourselves, the steps are pretty much the same:

  1. We instantiate the estimator class to create our model:

    model = learn.<some_Estimator>()
    
  2. Then we fit it using our training data:

    model.fit()
    
  3. We evaluate the model to see how well it does on some given dataset:

    model.evaluate()
    
  4. Finally, we use our fitted model to predict outcomes, usually for new data:

    model.predict()
    

These four fundamental stages are also found in other extensions.

contrib offers many other functionalities and features; in particular, contrib.learn has a very neat way to treat our input data, which will be the focus of the next subsection, where we discuss linear models.

Linear Regression

We start our contrib.learn engagement with one of its strongest features: linear models. We say that a model is linear whenever it is defined by a function of a weighted sum of the features, or more formally f(w1x1 + w2x2 +...+ wnxn), where f could be any sort of function, like the identity function (as in linear regression) or a logistic function (as in logistic regression). Although limited in their expressive power, linear models have lots of advantages, such as clear interpretability, optimization speed, and simplicity.
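
To make the role of f concrete, here are the two cases just mentioned written out (a notational aside, using z for the weighted sum):

% With z = w_1 x_1 + w_2 x_2 + \dots + w_n x_n:
f(z) = z                      % identity: linear regression
f(z) = \frac{1}{1 + e^{-z}}   % logistic: logistic regression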

In Chapter 3 we created our own linear regression model using native TensorFlow by first creating a graph with placeholders for the input and target data, Variables for the set of parameters, a loss function, and an optimizer. After the model was defined, we ran the session and obtained results.

In the following section we first repeat this full process, and then show how drastically easier it is to do with contrib.learn. For this example we use the Boston Housing dataset, available to download using the sklearn library. The Boston Housing dataset is a relatively small dataset (506 samples), containing information concerning housing in the area of Boston, Massachusetts. There are 13 predictors in this dataset:

  1. CRIM: per capita crime rate by town
  2. ZN: proportion of residential land zoned for lots over 25,000 sq.ft.
  3. INDUS: proportion of nonretail business acres per town
  4. CHAS: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
  5. NOX: nitric oxide concentration (parts per 10 million)
  6. RM: average number of rooms per dwelling
  7. AGE: proportion of owner-occupied units built prior to 1940
  8. DIS: weighted distances to five Boston employment centers
  9. RAD: index of accessibility to radial highways
  10. TAX: full-value property tax rate per $10,000
  11. PTRATIO: pupil–teacher ratio by town
  12. B: 1000(Bk – 0.63)^2, where Bk is the proportion of blacks by town
  13. LSTAT: % lower status of the population

The target variable is the median value of owner-occupied homes in thousands of dollars. In this example we try to predict the target variable by using some linear combination of these 13 features.

First, we import the data:

from sklearn import datasets, metrics, preprocessing
boston = datasets.load_boston()
x_data = preprocessing.StandardScaler().fit_transform(boston.data)
y_data = boston.target

Next, we use the same linear regression model as in Chapter 3. This time we track the “loss” so we can measure the mean squared error (MSE), which is the average of the squared differences between the real target value and our predicted value. We use this measure as an indicator of how well our model performs:

x = tf.placeholder(tf.float64,shape=(None,13))
y_true = tf.placeholder(tf.float64,shape=(None))

with tf.name_scope('inference') as scope:
    w = tf.Variable(tf.zeros([1,13],dtype=tf.float64),name='weights')
    b = tf.Variable(0,dtype=tf.float64,name='bias')
    y_pred = tf.matmul(w,tf.transpose(x)) + b

with tf.name_scope('loss') as scope:
    loss = tf.reduce_mean(tf.square(y_true-y_pred))

with tf.name_scope('train') as scope:
    learning_rate = 0.1
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    train = optimizer.minimize(loss)

init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)      
    for step in range(200):
        sess.run(train,{x: x_data, y_true: y_data})
        
    MSE = sess.run(loss,{x: x_data, y_true: y_data})
print("MSE = {}".format(MSE))

Out: 
MSE = 21.9036388397

After 200 iterations, we print out the MSE calculated for the training set. Now we perform the exact same process, but using contrib.learn’s estimator for linear regression. The whole process of defining, fitting, and evaluating the model comes down to just a few lines:

  1. The linear regression model is instantiated using learn.LinearRegressor() and fed with knowledge about the data representation and the type of optimizer:

    reg = learn.LinearRegressor(
          feature_columns=feature_columns,
          optimizer=tf.train.GradientDescentOptimizer(
          learning_rate=0.1)
          )
    
  2. The regressor object is trained using .fit(). We pass the covariates and the target variable, and set the number of steps and batch size:

    reg.fit(x_data, boston.target, steps=NUM_STEPS, 
            batch_size=MINIBATCH_SIZE)
  3. The MSE loss is returned by .evaluate():

    MSE = reg.evaluate(x_data, boston.target, steps=1)
    

Here’s the code in its entirety:

NUM_STEPS = 200
MINIBATCH_SIZE = 506

feature_columns = learn.infer_real_valued_columns_from_input(x_data)

reg = learn.LinearRegressor(
      feature_columns=feature_columns,
      optimizer=tf.train.GradientDescentOptimizer(
      learning_rate=0.1)
      )

reg.fit(x_data, boston.target, steps=NUM_STEPS, 
        batch_size=MINIBATCH_SIZE)

MSE = reg.evaluate(x_data, boston.target, steps=1)

print(MSE)

Out: 
{'loss': 21.902138, 'global_step': 200}

Some representation of the input data is passed in the regressor instantiation as a processed variable called feature_columns. We will return to this shortly.

DNN Classifier

As with regression, we can use contrib.learn to apply an out-of-the-box classifier. In Chapter 2 we created a simple softmax classifier for the MNIST data. The DNNClassifier estimator allows us to perform a similar task with a considerably reduced amount of code. Also, it lets us add hidden layers (the “deep” part of the DNN).

As in Chapter 2, we first import the MNIST data:

import sys
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data
DATA_DIR = '/tmp/data' if 'win32' not in sys.platform else "c:\\tmp\\data"
data = input_data.read_data_sets(DATA_DIR, one_hot=False)
x_data, y_data = data.train.images,data.train.labels.astype(np.int32)
x_test, y_test = data.test.images,data.test.labels.astype(np.int32)

Note that in this case, due to the requirement of the estimator, we pass the target in its class label form:

one_hot=False

returning a single integer per sample, corresponding to the correct digit class (i.e., values from 0 to [number of classes] – 1), instead of the one-hot form where each label is a vector with 1 in the index that corresponds to the correct class.
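
For instance, for a sample whose true digit is 3, the two label encodings look like this:

# Class-label form (one_hot=False): a single integer
3
# One-hot form (one_hot=True): a vector with 1 at index 3
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0]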

The next steps are similar to the ones we took in the previous example, except that when we define the model, we add the number of classes (10 digits) and pass a list where each element corresponds to a hidden layer with the specified number of units. In this example we use one hidden layer with 200 units:

NUM_STEPS = 2000
MINIBATCH_SIZE = 128

feature_columns = learn.infer_real_valued_columns_from_input(x_data)

dnn = learn.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[200],
    n_classes=10,
    optimizer=tf.train.ProximalAdagradOptimizer(
    learning_rate=0.2)
    )

dnn.fit(x=x_data,y=y_data, steps=NUM_STEPS,
        batch_size=MINIBATCH_SIZE)

test_acc = dnn.evaluate(x=x_test,y=y_test, steps=1)["accuracy"]
print('test accuracy: {}'.format(test_acc))

Out:
test accuracy: 0.977

Though not as good as our CNN model in Chapter 4 (above 99%), the test accuracy here (around 98%) is significantly better than it was in the simple softmax example (around 92%) as a result of adding just a single layer. In Figure 7-1 we see how the accuracy of the model increases with the number of units in that hidden layer.

Figure 7-1. MNIST classification test accuracy as a function of units added in a single hidden layer.

Using the <Estimator>.predict() method, we can predict the classes of new samples. Here we will use the predictions to demonstrate how we can analyze our model’s performance—what classes were best identified and what types of typical errors were made. Plotting a confusion matrix can help us understand these behaviors. We import the code to create the confusion matrix from the scikit-learn library:

from sklearn.metrics import confusion_matrix

y_pred = dnn.predict(x=x_test,as_iterable=False)
class_names = ['0','1','2','3','4','5','6','7','8','9']    
cnf_matrix = confusion_matrix(y_test, y_pred)
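
To render the matrix as in Figure 7-2, here is a minimal plotting sketch, assuming matplotlib is installed (the styling is illustrative and not part of the original example):

import matplotlib.pyplot as plt

plt.imshow(cnf_matrix, interpolation='nearest', cmap=plt.cm.Blues)
plt.xticks(range(10), class_names)
plt.yticks(range(10), class_names)
plt.xlabel('Predicted digit')
plt.ylabel('True digit')
plt.colorbar()
plt.show()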

The confusion matrix is shown in Figure 7-2. Its rows correspond to the true digits, its columns to the predicted digits. We see, for example, that the model sometimes misclassified 5 as 3 and 9 as 4 and 7.

Figure 7-2. A confusion matrix showing the number of predicted digits (columns) for each true label (rows).

FeatureColumn

One of contrib.learn’s nicest offerings is handling features of different types, which can sometimes be a little tricky. To make things easier, contrib.learn offers us the FeatureColumn abstraction.

With a FeatureColumn we can maintain a representation of a single feature in our data, while performing a range of transformations defined over it. A FeatureColumn can be either one of the original columns or any new columns that may be added depending on our transformations. These may include creating a suitable and effective representation for categorical data by encoding it as a sparse vector (often referred to as dummy encoding), creating feature crosses to look for feature interactions, and bucketization (discretization of the data). All this can be done while manipulating the feature as a single semantic unit (encompassing, for example, all dummy vectors).

We use the FeatureColumn abstraction to specify the form and structure of each feature of our input data. For instance, let’s say that our target variable is height, and we try to predict it using two features, weight and species. We make our own synthetic data where heights are generated by dividing each weight by a factor of 100 and adding a constant that varies according to the species: 1 is added for Humans, 0.9 for Goblins, and 1.1 for ManBears. We then add normally distributed noise to each instance:

import pandas as pd
N = 10000

weight = np.random.randn(N)*5+70
spec_id = np.random.randint(0,3,N)
bias = [0.9,1,1.1]
height = np.array([weight[i]/100 + bias[b] for i,b in enumerate(spec_id)])
spec_name = ['Goblin','Human','ManBears']
spec = [spec_name[s] for s in spec_id]

Figure 7-3 shows visualizations of the data samples.

Figure 7-3. Left: A histogram of heights for the three types of species: Goblins, Humans, and ManBears (distributions centered at 1.6, 1.7, and 1.8, respectively). Right: A scatter plot of heights vs. weights.

Our target variable is a numeric NumPy array of heights (height), and our covariates are a numeric NumPy array of weights (weight) and a list of strings (spec) denoting the name of each species.

We use the Pandas library to have the data represented as a data frame (table), so that we can conveniently access each of its columns:

df = pd.DataFrame({'Species':spec,'Weight':weight,'Height':height})

Figure 7-4 shows what our data frame looks like.

Figure 7-4. Ten rows of the Height–Species–Weight data frame. Heights and Weights are numeric; Species is categorical with three categories.

Pandas

Pandas is a very popular and useful library in Python for working with relational or labeled data like tabular data, multidimensional time series, etc. For more information on how to use Pandas, we refer the reader to Wes McKinney’s book Python for Data Analysis (O’Reilly).

We start by specifying the nature of each feature. For Weight we use the following FeatureColumn command, indicating that it’s a continuous variable:

from tensorflow.contrib import layers
Weight = layers.real_valued_column("Weight")

Layers

contrib.layers is not a part of contrib.learn, but another independent subsection of the TensorFlow Python API that offers high-level operations and tools for building neural network layers.

The name passed to the function (in this case Weight) is crucial, since it will be used to associate the FeatureColumn representation with the actual data.

Species is a categorical variable, meaning its values have no natural ordering, and therefore cannot be represented as a single variable in the model. Instead, it has to be extended and encoded as several variables, depending on the number of categories. FeatureColumn does this for us, so we just have to use the following command to specify that it is a categorical feature and indicate the name of each category:

Species = layers.sparse_column_with_keys(
    column_name="Species", keys=['Goblin','Human','ManBears'])

Next, we instantiate an estimator class and input a list of our FeatureColumns:

reg = learn.LinearRegressor(feature_columns=[Weight,Species])

Up to now we’ve defined how the data will be represented in the model; in the following stage of fitting the model we need to provide the actual training data. In the Boston Housing example, the features were all numeric, and as a result we could just input them as x_data and target data.

Here, contrib.learn requires that we use an additional encapsulating input function. The function gets both predictors and target data in their native form (Pandas data frame, NumPy array, list, etc.) as input, and returns a dictionary of tensors. In this dictionary, each key is the name of a FeatureColumn (the names Weight and Species that were given as input previously), and its value needs to be a Tensor that contains the corresponding data. This means that we also have to transform the values into TensorFlow Tensors inside the function.

In our current example, the function receives our data frame, creates a dictionary feature_cols, and then stores the values of each column in the data frame as a Tensor for the corresponding key. It then returns that dictionary and the target variable as a Tensor. The keys have to match the names we used to define our FeatureColumns:

def input_fn(df):
    feature_cols = {}
    feature_cols['Weight'] = tf.constant(df['Weight'].values)
    
    feature_cols['Species'] =  tf.SparseTensor(
    indices=[[i, 0] for i in range(df['Species'].size)],
    values=df['Species'].values,
    dense_shape=[df['Species'].size, 1])
                    
    labels = tf.constant(df['Height'].values)

    return feature_cols, labels

The values of Species are required by their FeatureColumn specification to be encoded in a sparse format. For that we use tf.SparseTensor(), where each i index corresponds to a nonzero value (in this case, all the rows in a one-column matrix).

For example, the following:

SparseTensor(indices=[[0, 0], [2, 1], [2, 2]], values=[2, 5, 7],
             dense_shape=[3, 3])

represents the dense tensor:

[[2, 0, 0]
 [0, 0, 0]
 [0, 5, 7]]

We pass it to the .fit() method in the following way:

reg.fit(input_fn=lambda:input_fn(df), steps=50000)

Here, input_fn() is the function we just created, df is the data frame containing the data, and we also specify the number of iterations.

Note that we pass the function in a form of a lambda function rather than the function’s outputs, because the .fit() method requires a function object. Using lambda allows us to pass our input arguments and keep it in an object form. There are other workarounds we could use to achieve the same outcome, but lambda does the trick.
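
One such alternative is functools.partial from the standard library, which binds df ahead of time and likewise yields a callable object:

from functools import partial

reg.fit(input_fn=partial(input_fn, df), steps=50000)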

The fitting process may take a while. If you don’t want to do it all at once, you can split it into segments (see the following note).

Splitting the training process

It’s possible to perform the fit iteratively since the state of the model is preserved in the classifier. For example, instead of performing all 50,000 iterations consecutively like we did, we could split it into five segments:

reg.fit(input_fn=lambda:input_fn(df), steps=10000)
reg.fit(input_fn=lambda:input_fn(df), steps=10000)
reg.fit(input_fn=lambda:input_fn(df), steps=10000)
reg.fit(input_fn=lambda:input_fn(df), steps=10000)
reg.fit(input_fn=lambda:input_fn(df), steps=10000)

and achieve the same outcome. This could be useful if we want to have some tracking of the model while training it; however, there are better ways to do that, as we will see later on.

Now let’s see how well the model does by looking at the estimated weights. We can use the .get_variable_value() method to get the variables’ values:

w_w = reg.get_variable_value('linear/Weight/weight')
print('Estimation for Weight: {}'.format(w_w))

s_w = reg.get_variable_value('linear/Species/weights')
b = reg.get_variable_value('linear/bias_weight')
print('Estimation for Species: {}'.format(s_w + b))

Out: 
        Estimation for Weight:  [[0.00992305]]
        Estimation for Species: [[0.90493023]
                                 [1.00566959]
                                 [1.10534406]]

We request the values of the weights for both Weight and Species. Species is a categorical variable, so its three weights serve as different bias terms. We see that the model did quite well in estimating the true weights (0.01 for Weight and 0.9, 1, 1.1 for Goblins, Humans, and ManBears, respectively, for Species). We can get the names of the variables by using the .get_variable_names() method.
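
For example, listing the variable names for the regressor we just trained:

print(reg.get_variable_names())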

The same process can be used in more complicated scenarios where we want to handle many types of features and their interactions. Table 7-2 lists some useful operations you can do with contrib.learn.

Table 7-2. Useful feature transformation operations
Operation Description
layers.sparse_column_with_keys() Handles the conversion of categorical values
layers.sparse_column_with_hash_bucket() Handles the conversion of categorical features for which you don’t know all possible values
layers.crossed_column() Sets up feature crosses (interactions)
layers.bucketized_column() Turns a continuous column into a categorical column
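
As a quick illustration of the last two operations, here is a hedged sketch using the Weight and Species columns defined earlier (the bucket boundaries and hash bucket size are made-up values for demonstration):

# Discretize the continuous Weight column into ranges
Weight_bucket = layers.bucketized_column(Weight, boundaries=[60, 70, 80])
# Cross the bucketized weights with Species to capture interactions
Weight_x_Species = layers.crossed_column([Weight_bucket, Species],
                                         hash_bucket_size=100)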

Homemade CNN with contrib.learn

We next move on to creating our own estimator by using contrib.learn. To do so, we first need to construct a model function where our homemade network will reside and an object containing our training settings.

In the following example we create a custom CNN estimator that is identical to the one used at the beginning of Chapter 4, and use it again to classify the MNIST data. We begin by creating a function for our estimator with inputs that include our data, the mode of operation (training or test), and the parameters of the model.

In the MNIST data the pixels are concatenated in the form of a vector and therefore require that we reshape them:

x_image = tf.reshape(x_data, [-1, 28, 28, 1])

We build the network by using the contrib.layers functionality, making the process of layer construction simpler.

Using layers.convolution2d() we can set everything in a one-liner command: we pass the input (the output of the previous layer), and then indicate the number of feature maps (32), the size of the filter (5×5), and the activation function (relu), and initialize the weights and biases. The dimensionality of the input is automatically identified and does not need to be specified. Also, unlike when working in lower-level TensorFlow, we don’t need to separately define the shapes of the variables and biases:

conv1 = layers.convolution2d(x_image, 32, [5,5],
            activation_fn=tf.nn.relu,
            biases_initializer=tf.constant_initializer(0.1),
            weights_initializer=tf.truncated_normal_initializer(stddev=0.1))

The padding is set to 'SAME' by default (unchanged number of pixels), resulting in an output of shape 28×28×32.

We also add the standard 2×2 pooling layer:

pool1 = layers.max_pool2d(conv1, [2,2])

We then repeat these steps, this time for 64 target feature maps:

conv2 = layers.convolution2d(pool1, 64, [5,5],
             activation_fn=tf.nn.relu,
             biases_initializer=tf.constant_initializer(0.1),
             weights_initializer=tf.truncated_normal_initializer(stddev=0.1)) 

pool2 = layers.max_pool2d(conv2, [2,2])

Next, we flatten the 7×7×64 tensor and add a fully connected layer, reducing it to 1,024 entries. We use fully_connected() similarly to convolution2d(), except we specify the number of output units instead of the size of the filter (there’s just one of those):

pool2_flat = tf.reshape(pool2, [-1, 7*7*64]) 
fc1 = layers.fully_connected(pool2_flat, 1024,
          activation_fn=tf.nn.relu,
          biases_initializer=tf.constant_initializer(0.1),
          weights_initializer=tf.truncated_normal_initializer(stddev=0.1))

We then add dropout with keep_prob as set in the parameters given to the function (train/test mode), and the final fully connected layer with 10 output entries, corresponding to the 10 classes:

fc1_drop = layers.dropout(fc1, keep_prob=params["dropout"], 
                            is_training=(mode == 'train')) 
y_conv = layers.fully_connected(fc1_drop, 10, activation_fn=None)

We complete our model function by defining a training object with the loss and the learning rate of the optimizer.

We now have one function that encapsulates the entire model:

def model_fn(x, target, mode, params):
    y_ = tf.cast(target, tf.float32)
    x_image = tf.reshape(x, [-1, 28, 28, 1])

    # Conv layer 1
    conv1 = layers.convolution2d(x_image, 32, [5,5],
                activation_fn=tf.nn.relu,
                biases_initializer=tf.constant_initializer(0.1),
                weights_initializer=tf.truncated_normal_initializer(stddev=0.1))
    pool1 = layers.max_pool2d(conv1, [2,2])

    # Conv layer 2
    conv2 = layers.convolution2d(pool1, 64, [5,5],
                activation_fn=tf.nn.relu,
                biases_initializer=tf.constant_initializer(0.1),
                weights_initializer=tf.truncated_normal_initializer(stddev=0.1))
    pool2 = layers.max_pool2d(conv2, [2,2])

    # FC layer
    pool2_flat = tf.reshape(pool2, [-1, 7*7*64])
    fc1 = layers.fully_connected(pool2_flat, 1024,
              activation_fn=tf.nn.relu,
              biases_initializer=tf.constant_initializer(0.1),
              weights_initializer=tf.truncated_normal_initializer(stddev=0.1))
    fc1_drop = layers.dropout(fc1, keep_prob=params["dropout"],
        is_training=(mode == 'train'))

    # Readout layer
    y_conv = layers.fully_connected(fc1_drop, 10, activation_fn=None)

    cross_entropy = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(logits=y_conv, labels=y_))
    train_op = tf.contrib.layers.optimize_loss(
        loss=cross_entropy,
        global_step=tf.contrib.framework.get_global_step(),
        learning_rate=params["learning_rate"],
        optimizer="Adam")

    predictions = tf.argmax(y_conv, 1)
    return predictions, cross_entropy, train_op

We instantiate the estimator by using contrib.learn.Estimator(), and we’re good to go. Once defined, we can use it with the same functionalities as before:

from tensorflow.contrib import layers

data = input_data.read_data_sets(DATA_DIR, one_hot=True)
x_data = data.train.images.astype(np.float32)
y_data = data.train.labels.astype(np.float32)

model_params = {"learning_rate": 1e-4, "dropout": 0.5}

CNN = tf.contrib.learn.Estimator(
    model_fn=model_fn, params=model_params)

print("Starting training for %s steps max" % 5000)
CNN.fit(x=x_data, y=y_data, batch_size=50,
        max_steps=5000)

test_acc = 0
for ii in range(5):
    batch = data.test.next_batch(2000)
    predictions = list(CNN.predict(batch[0], as_iterable=True))
    test_acc = test_acc + (np.argmax(batch[1],1) == predictions).mean()

print(test_acc/5)

Out: 
0.9872

Using contrib.learn and contrib.layers, the number of lines of code was cut down considerably in comparison to lower-level TensorFlow. More important, the code is much more organized and easier to follow, debug, and write.

With this example we conclude the contrib.learn portion of this chapter. We’ll now move on to cover some of the functionalities of the TFLearn library.

TFLearn

TFLearn is another library that allows us to create complex custom models in a very clean and compressed way, while still having a reasonable amount of flexibility, as we will see shortly.

Installation

Unlike the previous library, TFLearn first needs to be installed. The installation is straightforward using pip:

pip install tflearn

If that doesn’t work, it can be downloaded from GitHub and installed manually.

After the library has been successfully installed, you should be able to import it:

import tflearn

CNN

Many of the functionalities of TFLearn resemble those covered in the previous section on contrib.learn; however, creating a custom model is a bit simpler and cleaner in comparison. In the following code we use the same CNN used earlier for the MNIST data.

Model construction is wrapped and finalized using regression(), where we set the loss and optimization configuration as we did previously for the training object in contrib.learn (here we simply specify 'categorical_crossentropy' for the loss, rather than explicitly defining it):

from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.normalization import local_response_normalization
from tflearn.layers.estimator import regression

# Data loading and basic transformations
import tflearn.datasets.mnist as mnist
X, Y, X_test, Y_test = mnist.load_data(one_hot=True)
X = X.reshape([-1, 28, 28, 1])
X_test = X_test.reshape([-1, 28, 28, 1])

# Building the network
CNN = input_data(shape=[None, 28, 28, 1], name='input')
CNN = conv_2d(CNN, 32, 5, activation='relu', regularizer="L2")
CNN = max_pool_2d(CNN, 2)
CNN = local_response_normalization(CNN)
CNN = conv_2d(CNN, 64, 5, activation='relu', regularizer="L2")
CNN = max_pool_2d(CNN, 2)
CNN = local_response_normalization(CNN)
CNN = fully_connected(CNN, 1024, activation=None)
CNN = dropout(CNN, 0.5)
CNN = fully_connected(CNN, 10, activation='softmax')
CNN = regression(CNN, optimizer='adam', learning_rate=0.0001,
                     loss='categorical_crossentropy', name='target')

# Training the network
model = tflearn.DNN(CNN,tensorboard_verbose=0,
                    tensorboard_dir = 'MNIST_tflearn_board/',
                    checkpoint_path = 'MNIST_tflearn_checkpoints/checkpoint')
model.fit({'input': X}, {'target': Y}, n_epoch=3, 
           validation_set=({'input': X_test}, {'target': Y_test}),
           snapshot_step=1000,show_metric=True, run_id='convnet_mnist') 

Another layer that’s been added here, and that we briefly mentioned in Chapter 4, is the local response normalization layer. See the upcoming note for more details about this layer.

The tflearn.DNN() function is somewhat equivalent to contrib.learn.Estimator()—it’s the DNN model wrapper with which we instantiate the model and to which we pass our constructed network.

Here we can also set the TensorBoard and checkpoints directories, the level of verbosity of TensorBoard’s logs (0–3, from basic loss and accuracy reports to other measures like gradients and weights), and other settings.

Once we have a model instance ready, we can then perform standard operations with it. Table 7-3 summarizes the model’s functionalities in TFLearn.

Table 7-3. Standard TFLearn operations
Function Description
evaluate(X, Y, batch_size=128) Perform evaluations of the model on given samples.
fit(X, Y, n_epoch=10) Train the network with input features X and targets Y.
get_weights(weight_tensor) Get a variable’s weights.
load(model_file) Restore model weights.
predict(X) Get model predictions for the given input data.
save(model_file) Save model weights.
set_weights(tensor, weights) Assign a tensor variable a given value.
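
For example, saving the weights of the fitted model and restoring them later into the same graph (the filename is arbitrary):

model.save('mnist_cnn.tflearn')
model.load('mnist_cnn.tflearn')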

As with contrib.learn, the fitting operation is performed using the .fit() method, to which we feed the data and the training settings: the number of epochs, training and validation batch sizes, displayed measures, summary-saving frequency, and more. During fitting, TFLearn displays a nice dashboard, enabling us to track the training process online.

Local response normalization

The local response normalization (LRN) layer performs a kind of lateral inhibition by normalizing over local input regions. This is done by dividing the input values by the weighted, squared sum of all inputs within some depth radius, which we can manually choose. The resulting effect is that the activation contrast between the excited neurons and their local surroundings increases, producing more salient local maxima. This method encourages inhibition since it will diminish activations that are large, but uniform. Also, normalization is useful to prevent neurons from saturating when inputs may have varying scale (ReLU neurons have unbounded activation). There are more modern alternatives for regularization, such as batch normalization and dropout, but it is good to know about LRN too.
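
For reference, this is the canonical form of the normalization as defined in the AlexNet paper by Krizhevsky et al., where a^i_{x,y} is the activity of kernel i at position (x, y), n is the depth radius, N the total number of kernels, and k, alpha, and beta are tunable constants:

b^{i}_{x,y} = a^{i}_{x,y} \Big/ \left( k + \alpha \sum_{j=\max(0,\, i-n/2)}^{\min(N-1,\, i+n/2)} \left(a^{j}_{x,y}\right)^{2} \right)^{\beta}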

After fitting the model, we evaluate performance on the test data:

evaluation = model.evaluate({'input': X_test},{'target': Y_test})
print(evaluation)

Out: 
0.9862

and form new predictions (using them here again as a “sanity check” to the previous evaluation):

pred = model.predict({'input': X_test})
print((np.argmax(Y_test,1)==np.argmax(pred,1)).mean())

Out: 
0.9862

Iterations, training steps, and epochs in TFLearn

In TFLearn, an iteration is a full pass (forward and backward) over a single example. A training step is one full pass over a batch of examples, whose size is set by the batch_size argument (the default is 64), and an epoch is a full pass over all the training examples (50,000 in the case of MNIST). Figure 7-5 shows an example of the interactive display in TFLearn.

Figure 7-5. Interactive display in TFLearn.

RNN

We wrap up our introduction to TFLearn by constructing a fully functioning text classification RNN model that considerably simplifies the code we saw in Chapters 5 and 6.

The task we perform is a sentiment analysis for movie reviews with binary classification (good or bad). We will use a well-known dataset of IMDb reviews, containing 25,000 training samples and 25,000 test samples:

from tflearn.data_utils import to_categorical, pad_sequences
from tflearn.datasets import imdb

# IMDb dataset loading
train, test, _ = imdb.load_data(path='imdb.pkl', n_words=10000,
                                valid_portion=0.1)
X_train, Y_train = train
X_test, Y_test = test

We first prepare the data, which has different sequence lengths, by equalizing the sequences with zero-padding by using tflearn.data_utils.pad_sequences() and setting 100 as the maximum sequence length:

X_train = pad_sequences(X_train, maxlen=100, value=0.)
X_test = pad_sequences(X_test, maxlen=100, value=0.)

Now we can represent data in one tensor, with samples in its rows and word IDs in its columns. As was explained in Chapter 5, IDs here are integers that are used to encode the actual words arbitrarily. In our case, we have 10,000 unique IDs.

Next, we embed each word into a continuous vector space by using tflearn.embedding(), transforming our two-dimensional tensor [samples, IDs] into a three-dimensional tensor, [samples, IDs, embedding-size], where each word ID now corresponds to a vector of size 128. Before that we use input_data() to feed data to the network (a TensorFlow placeholder is created with the given shape):

RNN = tflearn.input_data([None, 100])
RNN = tflearn.embedding(RNN, input_dim=10000, output_dim=128)

Finally, we add an LSTM layer and a fully connected layer to output the binary outcome:

RNN = tflearn.lstm(RNN, 128, dropout=0.8)
RNN = tflearn.fully_connected(RNN, 2, activation='softmax')

Here’s the full code:

from tflearn.data_utils import to_categorical, pad_sequences
from tflearn.datasets import imdb

# Load data
train, test, _ = imdb.load_data(path='imdb.pkl', n_words=10000,
                                valid_portion=0.1)
X_train, Y_train = train
X_test, Y_test = test

# Sequence padding and converting labels to binary vectors
X_train = pad_sequences(X_train, maxlen=100, value=0.)
X_test = pad_sequences(X_test, maxlen=100, value=0.)
Y_train = to_categorical(Y_train, nb_classes=2)
Y_test = to_categorical(Y_test, nb_classes=2)

# Building an LSTM network
RNN = tflearn.input_data([None, 100])
RNN = tflearn.embedding(RNN, input_dim=10000, output_dim=128)

RNN = tflearn.lstm(RNN, 128, dropout=0.8)
RNN = tflearn.fully_connected(RNN, 2, activation='softmax')
RNN = tflearn.regression(RNN, optimizer='adam', learning_rate=0.001,
                         loss='categorical_crossentropy')

# Training the network
model = tflearn.DNN(RNN, tensorboard_verbose=0)
model.fit(X_train, Y_train, validation_set=(X_test, Y_test),
                                show_metric=True, batch_size=32)

In this section, we had just a quick taste of TFLearn. The library has nice documentation and many examples that are well worth looking at.

Keras

Keras is one of the most popular and powerful TensorFlow extension libraries. Among the extensions we survey in this chapter, Keras is the only one that supports both Theano—upon which it was originally built—and TensorFlow. This is possible because of Keras’s complete abstraction of its backend; Keras has its own graph data structure for handling computational graphs and communicating with TensorFlow.

In fact, because of this it is even possible to define a Keras model with one of the two backends and then switch to the other.

Keras has two main types of models to work with: sequential and functional. The sequential type is designed for simple architectures, where we just want to stack layers in a linear fashion. The functional API can support more-general models with a diverse layer structure, such as multioutput models.

We will take a quick look at the syntax used for each type of model.

Installation

In TensorFlow 1.1+ Keras can be imported from the contrib library; however, for older versions it needs to be installed externally. Note that Keras requires the numpy, scipy, and yaml dependencies. Similarly to TFLearn, Keras can either be installed using pip:

pip install keras

or downloaded from GitHub and installed using:

python setup.py install

By default, Keras uses TensorFlow as its tensor manipulation library. To have it use Theano instead, change the backend attribute in the settings file $HOME/.keras/keras.json (for Linux users; modify the path according to your OS), where it appears in addition to other technical settings not important in this chapter:

{
    "image_data_format": "channels_last",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "tensorflow"
}

If we want to access the backend, we can easily do so by first importing it:

from keras import backend as K

We can then use it for most tensor operations as we would in TensorFlow (also for Theano). For example, this:

input = K.placeholder(shape=(10,32))

is equivalent to:

tf.placeholder(tf.float32, shape=(10,32))

Sequential model

Using the sequential type is very straightforward—we define it and can simply start adding layers:

from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential()

model.add(Dense(units=64, input_dim=784))
model.add(Activation('softmax'))

Or equivalently:

model = Sequential([
    Dense(64, input_shape=(784,),activation='softmax')
])

A dense layer is a fully connected layer. The first argument denotes the number of output units, and the input shape is the shape of the input (in this example the weight matrix would be of size 784×64). Dense() also has an optional argument where we can specify and add an activation function, as in the second example.

After the model is defined, and just before training it, we set its learning configurations by using the .compile() method. It has three input arguments—the loss function, the optimizer, and another metric function that is used to judge the performance of your model (not used as the actual loss when training the model):

model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])

We can configure the optimizer at a finer resolution (learning rate, method, etc.) using the keras.optimizers module. For example:

optimizer=keras.optimizers.SGD(lr=0.02, momentum=0.8, nesterov=True)

Finally, we feed .fit() the data and set the number of epochs and batch size. As with the previous libraries, we can now easily evaluate how it does and perform predictions with new test data:

from keras.callbacks import TensorBoard, EarlyStopping, ReduceLROnPlateau

early_stop = EarlyStopping(monitor='val_loss', min_delta=0,
                           patience=10, verbose=0, mode='auto')

model.fit(x_train, y_train, epochs=10, batch_size=64,
          callbacks=[TensorBoard(log_dir='/models/autoencoder'),
                     early_stop])

loss_and_metrics = model.evaluate(x_test, y_test, batch_size=64)
classes = model.predict(x_test, batch_size=64)

Note that a callbacks argument was added to the fit() method. Callbacks are functions that are applied during the training procedure, and we can use them to get a view on statistics and make dynamic training decisions by passing a list of them to the .fit() method.

In this example we plug in two callbacks: TensorBoard, specifying its output folder, and early stopping.

Early stopping

Early stopping is used to protect against overfitting by preventing the learner from further improving its fit to the training data at the expense of increasing the generalization error. In that sense, it can be thought of as a form of regularization. In Keras we can specify the minimum change to be monitored (min_delta), the number of no-improvement epochs to stop after (patience), and the direction of wanted change (mode).

Functional model

The main practical difference between the functional model and the sequential model is that here we first define our input and output, and only then instantiate the model.

We first create an input Tensor according to its shape:

from keras.layers import Input, Dense
from keras.models import Model

inputs = Input(shape=(784,))

Then we define our model:

x = Dense(64, activation='relu')(inputs)
x = Dense(32, activation='relu')(x)
outputs = Dense(10, activation='softmax')(x)

As we can see, the layers act as functions, giving the functional model its name.

And now we instantiate the model, passing both inputs and outputs to Model:

model = Model(inputs=inputs, outputs=outputs)

The other steps follow as before:

model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10, batch_size=64)
loss_and_metrics = model.evaluate(x_test, y_test, batch_size=64)
classes = model.predict(x_test, batch_size=64)

We will end this section by introducing the concept of autoencoders and then showing how to implement one using Keras.

Autoencoders

Autoencoders are neural networks that try to output a reconstruction of the input. In most cases the input is reconstructed after having its dimensionality reduced. Dimensionality reduction will be our main focus; however, autoencoders can also be used to achieve “overcomplete” representations (for more stable decomposition), which actually increases dimensions.

In dimensionality reduction we wish to translate each vector of data with size n to a vector with size m, where m < n, while trying to keep as much important information as possible. One very common way to do that is using principal component analysis (PCA), where we can represent each original data column xj (all data points corresponding to an original feature) with some linear combination of the new reduced features, called the principal components, such that xj = w1b1 + w2b2 +...+ wmbm.

PCA, however, is limited to only linear transformation of the data vectors.

Autoencoders are more general compressors, allowing complicated nonlinear transformations and finding nontrivial relations between visible and hidden units (in fact, PCA is like a one-layer “linear autoencoder”). The weights of the models are learned automatically by reducing a given loss function with an optimizer (SGD, for example).
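
Written generically, with f denoting the encoder and g the decoder, training searches for the parameters that minimize the reconstruction error over the training set (a standard formulation; the squared error shown is one common choice of loss):

\min_{f,\, g} \sum_{x} L\bigl(x,\; g(f(x))\bigr), \qquad \text{e.g. } L(x, \hat{x}) = \lVert x - \hat{x} \rVert^{2}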

Autoencoders that reduce input dimensionality create a bottleneck layer called a hidden layer that has a smaller number of units than the input layer, forcing the data to be represented in a lower dimension (Figure 7-6) before it is reconstructed. For the reconstruction (decoding), autoencoders extract representative features that capture some hidden abstractions, like the shape of an eye, wheel of a car, type of sport, etc., with which we can reconstruct the original input.

Figure 7-6. Illustration of an autoencoder—a typical autoencoder will have input and output layers consisting of the same number of units, and bottleneck hidden layers, where the dimensionality of the data is reduced (compressed).  

Like some of the models we’ve seen so far, autoencoder networks can have layers stacked on top of each other, and they can include convolutions as in CNNs.

Autoencoders are currently not very suitable for real-world data compression problems due to their data specificity—they are best used on data that is similar to what they were trained on. Their current practical applications are mostly for extracting lower-dimensional representations, denoising data, and data visualization with reduced dimensionality. Denoising works because the network learns the important abstractions of the image, while losing unimportant image-specific signals like noise.

Now let’s build a toy CNN autoencoder with Keras. In this example we will train the autoencoder on one category of a noisy version of the CIFAR10 data images, and then use it to denoise a test set of the same category. In this example we will use the functional model API.

First we load the images by using Keras, and then we choose only the images that correspond to the label 1 (the automobile class):

from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from keras.models import Model
from keras.callbacks import TensorBoard, ModelCheckpoint
from keras.datasets import cifar10
import numpy as np

(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train = x_train[np.where(y_train==1)[0],:,:,:]
x_test = x_test[np.where(y_test==1)[0],:,:,:]

Next we do a little preprocessing, first converting our data to float32 and then normalizing it to the range [0,1]. This normalization will allow us to perform an element-wise comparison at the pixel level, as we will see shortly. First, the type conversion:

x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.

We then add some Gaussian noise to create the noisy dataset, and clip values that are either smaller than 0 or larger than 1:

x_train_n = x_train + 0.5 * \
    np.random.normal(loc=0.0, scale=0.4, size=x_train.shape)

x_test_n = x_test + 0.5 * \
    np.random.normal(loc=0.0, scale=0.4, size=x_test.shape)

x_train_n = np.clip(x_train_n, 0., 1.)
x_test_n = np.clip(x_test_n, 0., 1.)

Now we declare the input layer (every image in the CIFAR10 dataset is 32×32 pixels with RGB channels):

inp_img = Input(shape=(32, 32, 3))   

Next, we start adding our usual “LEGO brick” layers. Our first layer is a 2D convolution layer, where the first argument is the number of filters (and thus the number of output images), and the second is the size of each filter. Like the other libraries, Keras automatically identifies the shape of the input.

We use a 2×2 pooling layer, which reduces the total number of pixels per channel by a factor of 4, creating the desired bottleneck. After another convolutional layer, we regain the same number of units for each channel by applying an up-sampling layer, which quadruples each pixel (repeating the rows and columns of the data) to get back the same number of pixels in each image.
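
For a single channel, the effect of the 2×2 up-sampling can be pictured with a toy example:

# UpSampling2D((2, 2)) repeats the rows and columns of its input:
# [[1, 2],        [[1, 1, 2, 2],
#  [3, 4]]   ->    [1, 1, 2, 2],
#                  [3, 3, 4, 4],
#                  [3, 3, 4, 4]]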

Finally, we add a convolutional output layer where we go back to three channels:

img = Conv2D(32, (3, 3), activation='relu', padding='same')(inp_img)
img = MaxPooling2D((2, 2), padding='same')(img)
img = Conv2D(32, (3, 3), activation='relu', padding='same')(img)
img = UpSampling2D((2, 2))(img)
decoded = Conv2D(3, (3, 3), activation='sigmoid', padding='same')(img)

We declare the functional model format, passing both inputs and outputs:

autoencoder = Model(inp_img, decoded)

Next we compile the model, defining the loss function and the optimizer; in this case we use the Adadelta optimizer (just to show another example!). For denoising of the images, we want our loss to capture the discrepancy between the decoded images and the original, pre-noise images. For that we use a binary cross-entropy loss, comparing each decoded pixel to its corresponding original one (values now lie in [0,1]):

autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')

After the model is defined, we fit it with 10 training epochs:

tensorboard = TensorBoard(log_dir='<some_path>',
              histogram_freq=0, write_graph=True, write_images=True)
model_saver = ModelCheckpoint(
                    filepath='<some_path>',
                     verbose=0, period=2)

autoencoder.fit(x_train_n, x_train,
                epochs=10,
                batch_size=64,
                shuffle=True,
                validation_data=(x_test_n, x_test),
                callbacks=[tensorboard, model_saver])

Hopefully the model will capture some internal structure, which it can later generalize to other noisy images, and denoise them as a result.

We use our test set as validation data for loss evaluation at the end of each epoch (the model will not be trained on this data), and also for visualization in TensorBoard. In addition to the TensorBoard callback, we add a model saver callback and set it to save our weights every two epochs.

Later, when we wish to load our weights, we need to reconstruct the network and then use the Model.load_weights() method, passing our model as the first argument and our saved weights file path as the second (more on saving models in Chapter 10):

inp_img = Input(shape=(32, 32, 3)) 
img = Conv2D(32, (3, 3), activation='relu', padding='same')(inp_img)
img = MaxPooling2D((2, 2), padding='same')(img)
img = Conv2D(32, (3, 3), activation='relu', padding='same')(img)
img = UpSampling2D((2, 2))(img)
decoded = Conv2D(3, (3, 3), activation='sigmoid', padding='same')(img)

autoencoder = Model(inp_img, decoded)
Model.load_weights(autoencoder, '<some_path>')

h5py requirement

For model saving, it is required that the h5py package is installed. This package is primarily used for storing large amounts of data and manipulating it from NumPy. You can install it using pip:

pip install h5py

Figure 7-7 shows the denoised test images of our chosen category for different numbers of training epochs.

Figure 7-7. Noisy CIFAR10 images before autoencoding (upper row) and after autoencoding (lower rows). The four bottom rows show results after an increasing number of training epochs.

Keras also has a bunch of pretrained models available to download, such as Inception, VGG, and ResNet; a quick sketch of loading one appears below. In the next and final section of this chapter, we will discuss these models and show an example of how to download and use a pretrained VGG model for classification using the TF-Slim extension.
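Here is that sketch, using keras.applications (a minimal illustration, not used in the rest of the chapter; the ImageNet weights are downloaded automatically on first use):

from keras.applications.vgg16 import VGG16

vgg = VGG16(weights='imagenet')  # load architecture plus pretrained ImageNet weights
vgg.summary()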

Pretrained models with TF-Slim

In this section of the chapter we will introduce the last abstraction to be covered here, TF-Slim. TF-Slim stands out by offering simplified syntax for defining convolutional neural networks in TensorFlow—its abstractions make it easy to build complex networks in a clean, streamlined manner. Like Keras, it also offers a nice variety of pretrained CNN models to download and use.

We start this section by learning about some of the general features and benefits of TF-Slim, and why it’s a great tool to use for building CNNs. In the second part of this section we will demonstrate how to download and deploy a pretrained model (VGG) for image classification.

TF-Slim

TF-Slim is a relatively new lightweight extension of TensorFlow that, like other abstractions, allows us to define and train complex models quickly and intuitively. TF-Slim doesn’t require any installation since it’s been merged with TensorFlow.

This extension is all about convolutional neural networks. CNNs are notorious for having a lot of messy boilerplate code. TF-Slim was designed with the goal of optimizing the creation of very complex CNN models so that they could be elegantly written and easy to interpret and debug by using high-level layers, variable abstractions, and argument scoping, which we will touch upon shortly.

In addition to enabling us to create and train our own models, TF-Slim has available pretrained networks that can be easily downloaded, read, and used: VGG, AlexNet, Inception, and more.

We start this section by briefly describing some of TF-Slim’s abstraction features. Then we shift our focus to how to download and use a pretrained model, demonstrating it for the VGG image classification model.

Creating CNN models with TF-Slim

With TF-Slim we can create a variable easily by defining its initialization, regularization, and device with one wrapper. For example, here we define weights initialized from a truncated normal distribution using L2 regularization and placed on the CPU (we will talk about distributing model parts across devices in Chapter 9):

import tensorflow as tf
from tensorflow.contrib import slim

# weights drawn from a truncated normal, with L2 regularization, placed on the CPU
W = slim.variable('w', shape=[7, 7, 3, 3],
                  initializer=tf.truncated_normal_initializer(stddev=0.1),
                  regularizer=slim.l2_regularizer(0.07),
                  device='/CPU:0')

Like the other abstractions we’ve seen in this chapter, TF-Slim can reduce a lot of boilerplate code and redundant duplication. As with Keras or TFLearn, we can define a layer operation at an abstract level to include the convolution operation, weights initialization, regularization, activation function, and more in a single command:

net = slim.conv2d(inputs, 64, [11, 11], 4, padding='SAME',
                  weights_initializer=tf.truncated_normal_initializer(stddev=0.01),
                  weights_regularizer=slim.l2_regularizer(0.0007), scope='conv1')

TF-Slim extends its elegance even beyond that, providing a clean way to replicate layers compactly by using the repeat, stack, and arg_scope commands.

repeat saves us the need to copy and paste the same line over and over so that, for example, instead of having this redundant duplication:

net = slim.conv2d(net, 128, [3, 3], scope='con1_1')
net = slim.conv2d(net, 128, [3, 3], scope='con1_2')
net = slim.conv2d(net, 128, [3, 3], scope='con1_3')
net = slim.conv2d(net, 128, [3, 3], scope='con1_4')
net = slim.conv2d(net, 128, [3, 3], scope='con1_5')

we could just enter this:

net = slim.repeat(net, 5, slim.conv2d, 128, [3, 3], scope='con1')

But this is viable only in cases where the repeated layers are identical. When this does not hold, we can use the stack command, which allows us to chain layers of different shapes. So, instead of this:

net = slim.conv2d(net, 64, [3, 3], scope='con1_1')
net = slim.conv2d(net, 64, [1, 1], scope='con1_2')
net = slim.conv2d(net, 128, [3, 3], scope='con1_3')
net = slim.conv2d(net, 128, [1, 1], scope='con1_4')
net = slim.conv2d(net, 256, [3, 3], scope='con1_5')

we can write this:

net = slim.stack(net, slim.conv2d, [(64, [3, 3]), (64, [1, 1]),
                                    (128, [3, 3]), (128, [1, 1]),
                                    (256, [3, 3])], scope='con')

Finally, we also have a scoping mechanism referred to as arg_scope, allowing users to pass a set of shared arguments to each operation defined in the same scope. Say, for example, that we have four layers having the same activation function, initialization, regularization, and padding. We can then simply use the slim.arg_scope command, where we specify the shared arguments as in the following code:

with slim.arg_scope([slim.conv2d],
                    padding='VALID',
                    activation_fn=tf.nn.relu,
                    weights_initializer=tf.truncated_normal_initializer(stddev=0.02),
                    weights_regularizer=slim.l2_regularizer(0.0007)):
  net = slim.conv2d(inputs, 64, [11, 11], scope='con1')
  net = slim.conv2d(net, 128, [11, 11], padding='VALID', scope='con2')
  net = slim.conv2d(net, 256, [11, 11], scope='con3')
  net = slim.conv2d(net, 256, [11, 11], scope='con4')

The individual arguments inside the arg_scope command can still be overwritten, and we can also nest one arg_scope inside another.
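Both are shown in the following minimal sketch (the scope names are ours):

with slim.arg_scope([slim.conv2d], activation_fn=tf.nn.relu):
    with slim.arg_scope([slim.conv2d], padding='VALID'):
        # inherits relu from the outer scope and VALID padding from the inner one
        net = slim.conv2d(inputs, 64, [3, 3], scope='nested1')
        # overrides the scope's padding for this layer only
        net = slim.conv2d(net, 64, [3, 3], padding='SAME', scope='nested2')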

In these examples we used conv2d(); however, TF-Slim has many of the other standard methods for building neural networks. Table 7-4 lists some of the available options; for the full list, consult the documentation.

Table 7-4. Available layer types in TF-Slim
Layer TF-Slim
BiasAdd slim.bias_add()
BatchNorm slim.batch_norm()
Conv2d slim.conv2d()
Conv2dInPlane slim.conv2d_in_plane()
Conv2dTranspose (Deconv) slim.conv2d_transpose()
FullyConnected slim.fully_connected()
AvgPool2D slim.avg_pool2d()
Dropout slim.dropout()
Flatten slim.flatten()
MaxPool2D slim.max_pool2d()
OneHotEncoding slim.one_hot_encoding()
SeparableConv2d slim.separable_conv2d()
UnitNorm slim.unit_norm()

To illustrate how convenient TF-Slim is for creating complex CNNs, we will build the VGG model by Karen Simonyan and Andrew Zisserman that was introduced in 2014 (see the upcoming note for more information). VGG serves as a good illustration of how a model with many layers can be created compactly using TF-Slim. Here we construct the 16-layer version: 13 convolution layers plus 3 fully connected layers.

Creating it, we take advantage of two of the features we’ve just mentioned:

  1. We use the arg_scope feature since all of the convolution layers have the same activation function and the same regularization and initialization.
  2. Many of the layers are exact duplicates of others, and therefore we also take advantage of the repeat command. 

The result is very compelling—the entire model is defined with just 16 lines of code:

  with slim.arg_scope([slim.conv2d, slim.fully_connected],
                activation_fn=tf.nn.relu,
                weights_initializer=tf.truncated_normal_initializer(0.0, 0.01),
                weights_regularizer=slim.l2_regularizer(0.0005)):
    net = slim.repeat(inputs, 2, slim.conv2d, 64, [3, 3], scope='con1')
    net = slim.max_pool2d(net, [2, 2], scope='pool1')
    net = slim.repeat(net, 2, slim.conv2d, 128, [3, 3], scope='con2')
    net = slim.max_pool2d(net, [2, 2], scope='pool2')
    net = slim.repeat(net, 3, slim.conv2d, 256, [3, 3], scope='con3')
    net = slim.max_pool2d(net, [2, 2], scope='pool3')
    net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='con4')
    net = slim.max_pool2d(net, [2, 2], scope='pool4')
    net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='con5')
    net = slim.max_pool2d(net, [2, 2], scope='pool5')
    net = slim.fully_connected(net, 4096, scope='fc6')
    net = slim.dropout(net, 0.5, scope='dropout6')
    net = slim.fully_connected(net, 4096, scope='fc7')
    net = slim.dropout(net, 0.5, scope='dropout7')
    net = slim.fully_connected(net, 1000, activation_fn=None, scope='fc8')

VGG and the ImageNet Challenge

The ImageNet project is a large database of images collected for the purpose of researching visual object recognition. As of 2016 it contained over 10 million hand-annotated images.

Each year (since 2010) a competition takes place called the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), where research teams try to automatically classify, detect, and localize objects and scenes in a subset of the ImageNet collection. In the 2012 challenge, dramatic progress occurred when a deep convolutional neural net called AlexNet, created by Alex Krizhevsky, achieved a top-5 classification error (the correct class not appearing among the model’s five most probable predictions) of only 15.4%, winning the competition by a large margin.

Over the next couple of years the error rate kept falling, from ZFNet with 14.8% in 2013, to GoogLeNet (introducing the Inception module) with 6.7% in 2014, to ResNet with 3.6% in 2015.  The Visual Geometry Group (VGG) was another CNN competitor in the 2014 competition that also achieved an impressive low error rate (7.3%). A lot of people prefer VGG over GoogLeNet because it has a nicer, simpler architecture.

In VGG the only spatial operations used are very small 3×3 convolution filters with a stride of 1, and 2×2 max pooling with a stride of 2. Its strength comes from the number of layers it uses, which is between 16 and 19.
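As a quick sanity check on these numbers (assuming the standard 224×224 inputs), the five 2×2, stride-2 pooling stages are what shrink the feature maps down to the 7×7 maps that enter the fully connected layers:

size = 224
for _ in range(5):   # five max-pooling stages, each halving height and width
    size //= 2
print(size)          # 7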

Downloading and using a pretrained model

Next we will demonstrate how to download and deploy a pretrained VGG model.

First we need to clone the repository where the actual models will reside by running:

git clone https://github.com/tensorflow/models

Now we have the scripts we need for modeling on our computer, and we can use them by setting the path:

import sys
sys.path.append("<some_path>/models/slim")

Next we will download the pretrained VGG-16 (16 layers) model—it is available on GitHub, as are other models, such as Inception, ResNet, and more:

from datasets import dataset_utils
import tensorflow as tf
target_dir = '<some_path>/vgg/vgg_checkpoints'
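The download itself can be done with the dataset_utils helper from the repository we just cloned; here is a sketch, assuming the vgg_16 checkpoint tarball URL published in the TF-Slim model zoo:

url = "http://download.tensorflow.org/models/vgg_16_2016_08_28.tar.gz"
if not tf.gfile.Exists(target_dir):
    tf.gfile.MakeDirs(target_dir)
dataset_utils.download_and_uncompress_tarball(url, target_dir)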

The downloaded checkpoint file contains information about both the model and the variables. Now we want to load it and use it for classification of new images.

However, before that we first have to prepare our input image, turning it into a readable TensorFlow format and performing a little pre-processing to make sure that it is resized to match the size of the images the model was trained on.

We can load the image into TensorFlow either from a URL link or from a local file on our computer. For a URL link, we can load the image as a string with urllib2 (this needs to be imported), and then decode it into a Tensor by using tf.image.decode_jpeg():

import urllib2

url = ("https://somewebpage/somepicture.jpg")
im_as_string = urllib2.urlopen(url).read()  
im = tf.image.decode_jpeg(im_as_string, channels=3)

Or, for PNG:

im = tf.image.decode_png(im_as_string, channels=3)

To load an image from our computer, we can create a queue of our filenames in the target directory, and then read the entire image file by using tf.WholeFileReader():

filename_queue = tf.train.string_input_producer(
                        tf.train.match_filenames_once("./images/*.jpg"))
image_reader = tf.WholeFileReader()
_, image_file = image_reader.read(filename_queue)
image = tf.image.decode_jpeg(image_file)

Don’t worry about the details for this step; we will discuss queues and reading data in much more depth in Chapter 8.

Next we want to resize the image so that it matches the size of the images VGG was trained on. For that, we first extract the desired size from the VGG script (in this case, it is 224):

from nets import vgg
image_size = vgg.vgg_16.default_image_size

Then we feed the raw image and the image size to the VGG pre-processing unit, where the image will be resized with a preserved aspect ratio (the width-to-height ratio of the image) and then cropped:

from preprocessing import vgg_preprocessing
processed_im = vgg_preprocessing.preprocess_image(image,
                                                  image_size,
                                                  image_size,
                                                  is_training=False)

Next we use tf.expand_dims() to insert a dimension of 1 into a tensor’s shape. This is done to add a batch dimension to a single element (changing [height, width, channels] to [1, height, width, channels]):

processed_images = tf.expand_dims(processed_im, 0)
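As a quick shape check (illustrative; the sizes follow from image_size being 224):

print(processed_im.get_shape().as_list())      # [224, 224, 3]
print(processed_images.get_shape().as_list())  # [1, 224, 224, 3]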

Now we create the model from the script we cloned earlier. We pass the model function the images and the number of classes. The model has shared arguments; therefore, we call it using arg_scope, as we saw earlier, with the script’s vgg_arg_scope() function defining the shared arguments (the function itself is shown after the next snippet).

vgg_16() returns the logits (numeric values acting as evidence for each class), which we can then turn into probabilities by using tf.nn.softmax(). We use the argument is_training to indicate that we are interested in forming predictions rather than training:

with slim.arg_scope(vgg.vgg_arg_scope()):
    logits, _ = vgg.vgg_16(processed_images,
                           num_classes=1000,
                           is_training=False)
probabilities = tf.nn.softmax(logits)

For reference, here is vgg_arg_scope() as defined in the script:

def vgg_arg_scope(weight_decay=0.0005):
  with slim.arg_scope([slim.conv2d, slim.fully_connected],
                      activation_fn=tf.nn.relu,
                      weights_regularizer=slim.l2_regularizer(weight_decay),
                      biases_initializer=tf.zeros_initializer()):
    with slim.arg_scope([slim.conv2d], padding='SAME') as arg_sc:
      return arg_sc

Now, just before starting the session, we need to prepare a function that loads the variables we downloaded, using slim.assign_from_checkpoint_fn(), to which we pass the path of the checkpoint file and the list of model variables:

import os

load_vars = slim.assign_from_checkpoint_fn(
     os.path.join(target_dir, 'vgg_16.ckpt'),
     slim.get_model_variables('vgg_16'))

Finally, the main event—we run the session, load the variables into it, and run the graph to obtain our network input and the class probabilities.

We can get the class names by using the following lines, which load a dict mapping label indices to human-readable class names:

from datasets import imagenet
names_ = imagenet.create_readable_names_for_imagenet_labels()

We extract the five classes with the highest probabilities for our given image, along with those probabilities:

import numpy as np

with tf.Session() as sess:
    load_vars(sess)
    network_input, probabilities = sess.run([processed_images,
                                             probabilities])
    probabilities = probabilities[0, 0:]
    idxs = np.argsort(-probabilities)[:5]
    probs = probabilities[idxs]
    # label 0 is 'background', so readable names are shifted by one
    classes = np.array(names_.values())[idxs+1]
    for c, p in zip(classes, probs):
        print('Class: ' + c + ' |Prob: ' + str(p))

In this example we passed the image shown in Figure 7-8 as input to the pretrained VGG model.

Figure 7-8. A lakeside in Switzerland.

Here are the output results for the top-five chosen classes and their probabilities:

Output:
Class: lakeside, lakeshore |Prob: 0.365693
Class: pelican |Prob: 0.163627
Class: dock, dockage, docking facility |Prob: 0.0608374
Class: breakwater, groin, groyne, mole, bulwark, seawall, jetty |Prob: 0.0393285
Class: speedboat |Prob: 0.0391587

As you can see, the classifier does quite well at capturing different elements in this image.

Summary

We started this chapter by discussing the importance of abstractions, followed by high-level coverage and then focusing in on some of the popular TensorFlow extensions: contrib.learn, TFLearn, Keras, and TF-Slim. We revisited models from previous chapters, using out-of-the-box contrib.learn linear regression and linear classification models. We then saw how to use the FeatureColumn abstraction for feature handling and pre-processing, incorporate TensorBoard, and create our own custom estimator. We introduced TFLearn and exemplified how easily CNN and RNN models can be constructed with it. Using Keras, we demonstrated how to implement an autoencoder. Finally, we created complex CNN models with TF-Slim and deployed a pretrained model.

In the next chapters we cover scaling up, with queuing and threading, distributed computing, and model serving.
