In this short section we will focus on two topics that continue from and extend the previous chapters—how to construct a proper model, and how to customize the model’s entities. We start by describing how we can effectively reframe our code by using encapsulations and allow its variables to be shared and reused. In the second part of this section we will talk about how to customize our own loss functions and operations and use them for optimization.
Ultimately, we would like to design our TensorFlow code efficiently, so that it can be reused for multiple tasks and is easy to follow and pass around. One way to make things cleaner is to use one of the available TensorFlow extension libraries, which were discussed in Chapter 7. However, while they are great to use for typical networks, models with new components that we wish to implement may sometimes require the full flexibility of lower-level TensorFlow.
Let’s take another look at the optimization code from the previous chapter:
import tensorflow as tf

NUM_STEPS = 10

# x_data and y_data are the regression inputs and targets from the previous chapter
g = tf.Graph()
wb_ = []
with g.as_default():
    x = tf.placeholder(tf.float32, shape=[None, 3])
    y_true = tf.placeholder(tf.float32, shape=None)

    with tf.name_scope('inference') as scope:
        w = tf.Variable([[0, 0, 0]], dtype=tf.float32, name='weights')
        b = tf.Variable(0, dtype=tf.float32, name='bias')
        y_pred = tf.matmul(w, tf.transpose(x)) + b

    with tf.name_scope('loss') as scope:
        loss = tf.reduce_mean(tf.square(y_true - y_pred))

    with tf.name_scope('train') as scope:
        learning_rate = 0.5
        optimizer = tf.train.GradientDescentOptimizer(learning_rate)
        train = optimizer.minimize(loss)

    # Initialize the variables, then run the training loop
    init = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init)
        for step in range(NUM_STEPS):
            sess.run(train, {x: x_data, y_true: y_data})
            if step % 5 == 0:
                print(step, sess.run([w, b]))
                wb_.append(sess.run([w, b]))

        print(10, sess.run([w, b]))
We get:
(0, [array([[ 0.30149955, 0.49303722, 0.11409992]], dtype=float32), -0.18563795])
(5, [array([[ 0.30094019, 0.49846715, 0.09822173]], dtype=float32), -0.19780949])
(10, [array([[ 0.30094025, 0.49846718, 0.09822182]], dtype=float32), -0.19780946])
The entire code here is simply stacked line by line. This is OK for simple and focused examples. However, this way of coding has its limits—it’s neither reusable nor very readable when the code gets more complex.
Let’s zoom out and think about what characteristics our infrastructure should have. First, we would like to encapsulate the model so it can be used for various tasks like training, evaluation, and forming predictions. Furthermore, it can also be more efficient to construct the model in a modular fashion, giving us specific control over its subcomponents and increasing readability. This will be the focus of the next few sections.
A good start is to split the code into functions that capture different elements in the learning model. We can do this as follows:
def predict(x, y_true, w, b):
    y_pred = tf.matmul(w, tf.transpose(x)) + b
    return y_pred

def get_loss(y_pred, y_true):
    loss = tf.reduce_mean(tf.square(y_true - y_pred))
    return loss

def get_optimizer(y_pred, y_true):
    loss = get_loss(y_pred, y_true)
    optimizer = tf.train.GradientDescentOptimizer(0.5)
    train = optimizer.minimize(loss)
    return train

def run_model(x_data, y_data):
    wb_ = []
    # Define placeholders and variables
    x = tf.placeholder(tf.float32, shape=[None, 3])
    y_true = tf.placeholder(tf.float32, shape=None)
    w = tf.Variable([[0, 0, 0]], dtype=tf.float32)
    b = tf.Variable(0, dtype=tf.float32)
    print(b.name, w.name)

    # Form predictions
    y_pred = predict(x, y_true, w, b)

    # Create optimizer (pass the placeholder, so the fed y_data is what the loss uses)
    train = get_optimizer(y_pred, y_true)

    # Run session
    init = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init)
        for step in range(10):
            sess.run(train, {x: x_data, y_true: y_data})
            # Report the first step and every fifth step, matching the output shown
            if (step % 5 == 4) or (step == 0):
                print(step, sess.run([w, b]))
                wb_.append(sess.run([w, b]))

run_model(x_data, y_data)
run_model(x_data, y_data)
And here is the result:
Variable_9:0 Variable_8:0
0 [array([[ 0.27383861, 0.48421991, 0.09082422]], dtype=float32), -0.20805186]
4 [array([[ 0.29868397, 0.49840903, 0.10026278]], dtype=float32), -0.20003076]
9 [array([[ 0.29868546, 0.49840906, 0.10026464]], dtype=float32), -0.20003042]
Variable_11:0 Variable_10:0
0 [array([[ 0.27383861, 0.48421991, 0.09082422]], dtype=float32), -0.20805186]
4 [array([[ 0.29868397, 0.49840903, 0.10026278]], dtype=float32), -0.20003076]
9 [array([[ 0.29868546, 0.49840906, 0.10026464]], dtype=float32), -0.20003042]
Now we can reuse the code with different inputs, and this division makes it easier to read, especially when it gets more complex.
In this example we called the main function twice with the same inputs and printed the names of the variables that were created. Note that each call created its own set of variables, so four variables were created in total. Now consider a scenario where we wish to build a model with multiple inputs, such as two different images, and apply the same convolutional filters to both. With the approach above, a new set of filter variables would be created for each image. To avoid this, we "share" the filter variables, using the same variables on both images.
It’s possible to reuse the same variables by creating them with tf.get_variable() instead of tf.Variable(). We use this very similarly to tf.Variable(), except that we need to pass an initializer as an argument:
w = tf.get_variable('w', [1, 3], initializer=tf.zeros_initializer())
b = tf.get_variable('b', [1, 1], initializer=tf.zeros_initializer())
Here we used tf.zeros_initializer(). This initializer is very similar to tf.zeros(), except that it doesn’t get the shape as an argument, but rather arranges the values according to the shape specified by tf.get_variable().
In this example the variable w will be initialized as [[0,0,0]], as specified by the given shape, [1,3].
With get_variable() we can reuse variables that have the same name (including the scope prefix, which can be set by tf.variable_scope()). But first we need to indicate this intention, either by calling reuse_variables() on the scope object (scope.reuse_variables()) or by setting the reuse flag when opening the scope (tf.variable_scope(name, reuse=True)). An example of how to share variables is shown in the code that follows.
Whenever a variable is requested with exactly the same name as an existing one, an exception will be thrown if the reuse flag is not set. The same goes for the opposite scenario: requesting a variable whose name does not match any existing variable while reuse is expected (reuse=True) will cause an exception as well.
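The two failure cases just described can be mimicked with a toy registry in plain Python. This is only a sketch of the rules, not TensorFlow code: the class ToyVariableStore, its reuse attribute, and the choice of ValueError are assumptions made for illustration.

```python
class ToyVariableStore:
    """Toy stand-in (not TensorFlow): mimics the reuse rules described above."""

    def __init__(self):
        self._vars = {}
        self.reuse = False

    def get_variable(self, name, shape):
        if name in self._vars:
            if not self.reuse:
                # Same name requested again without the reuse flag
                raise ValueError("Variable %s already exists" % name)
            return self._vars[name]
        if self.reuse:
            # Reuse requested, but no variable with this name was created
            raise ValueError("Variable %s does not exist" % name)
        self._vars[name] = [0.0] * (shape[0] * shape[1])
        return self._vars[name]


store = ToyVariableStore()
w1 = store.get_variable('Regression/w', [1, 3])

try:
    store.get_variable('Regression/w', [1, 3])   # same name, reuse not set
    duplicate_error = False
except ValueError:
    duplicate_error = True

store.reuse = True
w2 = store.get_variable('Regression/w', [1, 3])  # returns the existing object

try:
    store.get_variable('Regression/v', [1, 3])   # reuse set, unknown name
    missing_error = False
except ValueError:
    missing_error = True
```

In this sketch w1 and w2 are the same object, which is the behavior we want from shared variables.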
Using these methods, and setting the scope prefix to Regression, by printing their names we can see that the same variables are reused:
def run_model(x_data, y_data):
    wb_ = []
    # Define placeholders and variables
    x = tf.placeholder(tf.float32, shape=[None, 3])
    y_true = tf.placeholder(tf.float32, shape=None)

    w = tf.get_variable('w', [1, 3], initializer=tf.zeros_initializer())
    b = tf.get_variable('b', [1, 1], initializer=tf.zeros_initializer())
    print(b.name, w.name)

    # Form predictions
    y_pred = predict(x, y_true, w, b)

    # Create optimizer (pass the placeholder, so the fed y_data is what the loss uses)
    train = get_optimizer(y_pred, y_true)

    # Run session (sess is the session created below, outside this function)
    init = tf.global_variables_initializer()
    sess.run(init)
    for step in range(10):
        sess.run(train, {x: x_data, y_true: y_data})
        if (step % 5 == 4) or (step == 0):
            print(step, sess.run([w, b]))
            wb_.append(sess.run([w, b]))

sess = tf.Session()

with tf.variable_scope("Regression") as scope:
    run_model(x_data, y_data)
    # From here on, get_variable() returns the existing variables
    scope.reuse_variables()
    run_model(x_data, y_data)
sess.close()
The output is shown here:
Regression/b:0 Regression/w:0
0 [array([[ 0.27383861, 0.48421991, 0.09082422]], dtype=float32), array([[-0.20805186]], dtype=float32)]
4 [array([[ 0.29868397, 0.49840903, 0.10026278]], dtype=float32), array([[-0.20003076]], dtype=float32)]
9 [array([[ 0.29868546, 0.49840906, 0.10026464]], dtype=float32), array([[-0.20003042]], dtype=float32)]
Regression/b:0 Regression/w:0
0 [array([[ 0.27383861, 0.48421991, 0.09082422]], dtype=float32), array([[-0.20805186]], dtype=float32)]
4 [array([[ 0.29868397, 0.49840903, 0.10026278]], dtype=float32), array([[-0.20003076]], dtype=float32)]
9 [array([[ 0.29868546, 0.49840906, 0.10026464]], dtype=float32), array([[-0.20003042]], dtype=float32)]
tf.get_variable() is a neat, lightweight way to share variables. Another approach is to encapsulate our model as a class and manage the variables there. This approach has many other benefits, as described in the following section.
As with any other program, when things get more complex and the number of code lines grows, it becomes very convenient to have our TensorFlow code reside within a class, giving us quick access to methods and attributes that belong to the same model. Class encapsulation allows us to maintain the state of our variables and then perform various post-training tasks like forming predictions, model evaluation, further training, saving and restoring our weights, and whatever else is related to the specific problem our model solves.
In the next batch of code we see an example of a simple class wrapper. The model is created when the class is instantiated, and the training process is performed by calling the fit() method.
This code uses a @property decorator. A decorator is simply a function that takes another function as input, does something with it (like adding some functionality), and returns it. In Python, a decorator is applied by placing the @ symbol and the decorator's name above the function definition. @property is a decorator used to handle access to class attributes.
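As a quick illustration of both ideas in plain Python, the sketch below defines a small decorator and uses @property on a class. The names log_calls, add, and Circle are hypothetical and serve only this example.

```python
import functools


def log_calls(fn):
    """Hypothetical decorator: counts how many times fn is called."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return fn(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


@log_calls
def add(a, b):
    return a + b


class Circle:
    def __init__(self, radius):
        self._radius = radius

    @property
    def area(self):
        # Read as circle.area, without parentheses
        return 3.14159 * self._radius ** 2


total = add(2, 3)
circle = Circle(2.0)
```

Here add behaves as before but also tracks its call count in add.calls, and circle.area is computed on attribute access rather than through an explicit method call.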
Our class wrapper is as follows:
class Model:
    def __init__(self):
        # Model
        self.x = tf.placeholder(tf.float32, shape=[None, 3])
        self.y_true = tf.placeholder(tf.float32, shape=None)
        self.w = tf.Variable([[0, 0, 0]], dtype=tf.float32)
        self.b = tf.Variable(0, dtype=tf.float32)

        init = tf.global_variables_initializer()
        self.sess = tf.Session()
        self.sess.run(init)

        self._output = None
        self._optimizer = None
        self._loss = None

    def fit(self, x_data, y_data):
        print(self.b.name)
        for step in range(10):
            self.sess.run(self.optimizer, {self.x: x_data, self.y_true: y_data})
            if (step % 5 == 4) or (step == 0):
                print(step, self.sess.run([self.w, self.b]))

    @property
    def output(self):
        # Build the op only once; "is None" avoids truth-testing a Tensor
        if self._output is None:
            y_pred = tf.matmul(self.w, tf.transpose(self.x)) + self.b
            self._output = y_pred
        return self._output

    @property
    def loss(self):
        if self._loss is None:
            error = tf.reduce_mean(tf.square(self.y_true - self.output))
            self._loss = error
        return self._loss

    @property
    def optimizer(self):
        if self._optimizer is None:
            opt = tf.train.GradientDescentOptimizer(0.5)
            opt = opt.minimize(self.loss)
            self._optimizer = opt
        return self._optimizer

lin_reg = Model()
lin_reg.fit(x_data, y_data)
lin_reg.fit(x_data, y_data)
And we get this:
Variable_89:0
0 [array([[ 0.32110521, 0.4908163 , 0.09833425]], dtype=float32), -0.18784374]
4 [array([[ 0.30250472, 0.49442694, 0.10041162]], dtype=float32), -0.1999902]
9 [array([[ 0.30250433, 0.49442688, 0.10041161]], dtype=float32), -0.19999036]
Variable_89:0
0 [array([[ 0.30250433, 0.49442688, 0.10041161]], dtype=float32), -0.19999038]
4 [array([[ 0.30250433, 0.49442688, 0.10041161]], dtype=float32), -0.19999038]
9 [array([[ 0.30250433, 0.49442688, 0.10041161]], dtype=float32), -0.19999036]
Splitting the code into functions is somewhat redundant in the sense that the same lines of code are recomputed with every call. One simple solution is to add a condition at the beginning of each function. In the next code iteration we will see an even nicer workaround.
In this setting there is no need to use variable sharing since the variables are kept as attributes of the model object. Also, after calling the training method model.fit() twice, we see that the variables have maintained their current state.
In our last batch of code for this section we add another enhancement, creating a custom decorator that automatically checks whether the function was already called.
Another improvement we can make is having all of our variables kept in a dictionary. This will allow us to keep track of our variables after each operation, as we saw in Chapter 10 when we looked at saving weights and models.
Finally, additional functions for getting the values of the loss function and our weights are added:
# property_with_check is the custom decorator shown at the end of this section;
# it must be defined before this class is created.
class Model:
    def __init__(self):
        # Model
        self.x = tf.placeholder(tf.float32, shape=[None, 3])
        self.y_true = tf.placeholder(tf.float32, shape=None)
        self.params = self._initialize_weights()

        init = tf.global_variables_initializer()
        self.sess = tf.Session()
        self.sess.run(init)

        # Access each property once so that its ops are added to the graph
        self.output
        self.optimizer
        self.loss

    def _initialize_weights(self):
        params = dict()
        params['w'] = tf.Variable([[0, 0, 0]], dtype=tf.float32)
        params['b'] = tf.Variable(0, dtype=tf.float32)
        return params

    def fit(self, x_data, y_data):
        print(self.params['b'].name)
        for step in range(10):
            self.sess.run(self.optimizer, {self.x: x_data, self.y_true: y_data})
            if (step % 5 == 4) or (step == 0):
                print(step, self.sess.run([self.params['w'], self.params['b']]))

    def evaluate(self, x_data, y_data):
        print(self.params['b'].name)
        MSE = self.sess.run(self.loss, {self.x: x_data, self.y_true: y_data})
        return MSE

    def getWeights(self):
        return self.sess.run([self.params['b']])

    @property_with_check
    def output(self):
        y_pred = tf.matmul(self.params['w'], tf.transpose(self.x)) + self.params['b']
        return y_pred

    @property_with_check
    def loss(self):
        error = tf.reduce_mean(tf.square(self.y_true - self.output))
        return error

    @property_with_check
    def optimizer(self):
        opt = tf.train.GradientDescentOptimizer(0.5)
        opt = opt.minimize(self.loss)
        return opt

lin_reg = Model()
lin_reg.fit(x_data, y_data)
MSE = lin_reg.evaluate(x_data, y_data)
print(MSE)
print(lin_reg.getWeights())
Here is the output:
Variable_87:0
0 [array([[ 0.32110521, 0.4908163 , 0.09833425]], dtype=float32), -0.18784374]
4 [array([[ 0.30250472, 0.49442694, 0.10041162]], dtype=float32), -0.1999902]
9 [array([[ 0.30250433, 0.49442688, 0.10041161]], dtype=float32), -0.19999036]
Variable_87:0
0 [array([[ 0.30250433, 0.49442688, 0.10041161]], dtype=float32), -0.19999038]
4 [array([[ 0.30250433, 0.49442688, 0.10041161]], dtype=float32), -0.19999038]
9 [array([[ 0.30250433, 0.49442688, 0.10041161]], dtype=float32), -0.19999036]
Variable_87:0
0.0102189
[-0.19999036]
The custom decorator checks whether an attribute exists, and if not, it sets it according to the input function. Otherwise, it returns the attribute. functools.wraps() is used so that the wrapper keeps the name and docstring of the input function:
import functools

def property_with_check(input_fn):
    # Name of the attribute in which the result is cached
    attribute = '_cache_' + input_fn.__name__

    @property
    @functools.wraps(input_fn)
    def check_attr(self):
        if not hasattr(self, attribute):
            setattr(self, attribute, input_fn(self))
        return getattr(self, attribute)

    return check_attr
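To see the caching behavior without building a TensorFlow graph, here is a small plain-Python sketch. The decorator is repeated so the block is self-contained, and the Counter class with its builds attribute is a hypothetical stand-in for a model.

```python
import functools


def property_with_check(input_fn):
    attribute = '_cache_' + input_fn.__name__

    @property
    @functools.wraps(input_fn)
    def check_attr(self):
        if not hasattr(self, attribute):
            setattr(self, attribute, input_fn(self))
        return getattr(self, attribute)

    return check_attr


class Counter:
    """Hypothetical stand-in for a model; counts how often `value` is built."""

    def __init__(self):
        self.builds = 0

    @property_with_check
    def value(self):
        self.builds += 1
        return [1, 2, 3]


c = Counter()
first = c.value    # runs the function body and caches the result
second = c.value   # served from the cached attribute
```

The function body runs only on the first access; afterward the same object is returned from the _cache_value attribute, which is what keeps graph ops from being created twice in the model class.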
This was a fairly basic example of how we can improve the overall code for our model. This kind of optimization might be overkill for our simple linear regression example, but it will definitely be worth the effort for complicated models with plenty of layers, variables, and features.
So far we’ve used two loss functions. In the classification example in Chapter 2 we used the cross-entropy loss, defined as follows:
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=y_pred, labels=y_true))
In contrast, in the regression example in the previous section we used the square error loss, defined as follows:
loss = tf.reduce_mean(tf.square(y_true - y_pred))
These are the most commonly used loss functions in machine learning and deep learning right now. The purpose of this section is twofold. First, we want to point out the more general capabilities of TensorFlow in utilizing custom loss functions. Second, we will discuss regularization as a form of extension of any loss function in order to achieve a specific goal, irrespective of the basic loss function used.
This book (and presumably our readers) takes a specific view of TensorFlow with the aspect of deep learning in mind. However, TensorFlow is more general in scope, and most machine learning problems can be formulated in a way that TensorFlow can be used to solve. Furthermore, any computation that can be formulated in the computation graph framework is a good candidate to benefit from TensorFlow.
The predominant special case is the class of unconstrained optimization problems. These are extremely common throughout scientific (and algorithmic) computing, and for these, TensorFlow is especially helpful. The reason these problems stand out is that TensorFlow provides an automatic mechanism for computing gradients, which affords a tremendous speedup in development time for such problems.
In general, optimization with respect to an arbitrary loss function will be in the form
def my_loss_function(key-variables...):
    loss = ...
    return loss

my_loss = my_loss_function(key-variables...)
gd_step = tf.train.GradientDescentOptimizer().minimize(my_loss)
where any optimizer could be used in place of the GradientDescentOptimizer.
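The same pattern can be sketched in plain Python to make the moving parts concrete. This is not how TensorFlow computes gradients: a central finite difference stands in for automatic differentiation here, and the quadratic loss, the learning rate, and the step count are all assumptions chosen for illustration.

```python
def my_loss_function(params):
    # Hypothetical loss with its minimum at (1, -2)
    a, b = params
    return (a - 1.0) ** 2 + (b + 2.0) ** 2


def numeric_gradient(loss_fn, params, eps=1e-6):
    # Central finite difference for each parameter in turn
    grads = []
    for i in range(len(params)):
        up = list(params)
        down = list(params)
        up[i] += eps
        down[i] -= eps
        grads.append((loss_fn(up) - loss_fn(down)) / (2.0 * eps))
    return grads


def gradient_descent(loss_fn, params, learning_rate=0.1, steps=200):
    params = list(params)
    for _ in range(steps):
        grads = numeric_gradient(loss_fn, params)
        # Move each parameter against its gradient
        params = [p - learning_rate * g for p, g in zip(params, grads)]
    return params


solution = gradient_descent(my_loss_function, [0.0, 0.0])
final_loss = my_loss_function(solution)
```

Only my_loss_function is specific to the problem; swapping in a different loss leaves the optimization loop unchanged, which is the point of the general form above.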
Regularization is the restriction of an optimization problem by imposing a penalty on the complexity of the solution (see the note in Chapter 4 for more details). In this section we take a look at specific instances where the penalty is directly added to the basic loss function in an additive form.
For example, building on the softmax example from Chapter 2, we have this:
# LAMBDA is the regularization trade-off parameter, a small constant chosen by the user
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
y_true = tf.placeholder(tf.float32, [None, 10])
y_pred = tf.matmul(x, W)

cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=y_pred, labels=y_true))

# Add the weighted L2 penalty on W to the basic loss
total_loss = cross_entropy + LAMBDA * tf.nn.l2_loss(W)

gd_step = tf.train.GradientDescentOptimizer(0.5).minimize(total_loss)
The difference between this and the original in Chapter 2 is that we added LAMBDA * tf.nn.l2_loss(W) to the loss we are optimizing with respect to. In this case, using a small value of the trade-off parameter LAMBDA will have very little effect on the resulting accuracy (a large value will be detrimental). In large networks, where overfitting is a serious issue, this sort of regularization can often be a lifesaver.
Regularization of this sort can be done with respect to the weights of the model, as shown in the previous example (also called weight decay, since it will cause the weights to have smaller values), as well as to the activations of a specific layer, or indeed all layers.
Another factor is which function we use: we could have used l1 instead of l2 regularization, or a combination of the two. All combinations of these regularizers are valid and used in various contexts.
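To make the arithmetic concrete, the following plain-Python sketch computes l1, l2, and combined penalties for a tiny weight vector. The l2 term follows the convention of tf.nn.l2_loss (half the sum of squares); the weights, the base loss, and the value of LAMBDA are made up for illustration.

```python
LAMBDA = 0.01  # hypothetical trade-off value, chosen only for illustration


def l2_penalty(weights):
    # Same convention as tf.nn.l2_loss: sum(w ** 2) / 2
    return sum(w * w for w in weights) / 2.0


def l1_penalty(weights):
    # Sum of absolute values
    return sum(abs(w) for w in weights)


def total_loss(base_loss, weights, l1=0.0, l2=0.0):
    # Additive regularization: basic loss plus weighted penalties
    return base_loss + l1 * l1_penalty(weights) + l2 * l2_penalty(weights)


weights = [3.0, -4.0]
base_loss = 1.0
loss_l2 = total_loss(base_loss, weights, l2=LAMBDA)
loss_l1 = total_loss(base_loss, weights, l1=LAMBDA)
loss_both = total_loss(base_loss, weights, l1=LAMBDA, l2=LAMBDA)
```

With these numbers the l2 penalty is 12.5 and the l1 penalty is 7, so the three totals are 1.125, 1.07, and 1.195. Larger weights raise the loss, which is what pushes the optimizer toward smaller values.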
Many of the abstraction layers make the application of regularization as easy as specifying the number of filters, or the activation function. In Keras (a very popular extension reviewed in Chapter 7), for instance, we are provided with the regularizers listed in Table A-1, applicable to all the standard layers.
Regularizer | What it does | Example |
---|---|---|
l1 | l1 regularization of weights | |
l2 | l2 regularization of weights | |
l1l2 | Combined l1 + l2 regularization of weights | |
activity_l1 | l1 regularization of activations | |
activity_l2 | l2 regularization of activations | |
activity_l1l2 | Combined l1 + l2 regularization of activations | |
Using these shortcuts makes it easy to test different regularization schemes when a model is overfitting.
TensorFlow comes ready packed with a large number of native ops, ranging from standard arithmetic and logical operations to matrix operations, deep learning–specific functions, and more. When these are not enough, it is possible to extend the system by creating a new op. This is done in one of two ways:
We will spend the remainder of this section discussing the second option.
The main reason to construct a Python op is to utilize NumPy functionality in the context of a TensorFlow computational graph. For the sake of illustration, we will construct the regularization example from the previous section by using the NumPy multiplication function rather than the TensorFlow op:
import numpy as np

LAMBDA = 1e-5

def mul_lambda(val):
    return np.multiply(val, LAMBDA).astype(np.float32)
Note that this is done for the sake of illustration, and there is no special reason why anybody would want to use this instead of the native TensorFlow op. We use this oversimplified example in order to shift the focus to the details of the mechanism rather than the computation.
In order to use our new creation from within TensorFlow, we use the py_func() functionality:
tf.py_func(my_python_function, [input], [output_types])
In our case, this means we compute the total loss as follows:
total_loss = cross_entropy + \
    tf.py_func(mul_lambda, [tf.nn.l2_loss(W)], [tf.float32])[0]
Doing this, however, will not be enough. Recall that TensorFlow keeps track of the gradients of each of the ops in order to perform gradient-based training of our overall model. In order for this to work with the new Python-based op, we have to specify the gradient manually. This is done in two steps.
First, we create and register the gradient:
@tf.RegisterGradient("PyMulLambda")
def grad_mul_lambda(op, grad):
    return LAMBDA * grad
Next, when using the function, we point to this function as the gradient of the op. This is done using the string registered in the previous step:
with tf.get_default_graph().gradient_override_map({"PyFunc": "PyMulLambda"}):
    total_loss = cross_entropy + \
        tf.py_func(mul_lambda, [tf.nn.l2_loss(W)], [tf.float32])[0]
Putting it all together, the code for the softmax model with regularization through our new Python-based op is now:
import numpy as np
import tensorflow as tf

LAMBDA = 1e-5

def mul_lambda(val):
    return np.multiply(val, LAMBDA).astype(np.float32)

@tf.RegisterGradient("PyMulLambda")
def grad_mul_lambda(op, grad):
    return LAMBDA * grad

x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
y_true = tf.placeholder(tf.float32, [None, 10])
y_pred = tf.matmul(x, W)

cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=y_pred, labels=y_true))

# Use the registered gradient for the Python-based op
with tf.get_default_graph().gradient_override_map({"PyFunc": "PyMulLambda"}):
    total_loss = cross_entropy + \
        tf.py_func(mul_lambda, [tf.nn.l2_loss(W)], [tf.float32])[0]

gd_step = tf.train.GradientDescentOptimizer(0.5).minimize(total_loss)

correct_mask = tf.equal(tf.argmax(y_pred, 1), tf.argmax(y_true, 1))
accuracy = tf.reduce_mean(tf.cast(correct_mask, tf.float32))
This can now be trained using the same code as in Chapter 2, when this model was first introduced.
In the simple example we just showed, the gradient depends only on the gradient with respect to the input, and not on the input itself. In the general case, we will need access to the input as well. This is done easily, using the op.inputs field:
x = op.inputs[0]
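The difference between the two cases can be sketched in plain Python, with the input passed explicitly as an argument in place of reading it from op.inputs. The squaring function and the helper numeric_grad are hypothetical and used only to check the analytic gradients.

```python
LAMBDA = 1e-5


def grad_mul_lambda(x, upstream):
    # y = LAMBDA * x, so dy/dx = LAMBDA: the input x is not needed
    return LAMBDA * upstream


def grad_square(x, upstream):
    # y = x ** 2, so dy/dx = 2 * x: the input x is required
    return 2.0 * x * upstream


def numeric_grad(fn, x, eps=1e-6):
    # Central finite difference, used only as a sanity check
    return (fn(x + eps) - fn(x - eps)) / (2.0 * eps)


x0 = 3.0
g_mul = grad_mul_lambda(x0, 1.0)
g_sq = grad_square(x0, 1.0)
check_sq = numeric_grad(lambda v: v ** 2, x0)
```

In both functions the incoming gradient (upstream, playing the role of grad) is multiplied by the local derivative; only the second one needs the value of the input to compute that derivative.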
In this section, we add details on some of the material covered in Chapter 10 and review in more depth some of the technical components used behind the scenes in TensorFlow Serving.
In Chapter 10, we used Docker to run TensorFlow Serving. Those who prefer to avoid using a Docker container need to have the following installed:
Bazel is Google’s own build tool, which recently became publicly available. When we use the term build, we are referring to using a bunch of rules to create output software from source code in a very efficient and reliable manner. The build process can also be used to reference external dependencies that are required to build the outputs. Among other languages, Bazel can be used to build C++ applications, and we exploit this to build the C++-written TensorFlow Serving’s programs. The source code Bazel builds upon is organized in a workspace directory inside nested hierarchies of packages, where each package groups related source files together. Every package consists of three types of files: human-written source files called targets, generated files created from the source files, and rules specifying the steps for deriving the outputs from the inputs.
Each package has a BUILD file, specifying the output to be built from the files inside that package. We use basic Bazel commands like bazel build to build generated files from targets, and bazel run to execute a build rule. We use the -bin flag when we want to specify the directories to contain the build outputs.
Downloads and installation instructions can be found on the Bazel website.
Remote procedure call (RPC) is a form of client (caller)–server (executer) interaction; a program can request a procedure (for example, a method) that is executed on another computer (commonly in a shared network). gRPC is an open source framework developed by Google. Like any other RPC framework, gRPC lets you directly call methods on other machines, making it easier to distribute the computations of an application. The greatness of gRPC lies in how it handles the serialization, using the fast and efficient protocol buffers instead of XML or other methods.
Downloads and installation instructions can be found on GitHub.
Next, you need to make sure that the necessary dependencies for Serving are installed with the following command:
sudo apt-get update && sudo apt-get install -y build-essential curl libcurl3-dev git libfreetype6-dev libpng12-dev libzmq3-dev pkg-config python-dev python-numpy python-pip software-properties-common swig zip zlib1g-dev
And lastly, clone Serving:
git clone --recurse-submodules https://github.com/tensorflow/serving
cd serving
As illustrated in Chapter 10, another option is to use a Docker container, allowing a simple and clean installation.
Docker is essentially solving the same problem as Vagrant with VirtualBox, and that is making sure our code will run smoothly on other machines. Different machines might have different operating systems as well as different tool sets (installed software, configurations, permissions, etc.). By replicating the same environment—maybe for production purposes, maybe just to share with others—we guarantee that our code will run exactly the same way elsewhere as on our original development machine.
What’s unique about Docker is that, unlike other similarly purposed tools, it doesn’t create a fully operational virtual machine on which the environment will be built, but rather creates a container on top of an existing system (Ubuntu, for example), acting as a virtual machine in a sense and using our existing OS resources. These containers are created from a local Docker image, which is built from a dockerfile and encapsulates everything we need (dependency installations, project code, etc.). From that image we can create as many containers as we want (until we run out of memory, of course). This makes Docker a very cool tool with which we can easily create complete multiple environment replicas that contain our code and run them anywhere (very useful for cluster computing).
To get you a bit more comfortable with using Docker, here’s a quick look at some useful commands, written in their most simplified form. Given that we have a dockerfile ready, we can build an image by using docker build <dockerfile>. From that image we can then create a new container by using the docker run <image> command. This command will also automatically run the container and open a terminal (type exit to close the terminal). To run, stop, and delete existing containers, we use the docker start <container id>, docker stop <container id>, and docker rm <container id> commands, respectively. To see the list of all of our instances, both running and idle, we write docker ps -a.
When we run an instance, we can add the -p flag followed by a port for Docker to expose, and the -v flag followed by a home directory to be mounted, which will enable us to work locally (the home directory is addressed via the /mnt/home path in the container).