ReLU classifier

The last architectural change improved the accuracy of our model, but we can do even better by replacing the sigmoid activation function with the Rectified Linear Unit (ReLU), shown in the following figure:

ReLU function

A Rectified Linear Unit (ReLU) computes the function f(x) = max(0, x). ReLU is computationally cheap because it does not require any exponential computation, such as that required by the sigmoid or tanh activations. Furthermore, it was found to greatly accelerate the convergence of stochastic gradient descent compared to the sigmoid/tanh functions.
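The following minimal sketch illustrates this: it evaluates tf.nn.relu on a few sample values and compares it with the function max(0, x) written out explicitly with tf.maximum. The sample values are chosen only for illustration.

import tensorflow as tf

# ReLU is an element-wise max(0, x): negative inputs are clipped to zero,
# positive inputs pass through unchanged.
x = tf.constant([-2.0, -0.5, 0.0, 1.5, 3.0])

relu_builtin = tf.nn.relu(x)       # TensorFlow's built-in ReLU
relu_manual = tf.maximum(0.0, x)   # the same function written out explicitly

with tf.Session() as sess:
    print(sess.run(relu_builtin))  # [0.  0.  0.  1.5 3. ]
    print(sess.run(relu_manual))   # identical output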

To use the ReLU function, we simply change the definitions of the first four layers in the previously implemented model.

First layer output:

Y1 = tf.nn.relu(tf.matmul(XX, W1) + B1)  

Second layer output:

Y2 = tf.nn.relu(tf.matmul(Y1, W2) + B2) 

Third layer output:

Y3 = tf.nn.relu(tf.matmul(Y2, W3) + B3)  

Fourth layer output:

Y4 = tf.nn.relu(tf.matmul(Y3, W4) + B4)  

Of course, tf.nn.relu is TensorFlow's implementation of the ReLU function.
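For context, the following sketch shows how these four definitions fit into the full network. The layer sizes, the weight and bias initialization, and the softmax output layer shown here are assumptions made for illustration; adapt them to the model you implemented in the previous section.

import tensorflow as tf

# Illustrative hidden-layer sizes (assumed for this sketch); the input is a
# flattened 28x28 MNIST image and the output is one of 10 digit classes.
L1, L2, L3, L4 = 200, 100, 60, 30

XX = tf.placeholder(tf.float32, [None, 784])

# Truncated-normal weights and small positive biases are a common choice
# with ReLU, so that units start in the active (non-zero) region.
W1 = tf.Variable(tf.truncated_normal([784, L1], stddev=0.1))
B1 = tf.Variable(tf.ones([L1]) / 10)
W2 = tf.Variable(tf.truncated_normal([L1, L2], stddev=0.1))
B2 = tf.Variable(tf.ones([L2]) / 10)
W3 = tf.Variable(tf.truncated_normal([L2, L3], stddev=0.1))
B3 = tf.Variable(tf.ones([L3]) / 10)
W4 = tf.Variable(tf.truncated_normal([L3, L4], stddev=0.1))
B4 = tf.Variable(tf.ones([L4]) / 10)
W5 = tf.Variable(tf.truncated_normal([L4, 10], stddev=0.1))
B5 = tf.Variable(tf.zeros([10]))

# The four hidden layers now use ReLU instead of sigmoid.
Y1 = tf.nn.relu(tf.matmul(XX, W1) + B1)
Y2 = tf.nn.relu(tf.matmul(Y1, W2) + B2)
Y3 = tf.nn.relu(tf.matmul(Y2, W3) + B3)
Y4 = tf.nn.relu(tf.matmul(Y3, W4) + B4)

# The output layer is unchanged: logits followed by a softmax over 10 classes.
Ylogits = tf.matmul(Y4, W5) + B5
Y = tf.nn.softmax(Ylogits)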

The accuracy of the model is almost 98%, as you can see by running the network:

>>>  
Loading data/train-images-idx3-ubyte.mnist
Loading data/train-labels-idx1-ubyte.mnist
Loading data/t10k-images-idx3-ubyte.mnist
Loading data/t10k-labels-idx1-ubyte.mnist
Epoch: 0
Epoch: 1
Epoch: 2
Epoch: 3
Epoch: 4
Epoch: 5
Epoch: 6
Epoch: 7
Epoch: 8
Epoch: 9
Accuracy: 0.9789
done
>>>