Linear regression and beyond

In this section, we will take a closer look at the main concepts of TensorFlow and TensorBoard and try to do some basic operations to get you started. The model we want to implement simulates linear regression.

In statistics and ML, linear regression is a technique that's frequently used to measure the relationship between variables. This is a quite simple but effective algorithm that can be used in predictive modeling as well.

Linear regression models the relationship between a dependent variable, y, an independent variable, x, and a random term, b. This can be seen as follows:

y = W * x + b

A typical linear regression problem in TensorFlow has the following workflow, which updates the parameters to minimize the given cost function (see the following figure):

Figure 9: A learning algorithm using linear regression in TensorFlow

Now, let's try to follow the preceding figure and reproduce it for linear regression by conceptualizing the preceding equation. For this, we're going to write a simple Python program to create data in a 2D space. Then we will use TensorFlow to look for the line that best fits these data points (the generated points are shown in the following figure):

# Import the required libraries (NumPy, Matplotlib)

import numpy as np
import matplotlib.pyplot as plot

# Create 1000 points following the function y = 0.1 * x + 0.4
# (that is, y = W * x + b) with some normally distributed noise:

num_points = 1000
vectors_set = []

# Create the random data points
for i in range(num_points):
    W = 0.1 # W
    b = 0.4 # b
    x1 = np.random.normal(0.0, 1.0)   # in: mean, standard deviation
    nd = np.random.normal(0.0, 0.05)  # in: mean, standard deviation
    y1 = W * x1 + b

    # Add some impurity (noise) with a normal distribution, i.e. nd
    y1 = y1 + nd

    # Append them and create a combined vector set:
    vectors_set.append([x1, y1])

# Separate the data points across the axes:
x_data = [v[0] for v in vectors_set]
y_data = [v[1] for v in vectors_set]

# Plot and show the data points in a 2D space
plot.plot(x_data, y_data, 'ro', label='Original data')
plot.legend()
plot.show()

If the interpreter does not complain, you should get the following graph:

Figure 10: Randomly generated (but original) data

Well, so far we have just created a few data points without an associated model that could be executed through TensorFlow. So, the next step is to create a linear regression model that can estimate the output values, y, from the input data points, x_data. In this context, we have only two associated parameters, W and b.

Now the objective is to create a graph that allows us to find the values for these two parameters based on the input data, x_data, by adjusting them to y_data. So, the target function in our case would be as follows:

y = W * x_data + b

If you recall, we defined W = 0.1 and b = 0.4 while creating the data points in the 2D space. TensorFlow has to optimize these two values so that W tends to 0.1 and b to 0.4.

A standard way to solve such optimization problems is to iterate through each value of the data points and adjust the values of W and b in order to get a more precise answer for each iteration. To see if the values really are improving, we need to define a cost function that measures how good a certain line is.

In our case, the cost function is the mean squared error, which is the average of the squared distances between the real data points and the estimated ones at each iteration. We start by importing the TensorFlow library:

import tensorflow as tf
W = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
b = tf.Variable(tf.zeros([1]))
y = W * x_data + b

In the preceding code segment, we are generating a random initial value for W (drawn from a uniform distribution, which is a different strategy from the one used for the data) and initializing the bias b with zeros. Now, let's define a loss function that returns a scalar value with the mean of all the distances between our data and the model's predictions. In terms of the TensorFlow convention, the loss function can be expressed as follows:

loss = tf.reduce_mean(tf.square(y - y_data))

The preceding line actually computes the mean squared error (MSE). Without going into further detail, we can use a widely used optimization algorithm such as gradient descent (GD). At a minimal level, GD is an algorithm that operates on the set of parameters we already have.

It starts with an initial set of parameter values and iteratively moves toward a set of values that minimizes the function, with the step size controlled by another parameter called the learning rate. This iterative minimization is achieved by taking steps in the negative direction of the gradient of the function:

optimizer = tf.train.GradientDescentOptimizer(0.6)
train = optimizer.minimize(loss)
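
To make this concrete, the following is a minimal NumPy sketch (illustrative only, not part of the original program) of what a single gradient descent step does for this model: it computes the gradients of the MSE with respect to W and b, and then moves both parameters a small step in the direction opposite to those gradients:

# Illustrative only: one manual gradient descent step for y = W * x + b,
# assuming the x_data and y_data lists created earlier are available
x = np.array(x_data)
y_true = np.array(y_data)
W_est, b_est, lr = 0.0, 0.0, 0.6   # arbitrary starting values and learning rate

y_pred = W_est * x + b_est
error = y_pred - y_true
grad_W = 2.0 * np.mean(error * x)  # d(MSE)/dW
grad_b = 2.0 * np.mean(error)      # d(MSE)/db
W_est = W_est - lr * grad_W        # step in the negative gradient direction
b_est = b_est - lr * grad_b

Running the train op repeatedly applies exactly this kind of update, with TensorFlow computing the gradients automatically.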

Before running this optimization function, we need to initialize all the variables that we have so far. Let's do it using a conventional TensorFlow technique, as follows:

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

Since we have created a TensorFlow session, we are ready for the iterative process that helps us find the optimal values of W and b:

for i in range(8):
  sess.run(train)
  print(i, sess.run(W), sess.run(b), sess.run(loss))

You should observe the following output:

>>>
0 [ 0.18418592] [ 0.47198644] 0.0152888
1 [ 0.08373772] [ 0.38146532] 0.00311204
2 [ 0.10470386] [ 0.39876288] 0.00262051
3 [ 0.10031486] [ 0.39547175] 0.00260051
4 [ 0.10123629] [ 0.39609471] 0.00259969
5 [ 0.1010423] [ 0.39597753] 0.00259966
6 [ 0.10108326] [ 0.3959994] 0.00259966
7 [ 0.10107458] [ 0.39599535] 0.00259966

You can see that the algorithm starts with the values W = 0.18418592 and b = 0.47198644, where the loss is still pretty high. Then, the algorithm iteratively adjusts the values by minimizing the cost function. By the eighth iteration, all the values tend toward our desired values.

Now, what if we could plot them? Let's do it by adding the plotting code inside the for loop, as follows:

for i in range(6):
    sess.run(train)
    print(i, sess.run(W), sess.run(b), sess.run(loss))
    plot.plot(x_data, y_data, 'ro', label='Original data')
    plot.plot(x_data, sess.run(W)*x_data + sess.run(b))
    plot.xlabel('X')
    plot.xlim(-2, 2)
    plot.ylim(0.1, 0.6)
    plot.ylabel('Y')
    plot.legend()
    plot.show()

The preceding code block should produce the following figure (the individual plots are merged together here):

Figure 11: Linear regression optimizing the loss function after the sixth iteration

Now let's go up to the 16th iteration:

>>>
0 [ 0.23306453] [ 0.47967502] 0.0259004
1 [ 0.08183448] [ 0.38200468] 0.00311023
2 [ 0.10253634] [ 0.40177572] 0.00254209
3 [ 0.09969243] [ 0.39778906] 0.0025257
4 [ 0.10008509] [ 0.39859086] 0.00252516
5 [ 0.10003048] [ 0.39842987] 0.00252514
6 [ 0.10003816] [ 0.39846218] 0.00252514
7 [ 0.10003706] [ 0.39845571] 0.00252514
8 [ 0.10003722] [ 0.39845699] 0.00252514
9 [ 0.10003719] [ 0.39845672] 0.00252514
10 [ 0.1000372] [ 0.39845678] 0.00252514
11 [ 0.1000372] [ 0.39845678] 0.00252514
12 [ 0.1000372] [ 0.39845678] 0.00252514
13 [ 0.1000372] [ 0.39845678] 0.00252514
14 [ 0.1000372] [ 0.39845678] 0.00252514
15 [ 0.1000372] [ 0.39845678] 0.00252514

Much better, and we're closer to the optimized values, right? Now, what if we could improve our visual analysis further to help us see what is happening in these graphs? TensorBoard provides a web page for debugging your graph and inspecting the variables, nodes, edges, and their corresponding connections.

To do so, we need to annotate the preceding graph with the variables, such as the loss function, W, b, y_data, x_data, and so on. Then we need to generate all the summaries by invoking the tf.summary.merge_all() function.

Now, we need to make the following changes to the preceding code. It is good practice to group related nodes on the graph using the tf.name_scope() function, so we will use tf.name_scope() to organize things in the TensorBoard graph view, giving the scope a descriptive name:

with tf.name_scope("LinearRegression") as scope:
   W = tf.Variable(tf.zeros([1]))
   b = tf.Variable(tf.zeros([1]))
   y = W * x_data + b

Then, let's annotate the loss function in a similar way, but with a suitable name, such as LossFunction:

with tf.name_scope("LossFunction") as scope:
  loss = tf.reduce_mean(tf.square(y - y_data))

Let's annotate the loss, weights, and bias that are needed for TensorBoard:

loss_summary = tf.summary.scalar("loss", loss)
w_ = tf.summary.histogram("W", W)
b_ = tf.summary.histogram("b", b)

Once you have annotated the graph, it's time to configure the summary by merging them:

merged_op = tf.summary.merge_all()

Before running the training (after the initialization), write the summary using the tf.summary.FileWriter() API as follows:

writer_tensorboard = tf.summary.FileWriter('logs/', tf.get_default_graph())
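
The FileWriter above already records the graph itself. To also record the scalar and histogram summaries we annotated, we can evaluate merged_op inside the training loop and pass the result to the writer. The following is a minimal sketch (adapted from the earlier training loop; this part is not shown in the original listing):

for i in range(8):
    sess.run(train)
    summary_str = sess.run(merged_op)               # evaluate all merged summaries
    writer_tensorboard.add_summary(summary_str, i)  # attach them to step i
writer_tensorboard.close()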

Then start TensorBoard as follows:

$ tensorboard --logdir=<trace_dir_name>

In our case, it could be something like the following:

$ tensorboard --logdir=/home/root/LR/

Now let's move to http://localhost:6006 and click on the GRAPHS tab. You should see the following graph:

Figure 12: The main graph and auxiliary nodes on TensorBoard

Tip

Note that Ubuntu may ask you to install the python-tk package. You can do it by executing the following command on Ubuntu:

$ sudo apt-get install python-tk
# For Python 3.x, use the following
$ sudo apt-get install python3-tk

Linear regression revisited for a real dataset

In the previous section, we saw an example of linear regression. We saw how to use TensorFlow with a randomly generated dataset, that is, fake data. We have seen that regression is a type of supervised machine learning for predicting continuous (rather than discrete) output.

However, running a linear regression on fake data is just like buying a new car but never driving it. This awesome machinery begs to be used in the real world! Fortunately, many datasets are available online to test your new-found knowledge of regression.

One of them is the Boston housing dataset, which can be downloaded from the UCI Machine Learning Repository at https://archive.ics.uci.edu/ml/datasets/Housing. It is also available as a preprocessed dataset with scikit-learn.

So, let's get started by importing all the required libraries, including TensorFlow, NumPy, Matplotlib, and scikit-learn:

import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np
from numpy import genfromtxt
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split

Next, we need to prepare the training set consisting of features and labels from the Boston housing dataset. The read_boston_data() method reads the dataset from scikit-learn and returns the features and labels separately:

def read_boston_data():
    boston = load_boston()
    features = np.array(boston.data)
    labels = np.array(boston.target)
    return features, labels

Now that we have the features and labels, we need to normalize the features as well, using the normalizer() method. Here is the definition of the method:

def normalizer(dataset):
    mu = np.mean(dataset,axis=0)
    sigma = np.std(dataset,axis=0)
    return(dataset - mu)/sigma
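
As a quick sanity check (purely illustrative, not part of the original listing), applying normalizer() to a small matrix should produce columns with roughly zero mean and unit standard deviation:

sample = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
normalized_sample = normalizer(sample)
print(normalized_sample.mean(axis=0))  # approximately [0. 0.]
print(normalized_sample.std(axis=0))   # approximately [1. 1.]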

bias_vector() is used to append the bias term (that is, a column of all 1s) to the normalized features that we prepared in the preceding step. It corresponds to the b term in the equation of the straight line in the previous example:

def bias_vector(features,labels):
    n_training_samples = features.shape[0]
    n_dim = features.shape[1]
    f = np.reshape(np.c_[np.ones(n_training_samples),features],[n_training_samples,n_dim + 1])
    l = np.reshape(labels,[n_training_samples,1])
    return f, l

We will now invoke these methods and split the dataset into training and test sets, with 75% for training and the rest for testing:

features,labels = read_boston_data()
normalized_features = normalizer(features)
data, label = bias_vector(normalized_features,labels)
n_dim = data.shape[1]
# Train-test split
train_x, test_x, train_y, test_y = train_test_split(data,label,test_size = 0.25,random_state = 100)

Now let's use TensorFlow's data structures (such as placeholders, labels, and weights):

learning_rate = 0.01
training_epochs = 100000
log_loss = np.empty(shape=[1],dtype=float)
X = tf.placeholder(tf.float32,[None,n_dim]) # takes any number of rows but n_dim columns
Y = tf.placeholder(tf.float32,[None,1]) # takes any number of rows but only 1 continuous column
W = tf.Variable(tf.ones([n_dim,1])) # W weight vector

Well done! We have prepared the data structure required to construct the TensorFlow graph. Now it's time to construct the linear regression, which is pretty straightforward:

y_ = tf.matmul(X, W)
cost_op = tf.reduce_mean(tf.square(y_ - Y))
training_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost_op)

In the preceding code segment, the first line multiplies the features matrix by the weights matrix, which gives the predictions. The second line computes the loss, which is the mean squared error of the regression. Finally, the third line performs one step of GD optimization to minimize the squared error.

Tip

Which optimizer to use? The main objective of using an optimizer is to minimize the evaluated cost; therefore, we must define an optimizer. With the most common optimizer, SGD, the learning rate must scale with 1/T to get convergence, where T is the number of iterations.

Adam and RMSProp try to overcome this limitation by automatically adjusting the step size so that it is on the same scale as the gradients. Adam, for example, performs well in most cases.

Nevertheless, if you are training a neural network, where computing the gradients is mandatory, using the RMSPropOptimizer function, which implements the RMSProp algorithm, is a better idea, since it is often the faster way of learning in a mini-batch setting. Researchers also recommend the Momentum optimizer when training a deep CNN or DNN.

Technically, RMSPropOptimizer is an advanced form of gradient descent that divides the learning rate by an exponentially decaying average of squared gradients. The suggested value of the decay parameter is 0.9, while a good default value for the learning rate is 0.001.
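
To make this concrete, here is a minimal sketch of the update rule for a single parameter. This is an illustration of the textbook RMSProp algorithm, not TensorFlow's internal implementation, and the function name and arguments are ours:

# Illustrative sketch of the RMSProp update rule for one parameter
def rmsprop_step(theta, gradient, mean_square, learning_rate=0.001, decay=0.9, epsilon=1e-10):
    # Exponentially decaying average of the squared gradients
    mean_square = decay * mean_square + (1 - decay) * gradient ** 2
    # Scale the step by the root of that running average
    theta = theta - learning_rate * gradient / np.sqrt(mean_square + epsilon)
    return theta, mean_square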

For example, in TensorFlow, tf.train.RMSPropOptimizer() lets us use this optimizer with ease:

optimizer = tf.train.RMSPropOptimizer(0.001, 0.9).minimize(cost_op)
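
If you would rather try Adam, as mentioned above, the change is a one-liner. This is an illustrative alternative, not the optimizer used for the results in this section:

training_step = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cost_op)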

Now, before we start training the model, we need to initialize all the variables using the global_variables_initializer() method, as follows:

init_op = tf.global_variables_initializer()

Fantastic! Now that we have managed to prepare all the components, we're ready to start the actual training. We begin by creating a TensorFlow session, as follows:

sess = tf.Session()
sess.run(init_op)
for epoch in range(training_epochs):
    sess.run(training_step,feed_dict={X:train_x,Y:train_y})
    log_loss = np.append(log_loss,sess.run(cost_op,feed_dict={X: train_x,Y: train_y}))

Once the training is completed, we are able to make predictions on unseen data. However, it's even more exciting to see a visual representation of the completed training. So, let's plot the cost as a function of the number of iterations using Matplotlib:

plt.plot(range(len(log_loss)),log_loss)
plt.axis([0,training_epochs,0,np.max(log_loss)])
plt.show()

The following is the output of the preceding code:


Figure 13: Cost as a function of the number of iterations

Make some predictions on the test dataset and calculate the mean squared error:

pred_y = sess.run(y_, feed_dict={X: test_x})
mse = tf.reduce_mean(tf.square(pred_y - test_y))
print("MSE: %.4f" % sess.run(mse))

The following is the output of the preceding code:

>>>
MSE: 27.3749

Finally, let's plot the predicted values against the measured ones:

fig, ax = plt.subplots()
ax.scatter(test_y, pred_y)
ax.plot([test_y.min(), test_y.max()], [test_y.min(), test_y.max()], 'k--', lw=3)
ax.set_xlabel('Measured')
ax.set_ylabel('Predicted')
plt.show()

The following is the output of the preceding code:


Figure 14: Predicted versus actual values
