Understanding linear regression

The simplest regression model is linear regression. The idea behind linear regression is to describe a target variable (such as Boston house prices—recall the various datasets we studied in Chapter 1, A Taste of Machine Learning) with a linear combination of features.

To keep things simple, let's just focus on two features. Let's say we want to predict tomorrow's stock prices using two features: today's stock price and yesterday's stock price. We will denote today's stock price as the first feature, f1, and yesterday's stock price as f2. Then, the goal of linear regression would be to learn two weight coefficients, w1 and w2, so that we can predict tomorrow's stock price as follows:

ŷ = w1 f1 + w2 f2

Here, ŷ is the prediction of tomorrow's ground truth stock price, y.
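To make this concrete, here is a minimal Python sketch of that prediction; the prices and weights below are invented purely for illustration:

```python
# Hypothetical two-feature prediction (all numbers are made up):
f1 = 102.3          # today's stock price
f2 = 101.7          # yesterday's stock price
w1, w2 = 0.7, 0.3   # learned weight coefficients

y_hat = w1 * f1 + w2 * f2   # predicted price for tomorrow
print(y_hat)
```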

The special case where we have only one feature variable is called simple linear regression.

We could easily extend this to include more stock price samples from the past. If we had M feature values instead of two, we would extend the preceding equation to a sum of M products, so that every feature fj is accompanied by a weight coefficient wj. We can write the resulting equation as follows:

ŷ = w1 f1 + w2 f2 + ... + wM fM
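In code, this weighted sum is just a dot product between a weight vector and a feature vector. A minimal NumPy sketch (with invented values) could look like this:

```python
import numpy as np

# Invented example with M = 5 past stock prices as features.
f = np.array([102.3, 101.7, 100.9, 101.2, 100.5])   # feature values f1..f5
w = np.array([0.5, 0.2, 0.1, 0.1, 0.1])             # weight coefficients w1..w5

y_hat = np.dot(w, f)   # sum of w_j * f_j over all M features
print(y_hat)
```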

Let's think about this equation geometrically for a second. In the case of a single feature, f1, the equation for ŷ would become ŷ = w1 f1, which is essentially a straight line. In the case of two features, ŷ = w1 f1 + w2 f2 would describe a plane in the feature space, as illustrated in the following screenshot:

In N dimensions, this would become what is known as a hyperplane. If a space is N-dimensional, then its hyperplanes have N-1 dimensions.

As is evident in the preceding screenshot, all of these lines and planes intersect at the origin. But what if the true y values we are trying to approximate don't go through the origin?

To be able to offset ŷ from the origin, it is customary to add another weight coefficient that does not depend on any feature values and hence acts like a bias term. In a 1D case, this term acts as the ŷ-intercept. In practice, this is often achieved by setting f0=1 so that w0 can act as the bias term:

ŷ = w0 f0 + w1 f1 + ... + wM fM

Here, we keep f0 = 1 fixed, so that w0 acts as the constant offset.
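A common way to implement this trick (a sketch under the f0 = 1 convention above, again with invented numbers) is to prepend a constant 1 to the feature vector so that the first weight becomes the bias:

```python
import numpy as np

f = np.array([102.3, 101.7])      # original features f1, f2 (invented values)
f = np.hstack(([1.0], f))         # prepend f0 = 1
w = np.array([4.2, 0.7, 0.3])     # w0 acts as the bias term (invented values)

y_hat = np.dot(w, f)              # w0 * 1 + w1 * f1 + w2 * f2
print(y_hat)
```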

Finally, the goal of linear regression is to learn a set of weight coefficients that lead to a prediction that approximates the ground truth values as accurately as possible. Rather than explicitly capturing a model's accuracy like we did with classifiers, scoring functions in regression often take the form of so-called cost functions (or loss functions).

As discussed earlier in this chapter, there are several scoring functions we can use to measure the performance of a regression model. The most commonly used cost function is probably the mean squared error, which calculates an error (yi - ŷi)² for every data point i by comparing the prediction ŷi to the target output value yi and then taking the average:

MSE = (1/N) Σi (yi - ŷi)²

Here, N is the number of data points.
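For example, the mean squared error can be computed in one line of NumPy, or with scikit-learn's mean_squared_error (the arrays below are invented); the smaller the value, the better the fit:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([1.0, 2.0, 3.0])   # ground truth targets (invented)
y_pred = np.array([1.1, 1.9, 3.2])   # model predictions (invented)

mse = np.mean((y_true - y_pred) ** 2)            # average squared error
print(mse, mean_squared_error(y_true, y_pred))   # both give the same value
```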

Regression hence becomes an optimization problem—and the task is to find the set of weights that minimizes the cost function.

This is usually done with an iterative algorithm that is applied to one data point after the other, hence reducing the cost function step by step. We will talk more deeply about such algorithms in Chapter 9, Using Deep Learning to Classify Handwritten Digits.
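To give a flavor of what such an iterative procedure might look like, here is a bare-bones sketch of a stochastic gradient-descent-style update on made-up data (not the exact algorithm discussed in Chapter 9):

```python
import numpy as np

rng = np.random.RandomState(42)
X = rng.rand(100, 3)                      # invented feature matrix (N=100, M=3)
true_w = np.array([0.5, -0.2, 0.1])       # "hidden" weights used to generate y
y = X @ true_w + 0.01 * rng.randn(100)    # targets with a little noise

w = np.zeros(3)    # start with all-zero weights
lr = 0.1           # learning rate (step size)
for epoch in range(50):
    for x_i, y_i in zip(X, y):
        err = np.dot(w, x_i) - y_i   # prediction error for this data point
        w -= lr * err * x_i          # nudge the weights to reduce that error

print(w)   # should end up close to true_w
```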

But enough of all of this theory—let's do some coding!
