Linear regression

As we mentioned previously, regression focuses on using the relationship between two variables for prediction. To make predictions with linear regression, we must compute the best-fitting straight line.

If all the points (the paired values of the two variables) lie on a straight line, the relationship is said to be perfect. This rarely happens in practice: the points do not all fit neatly on a line, so the relationship is imperfect. In some cases, a linear relationship appears only between the log-transformed variables; this is called a log-log model. An example of such a relationship is a power law in physics, where one variable varies as a power of another.

Thus, an expression of the form y = a·x^k results in a linear relationship once we take logarithms of both sides: log y = log a + k·log x, which is linear in log x and log y.

For more information, refer to http://en.wikipedia.org/wiki/Power_law.
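As a minimal sketch of the log-log idea, the following example (the constants a = 2.0 and k = 1.5 are hypothetical, chosen only for illustration) generates noiseless power-law data and fits a straight line in log-log space with NumPy, recovering the exponent as the slope and the prefactor from the intercept:

```python
import numpy as np

# Hypothetical power law: y = a * x**k with a = 2.0 and k = 1.5.
a, k = 2.0, 1.5
x = np.linspace(1.0, 10.0, 50)
y = a * x ** k

# A straight-line fit in log-log space: the slope estimates k,
# and the intercept estimates log(a).
slope, intercept = np.polyfit(np.log(x), np.log(y), deg=1)

print(slope)              # ≈ 1.5, the exponent k
print(np.exp(intercept))  # ≈ 2.0, the prefactor a
```

Because the data here is noiseless, the fit recovers the parameters almost exactly; with real measurements, the slope and intercept are estimates.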

To construct the best-fit line, the method of least squares is used. In this method, the best-fit line is the line for which the sum of the squared distances from each point to the line is at its minimum. This is deemed the best linear approximation of the relationship between the variables we are trying to model, and the line is called the least squares regression line.

More formally, the least squares regression line is the line that has the minimum possible value for the sum of squares of the vertical distances from the data points to the line. These vertical distances are also known as residuals.

Thus, by constructing the least squares regression line, we're trying to minimize the following expression:

Σᵢ (yᵢ − (a + b·xᵢ))²

Here, a is the intercept and b is the slope of the line, so yᵢ − (a + b·xᵢ) is the residual of the i-th point.
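This minimization has a well-known closed-form solution for the slope and intercept. The sketch below (the data points are hypothetical, invented for illustration) computes both directly with NumPy and shows the residuals whose squares are being minimized:

```python
import numpy as np

# Hypothetical data points scattered around a line.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Closed-form least squares estimates:
#   slope b = sum((x - x̄)(y - ȳ)) / sum((x - x̄)²)
#   intercept a = ȳ - b·x̄
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()

# The residuals are the vertical distances from the points to the line;
# the fitted line minimizes the sum of their squares.
residuals = y - (a + b * x)

print(b, a)                      # fitted slope and intercept
print(np.sum(residuals ** 2))    # the minimized sum of squared residuals
```

Any other slope or intercept would produce a larger sum of squared residuals, which is exactly what makes this line the least squares regression line.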
