How it works...

The performance of a regression model is measured by the distance between the predicted values and the actual values. Three measurements are commonly used to quantify it: root mean square error (RMSE), relative square error (RSE), and R-Square. In this recipe, we first load the Quartet data from the car package. We then use the lm function to fit a linear model, and add the regression line to a scatter plot of the x variable against the y3 variable. Next, we compute the predicted values using the predict function, and use them to compute the RMSE, RSE, and R-Square of the fitted model.
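For reference, the three measurements can be written out as follows. This is a sketch using the common textbook definitions; the symbols are assumptions introduced here, with y_i as the actual values, \hat{y}_i as the predicted values, \bar{y} as the mean of the actual values, and n as the number of observations.

```latex
\begin{align*}
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2} \\
\mathrm{RSE}  &= \frac{\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}{\sum_{i=1}^{n}\left(\bar{y} - y_i\right)^2} \\
R^2           &= 1 - \mathrm{RSE}
\end{align*}
```

Under these definitions, a smaller RMSE or RSE and a larger R-Square all indicate predictions that lie closer to the actual values.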

As this dataset has an outlier at x=13, we would like to quantify how the outlier affects the performance measurements. To achieve this, we first train a robust regression model using the rlm function from the MASS package. As in the previous step, we then compute the root mean square error, relative square error, and R-Square. The output shows that the root mean square error and the relative square error of the lm model are smaller than those of the model built by rlm, and the R-Square score suggests that the model built with lm has greater predictive power. In an actual scenario, however, we should remove the outlier at x=13. This comparison shows that an outlier can bias the performance measures and may lead us to choose the wrong model.
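The comparison described above can be sketched in a few lines of R. This is a minimal illustration, not the recipe's exact code: it assumes that base R's anscombe data (columns x3 and y3) holds the same values as the x and y3 columns of Quartet, so that the car package is not needed, and regression_metrics is a hypothetical helper name introduced here.

```r
library(MASS)  # provides rlm

# Assumption: anscombe$x3 / anscombe$y3 stand in for Quartet$x / Quartet$y3.
x <- anscombe$x3
y <- anscombe$y3

# Hypothetical helper: returns RMSE, RSE, and R-Square for a set of predictions.
regression_metrics <- function(actual, predicted) {
  sse <- sum((predicted - actual)^2)      # sum of squared errors
  sst <- sum((mean(actual) - actual)^2)   # total sum of squares around the mean
  c(rmse     = sqrt(mean((predicted - actual)^2)),
    rse      = sse / sst,
    r_square = 1 - sse / sst)
}

# Ordinary least squares fit and robust fit on the same data.
lm_fit  <- lm(y ~ x)
rlm_fit <- rlm(y ~ x)

# predict() without newdata returns the fitted values for the training data.
lm_metrics  <- regression_metrics(y, predict(lm_fit))
rlm_metrics <- regression_metrics(y, predict(rlm_fit))

# One row per model, one column per measurement.
print(rbind(lm = lm_metrics, rlm = rlm_metrics))
```

Because least squares minimizes the squared error on the data it is fitted to, outlier included, lm will look at least as good as rlm on these three measures. Recomputing the measures with the point at x=13 excluded is one way to check how much of that advantage comes from the outlier.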
