Regression performance

To measure the performance of a regression model, we look at the distance between the predicted outputs and the actual outputs: the smaller this distance, the better the model performs.
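As a minimal sketch of this idea, written in plain Python outside Rattle, two common ways to summarize that distance are the mean absolute error (MAE) and the root mean squared error (RMSE). The function names and the house prices below are hypothetical and serve only as an illustration.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average absolute distance between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared distance; penalizes large errors more."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical house prices, in thousands (illustrative values only).
actual = [200.0, 150.0, 320.0, 275.0]
predicted = [210.0, 140.0, 300.0, 280.0]

mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
print(f"MAE:  {mae:.2f}")   # MAE:  11.25
print(f"RMSE: {rmse:.2f}")  # RMSE: 12.50
```

Both measures are expressed in the same units as the target variable, so here an MAE of 11.25 means the predictions miss the actual price by 11.25 thousand on average.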

Rattle offers us a good way to compare predicted values with actual values: the Predicted versus Observed plot. To test this plot, you need to create a regression model. You can download a sample dataset from the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml; Irvine, CA: University of California, School of Information and Computer Science), or from Kaggle (http://www.kaggle.com/). On some websites, such as the UCI Machine Learning Repository, the datasets are classified by the task you want to perform with them.

Predicted versus Observed Plot

Imagine we have to create a model to predict the price of a house. Once the model is built, click on the Evaluate tab:

Predicted versus Observed Plot

Rattle's Evaluate tab offers us two good options for a regression model as shown in the preceding screenshot:

  • Predicted versus Observed Plot: We will use this option to compare predicted values versus actual values.
  • Score: As we've seen before, this option creates predictions for the selected dataset.

After creating the model, go to the Evaluate tab, select your Model, the Validation dataset, the Pr v Ob option, press Execute, and Rattle will build a Predicted vs. Observed plot for you, as shown here:

Predicted versus Observed Plot

This plot shows a set of points, one for each observation. On the y axis, we can see the predicted value, and on the x axis, we can see the actual value. We can also see a dotted line; this line represents a perfect prediction, where the predicted values are the same as the actual values. The other line is a linear fit to the points. The closer the points and the fitted line lie to the dotted line, the better the predictions.
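To make the two lines in the plot concrete, here is a minimal Python sketch, independent of Rattle, that fits a least-squares line to predicted versus observed values and compares it with the perfect-prediction line (slope 1, intercept 0). The linear_fit helper and the data are hypothetical; this is not how Rattle draws the plot internally.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observed (x axis) and predicted (y axis) house prices.
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [130.0, 210.0, 290.0, 370.0]

slope, intercept = linear_fit(observed, predicted)
print(f"linear fit:         predicted = {slope:.2f} * observed + {intercept:.2f}")
print("perfect prediction: predicted = 1.00 * observed + 0.00")
```

With these values, the fitted slope is 0.8 and the intercept is 50, so the fitted line crosses the perfect-prediction line: this hypothetical model overestimates cheap houses and underestimates expensive ones.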

Finally, Pseudo R-square is an approximation of R-square, which measures the proportion of variance explained by the model. R-square is typically a number from 0 to 1; an R-square close to 1 means that the model has strong predictive power. When the model doesn't provide a good prediction, R-square is close to 0. In the same way, a Pseudo R-square close to 1 is good; a measure close to 0 means low performance.
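The following Python sketch, again outside Rattle, computes both measures on hypothetical data. It assumes that the Pseudo R-square is the square of the correlation between predicted and observed values; that definition is an assumption made for this illustration, so check Rattle's log to confirm how the reported figure is obtained. The function names are also hypothetical.

```python
import math


def pseudo_r_square(actual, predicted):
    """Squared Pearson correlation between actual and predicted values.

    Assumption: this is the definition of Pseudo R-square used here.
    """
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    saa = sum((a - mean_a) ** 2 for a in actual)
    spp = sum((p - mean_p) ** 2 for p in predicted)
    sap = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    correlation = sap / math.sqrt(saa * spp)
    return correlation ** 2


def r_square(actual, predicted):
    """Classic R-square: 1 minus residual sum of squares over total sum of squares."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical house prices, in thousands (illustrative values only).
actual = [200.0, 150.0, 320.0, 275.0]
predicted = [210.0, 140.0, 300.0, 280.0]

pseudo = pseudo_r_square(actual, predicted)
classic = r_square(actual, predicted)
print(f"Pseudo R-square: {pseudo:.4f}")
print(f"R-square:        {classic:.4f}")
```

For these values, both measures come out close to 1, which matches the small prediction errors in the data.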
