Regression metrics

true_pred_reg is an RDD of tuples in which the first element is the prediction from our linear regression model and the second is the actual value (the label: the number of hours worked per week). Here's how we create it:

true_pred_reg = (
    final_data_hours_test
    .map(lambda row: (
         float(workhours_model_lm.predict(row.features))
         , row.label))
)
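Conceptually, the map step above applies the model's linear formula to each test row and pairs the result with that row's label. The plain-Python sketch below mimics this without Spark, assuming workhours_model_lm is an MLlib linear regression model, whose predict method returns the dot product of the learned weights and the features plus the intercept. Row, weights, intercept, and test_rows are hypothetical stand-ins for the real test rows and fitted model.

```python
from collections import namedtuple

# Hypothetical stand-in for a row of final_data_hours_test.
Row = namedtuple("Row", ["features", "label"])

# Hypothetical fitted coefficients; a real model learns these during training.
weights = [0.5, -1.0, 2.0]
intercept = 10.0


def predict(features):
    """Linear prediction: dot product of weights and features, plus intercept."""
    return intercept + sum(w * x for w, x in zip(weights, features))


test_rows = [
    Row(features=[1.0, 0.0, 3.0], label=40.0),
    Row(features=[2.0, 1.0, 0.0], label=35.0),
]

# Same shape as true_pred_reg: (prediction, label) tuples.
true_pred_reg = [(float(predict(row.features)), row.label) for row in test_rows]
print(true_pred_reg)  # [(16.5, 40.0), (10.0, 35.0)]
```

Keeping the prediction first and the label second matters, because the metrics computed next expect pairs in (prediction, observation) order.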

The metrics_lm object contains a variety of metrics: explainedVariance, meanAbsoluteError, meanSquaredError, r2, and rootMeanSquaredError. Here, we will print out only three of them:
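To make the listed metrics concrete, the sketch below computes each of them by hand in plain Python for a small list of (prediction, label) pairs shaped like true_pred_reg. It is not Spark code. In Spark, metrics_lm would presumably be created with pyspark.mllib.evaluation.RegressionMetrics(true_pred_reg). The formulas are our reading of MLlib's conventions, in particular explained variance as the mean squared deviation of the predictions from the mean label, which differs from, for example, scikit-learn's explained_variance_score. The sample pairs are made up for illustration.

```python
import math


def regression_metrics(pred_label_pairs):
    """Compute regression metrics from (prediction, label) pairs.

    Assumes MLlib-style definitions: explained variance is the mean squared
    deviation of the predictions from the mean label, and
    R^2 = 1 - SS_err / SS_tot.
    """
    preds = [p for p, _ in pred_label_pairs]
    labels = [y for _, y in pred_label_pairs]
    n = len(labels)
    y_mean = sum(labels) / n

    errors = [y - p for p, y in zip(preds, labels)]
    ss_err = sum(e * e for e in errors)              # residual sum of squares
    ss_tot = sum((y - y_mean) ** 2 for y in labels)  # total sum of squares
    mse = ss_err / n

    return {
        "explainedVariance": sum((p - y_mean) ** 2 for p in preds) / n,
        "meanAbsoluteError": sum(abs(e) for e in errors) / n,
        "meanSquaredError": mse,
        "rootMeanSquaredError": math.sqrt(mse),
        "r2": 1.0 - ss_err / ss_tot,
    }


# Made-up (prediction, label) pairs.
pairs = [(38.0, 40.0), (41.0, 40.0), (33.0, 30.0), (46.0, 50.0)]
metrics = regression_metrics(pairs)
for name, value in metrics.items():
    print(name, value)
```

For these pairs the mean absolute error is 2.5 hours, the mean squared error is 7.5, and R-squared is 0.85.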

print('R^2:', metrics_lm.r2)
print('Explained Variance:', metrics_lm.explainedVariance)
print('Mean Absolute Error:', metrics_lm.meanAbsoluteError)

Let's see what we got for the linear regression model:

Not unexpectedly, given what we have already seen, the model performs poorly. Do not be too surprised by the negative R-squared. R-squared compares the model's squared errors with those of a baseline that always predicts the mean label, so it turns negative, a seemingly nonsensical value, whenever the model's predictions are worse than that baseline.
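A tiny worked example shows how R-squared drops below zero. Because R^2 = 1 - SS_err / SS_tot, a constant prediction equal to the mean label scores exactly 0, and any model with larger squared errors than that scores below 0. The numbers below are made up for illustration and are not results from workhours_model_lm.

```python
def r_squared(pairs):
    """R^2 = 1 - SS_err / SS_tot for (prediction, label) pairs."""
    labels = [y for _, y in pairs]
    y_mean = sum(labels) / len(labels)
    ss_err = sum((y - p) ** 2 for p, y in pairs)
    ss_tot = sum((y - y_mean) ** 2 for y in labels)
    return 1.0 - ss_err / ss_tot


labels = [30.0, 40.0, 50.0]

# Baseline: always predict the mean label (40.0), so SS_err == SS_tot.
baseline = [(40.0, y) for y in labels]

# A poor model whose predictions move in the wrong direction.
poor = [(50.0, 30.0), (40.0, 40.0), (30.0, 50.0)]

print(r_squared(baseline))  # 0.0
print(r_squared(poor))      # -3.0
```

The poor model's squared errors (800) are four times the baseline's (200), giving 1 - 4 = -3.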
