How it works...

First, as always, we collect all the features we want to use in our model into a single vector column with the .VectorAssembler(...) transformer. Note that we only use the columns starting from the second one, as the first one is our target: the elevation feature.
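Conceptually, the assembler packs the selected feature columns of each row into one vector column while leaving the target out of the assembled input. Here is a plain-Python sketch of that idea (the Slope and Aspect column names are illustrative stand-ins, not the dataset's actual schema):

```python
# Illustrative rows; 'Elevation' is the target, the rest are features.
rows = [
    {'Elevation': 2596, 'Slope': 3, 'Aspect': 51},
    {'Elevation': 2590, 'Slope': 2, 'Aspect': 56},
]

# Everything except the first (target) column goes into the assembler.
feature_cols = ['Slope', 'Aspect']

# Each row gains a 'features' vector, analogous to VectorAssembler's output column.
assembled = [
    {**row, 'features': [row[c] for c in feature_cols]}
    for row in rows
]
```

In PySpark this corresponds to passing `inputCols` (the feature columns) and an `outputCol` name to the assembler, which then adds the vector column to the DataFrame.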

Next, we specify the .RandomForestRegressor(...) object. The object uses an almost identical list of parameters to .RandomForestClassifier(...).

See the previous recipe for a list of other notable parameters.

The last step is to build the Pipeline object; the pipeline, pip, has only two stages: the vectorAssembler and the rf_obj.
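The key idea behind the two-stage pipeline is that each stage's output feeds the next stage. The following plain-Python sketch mimics that flow with simplified stand-in classes (the real PySpark stages are the assembler and the random forest model; the "prediction" here is just the sum of the features, for illustration only):

```python
class AssembleStage:
    """Stand-in for the vector assembler: appends a feature vector to each row."""
    def transform(self, rows):
        return [row + [row[1:]] for row in rows]

class ModelStage:
    """Stand-in for the fitted model: emits (target, prediction) pairs.
    A real model would predict; here we just sum the feature vector."""
    def transform(self, rows):
        return [(row[0], sum(row[-1])) for row in rows]

# Two stages, applied in order, like pip.fit(forest).transform(forest)
stages = [AssembleStage(), ModelStage()]
data = [[2596, 3, 51], [2590, 2, 56]]  # first column is the target
for stage in stages:
    data = stage.transform(data)
# data now holds (target, prediction) pairs, analogous to
# selecting 'Elevation' and 'prediction' from the transformed DataFrame
```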

Next, let's see how our model is performing compared to the linear regression model we estimated in the Introducing Estimators recipe:

results = (
    pip
    .fit(forest)
    .transform(forest)
    .select('Elevation', 'prediction')
)

evaluator = ev.RegressionEvaluator(labelCol='Elevation')
evaluator.evaluate(results, {evaluator.metricName: 'r2'})

.RegressionEvaluator(...) calculates the performance metrics of regression models. By default, it returns rmse, the root mean-squared error, but it can also return:

  • mse: This is the mean-squared error
  • r2: This is the R2 metric
  • mae: This is the mean-absolute error
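
Each of these metrics can be computed by hand from the labels and predictions. The following sketch works through all four on a small set of made-up numbers (the values are illustrative only, not the recipe's actual output):

```python
import math

# Hypothetical labels and predictions, for illustration only
y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [2.5, 3.5, 6.5, 7.5]

n = len(y_true)
errors = [t - p for t, p in zip(y_true, y_pred)]

mse = sum(e ** 2 for e in errors) / n   # mean-squared error
rmse = math.sqrt(mse)                   # root mean-squared error (the default)
mae = sum(abs(e) for e in errors) / n   # mean-absolute error

# R2: one minus the ratio of residual to total sum of squares
mean_y = sum(y_true) / n
ss_tot = sum((t - mean_y) ** 2 for t in y_true)
ss_res = sum(e ** 2 for e in errors)
r2 = 1 - ss_res / ss_tot
```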

From the preceding code, we got:

This is better than the linear regression model we built earlier, meaning that the relationship between our features and the target might not be as linear as we initially thought.

Check out this website for more information about the different types of regression metrics: http://bit.ly/2sgpONr.