First, as always, we collate all the features we want to use in our model with the .VectorAssembler(...) transformer. Note that we only use the columns from the second one onward, as the first column is our target: the Elevation feature.
Next, we specify the .RandomForestRegressor(...) object. It accepts an almost identical list of parameters to .RandomForestClassifier(...).
The last step is to build the Pipeline object, pip, which has only two stages: vectorAssembler and rf_obj.
Next, let's see how our model is performing compared to the linear regression model we estimated in the Introducing Estimators recipe:
results = (
    pip
    .fit(forest)
    .transform(forest)
    .select('Elevation', 'prediction')
)
evaluator = ev.RegressionEvaluator(labelCol='Elevation')
evaluator.evaluate(results, {evaluator.metricName: 'r2'})
The .RegressionEvaluator(...) object calculates the performance metrics of regression models. By default, it returns rmse, the root mean-squared error, but it can also return:
- mse: This is the mean-squared error
- r2: This is the R2 metric
- mae: This is the mean-absolute error
From the preceding code, we got:
This is better than the linear regression model we built earlier, suggesting that the relationship between the features and the target might not be as linear as we initially thought.