The following code uses a Pipeline to streamline estimating the linear regression model via GLM:
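from pyspark.ml import Pipeline
import pyspark.ml.feature as feat
import pyspark.ml.regression as rg

# Collate every column except the first into a single 'features' vector
vectorAssembler = feat.VectorAssembler(
    inputCols=forest.columns[1:]
    , outputCol='features')

# GLM with the identity link, which amounts to a linear regression
lr_obj = rg.GeneralizedLinearRegression(
    labelCol='Elevation'
    , maxIter=10
    , regParam=0.01
    , link='identity'
    , linkPredictionCol='p'
)

# Chain both stages so they are fitted and applied in order
pip = Pipeline(stages=[vectorAssembler, lr_obj])

(
    pip
    .fit(forest)
    .transform(forest)
    .select('Elevation', 'prediction')
    .show(5)
)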
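To make it clearer what the Pipeline does when .fit(...) and .transform(...) are chained, the following is a minimal pure-Python sketch of the idea. It does not use Spark: ToyPipeline, ToyPipelineModel, AssembleFeatures, MeanEstimator, and MeanModel are hypothetical stand-ins invented for illustration, the rows are made-up toy data, and the "model" simply predicts the mean label instead of fitting a GLM.

```python
# Hypothetical, simplified stand-ins; these are NOT the pyspark.ml classes.

class AssembleFeatures:
    """Transformer-like stage: packs the chosen columns into a 'features' list."""
    def __init__(self, input_cols):
        self.input_cols = input_cols

    def transform(self, rows):
        return [dict(r, features=[r[c] for c in self.input_cols]) for r in rows]


class MeanModel:
    """Fitted model: adds a constant 'prediction' to every row."""
    def __init__(self, mean):
        self.mean = mean

    def transform(self, rows):
        return [dict(r, prediction=self.mean) for r in rows]


class MeanEstimator:
    """Estimator-like stage: fit() learns from the data and returns a model."""
    def __init__(self, label_col):
        self.label_col = label_col

    def fit(self, rows):
        labels = [r[self.label_col] for r in rows]
        return MeanModel(sum(labels) / len(labels))


class ToyPipelineModel:
    """Result of fitting: applies every fitted stage in order."""
    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class ToyPipeline:
    """Fits stages in order, feeding each stage the output of the previous one."""
    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            # Estimators are fitted first; plain transformers are used as-is.
            if hasattr(stage, 'fit'):
                stage = stage.fit(rows)
            fitted.append(stage)
            # Simplification: always pass the transformed rows onward.
            rows = stage.transform(rows)
        return ToyPipelineModel(fitted)


# Made-up toy data, for illustration only
rows = [
    {'Elevation': 10.0, 'Slope': 1.0},
    {'Elevation': 20.0, 'Slope': 3.0},
]

pipeline = ToyPipeline(stages=[AssembleFeatures(['Slope']), MeanEstimator('Elevation')])
result = pipeline.fit(rows).transform(rows)

for r in result:
    print(r['Elevation'], r['prediction'])
```

The point of the sketch is the ordering: each stage sees the output of the one before it, so the estimator is trained on rows that already carry the assembled features column, and the fitted pipeline replays the same sequence when transforming data.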