Let's begin with code that selects the 10 features with the most predictive power for determining the class (CoverType) of an observation in our forest DataFrame:
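import pyspark.ml.feature as feat
from pyspark.ml import Pipeline

# Assemble every column except the last one (the CoverType label) into a single vector
vectorAssembler = feat.VectorAssembler(
    inputCols=forest.columns[0:-1]
    , outputCol='features'
)
# Keep the 10 features most strongly associated with CoverType according to a
# chi-squared test; the selector reads the 'features' column (its default featuresCol)
selector = feat.ChiSqSelector(
    labelCol='CoverType'
    , numTopFeatures=10
    , outputCol='selected')
pipeline_sel = Pipeline(stages=[vectorAssembler, selector])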
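To see what ChiSqSelector is doing under the hood, here is a minimal pure-Python sketch of the idea. It computes Pearson's chi-squared statistic for independence between each (categorical) feature and the label, and then keeps the k highest-scoring features. This is an illustration under simplifying assumptions, not Spark's implementation. Spark runs a full chi-squared test and, with numTopFeatures, ranks features by the test's p-value. This sketch ranks by the raw statistic instead, which gives the same order only when all features have the same degrees of freedom. The function names `chi_squared` and `top_k_features` are hypothetical helpers introduced here.

```python
from collections import Counter


def chi_squared(feature, label):
    """Pearson chi-squared statistic for independence of two categorical columns."""
    n = len(feature)
    feature_counts = Counter(feature)
    label_counts = Counter(label)
    observed = Counter(zip(feature, label))  # joint (feature value, label) counts
    stat = 0.0
    for f, f_count in feature_counts.items():
        for l, l_count in label_counts.items():
            # Expected count in this cell if feature and label were independent
            expected = f_count * l_count / n
            stat += (observed[(f, l)] - expected) ** 2 / expected
    return stat


def top_k_features(rows, label, k):
    """Indices of the k features with the largest chi-squared statistic.

    Simplification: ranks by the statistic rather than by p-value (see lead-in);
    ties are broken by the lower column index.
    """
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        scores.append((chi_squared(column, label), j))
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:k]]


# Feature 0 mirrors the label exactly; features 1 and 2 are independent of it.
rows = [(0, 1, 0), (0, 0, 1), (1, 1, 0), (1, 0, 1)]
label = [0, 0, 1, 1]
print(top_k_features(rows, label, 1))  # [0]
```

Feature 0 scores 4.0 here while the other two score 0.0, so it is the one kept. Like the Spark selector, this approach treats every feature value as a category, which is why the chi-squared test suits discrete columns better than continuous measurements.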