How to do it...

Let's begin with code that selects the 10 features with the most predictive power for determining the class (CoverType) of an observation in our forest DataFrame:

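# Assumes an active SparkSession and a `forest` DataFrame whose last column,
# CoverType, is the label; all other columns are numeric features.
import pyspark.ml.feature as feat
from pyspark.ml import Pipeline

# Combine every column except the last (the CoverType label) into one vector
vectorAssembler = feat.VectorAssembler(
    inputCols=forest.columns[0:-1]
    , outputCol='features'
)

# Keep the 10 features with the strongest chi-squared dependence on CoverType;
# ChiSqSelector reads the 'features' column by default (featuresCol='features')
selector = feat.ChiSqSelector(
    labelCol='CoverType'
    , numTopFeatures=10
    , outputCol='selected')

pipeline_sel = Pipeline(stages=[vectorAssembler, selector])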
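To make the ranking criterion concrete, here is a minimal, framework-free sketch of chi-squared feature selection in plain Python. It is not Spark's implementation: the helper names `chi_square_statistic` and `select_top_features` and the toy data are hypothetical. Each distinct value of a feature is treated as a category, as in a chi-squared test of independence, and features are ranked by the raw statistic for simplicity. Spark's selector may instead rank by the test's p-value, which can order features differently when they have different numbers of distinct values.

```python
from collections import Counter


def chi_square_statistic(feature, label):
    """Pearson's chi-squared statistic for independence between two
    categorical columns (each distinct value is treated as a category)."""
    n = len(feature)
    f_counts = Counter(feature)
    l_counts = Counter(label)
    joint = Counter(zip(feature, label))  # observed contingency table
    stat = 0.0
    for f_val, f_n in f_counts.items():
        for l_val, l_n in l_counts.items():
            expected = f_n * l_n / n  # count expected under independence
            observed = joint[(f_val, l_val)]
            stat += (observed - expected) ** 2 / expected
    return stat


def select_top_features(rows, labels, num_top_features):
    """Return the (sorted) indices of the num_top_features columns with the
    largest chi-squared statistic against the label (hypothetical helper)."""
    scores = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        scores.append((chi_square_statistic(column, labels), j))
    # Highest score first; ties broken by the lower column index
    scores.sort(key=lambda s: (-s[0], s[1]))
    return sorted(j for _, j in scores[:num_top_features])


# Toy data: column 0 mirrors the label, column 1 is unrelated,
# and column 2 is constant
rows = [
    [0, 1, 5],
    [0, 0, 5],
    [1, 1, 5],
    [1, 0, 5],
]
labels = [0, 0, 1, 1]

print(select_top_features(rows, labels, 1))  # [0]
```

Only column 0 carries information about the label, so it gets the largest statistic (4.0), while the unrelated and constant columns score 0.0. In the pipeline above, `numTopFeatures=10` plays the role of `num_top_features`.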