How to do it...

The process of building a clustering model in Spark does not deviate significantly from what we have already seen in either the classification or regression examples:

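import pyspark.ml.clustering as clust
import pyspark.ml.feature as feat
from pyspark.ml import Pipeline

# Pack every column of the forest DataFrame (created earlier) except the
# last one into a single vector column named 'features'
vectorAssembler = feat.VectorAssembler(
    inputCols=forest.columns[:-1],
    outputCol='features')

# K-means with 7 clusters; a fixed seed makes runs reproducible
kmeans_obj = clust.KMeans(k=7, seed=666)

# Chain both stages so a single fit() assembles features and trains the model
pip = Pipeline(stages=[vectorAssembler, kmeans_obj])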
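To make concrete what the two pipeline stages compute, here is a minimal pure-Python sketch. It is an illustration, not Spark's implementation: the hypothetical `assemble` function imitates what `VectorAssembler` does (pack the selected columns of each row into one feature vector), and the hypothetical `kmeans` function runs plain Lloyd's algorithm with randomly sampled starting centers. Spark's `KMeans` runs distributed and by default uses the k-means|| initialization instead, so only the overall idea carries over. The toy rows are made up, with the label in the last column to mirror `forest.columns[:-1]`.

```python
import random


def assemble(rows, input_cols):
    """Mimic VectorAssembler: one list of floats per row from the chosen columns."""
    return [[float(row[c]) for c in input_cols] for row in rows]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, seed=666, max_iter=20):
    """Plain Lloyd's algorithm with random initial centers (not Spark's k-means|| init)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    assignment = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest center
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # Update step: move each center to the mean of its assigned points
        for j in range(k):
            members = [p for p, a in zip(points, new_assignment) if a == j]
            if members:  # keep the previous center if a cluster goes empty
                centers[j] = [sum(dim) / len(members) for dim in zip(*members)]
        if new_assignment == assignment:
            break  # assignments stopped changing: converged
        assignment = new_assignment
    return centers, assignment


# Hypothetical toy data: two well-separated groups, label in the last column
rows = [
    {"x": 0.0, "y": 0.1, "label": 1},
    {"x": 0.2, "y": 0.0, "label": 1},
    {"x": 5.0, "y": 5.1, "label": 2},
    {"x": 5.2, "y": 4.9, "label": 2},
]
columns = ["x", "y", "label"]

features = assemble(rows, columns[:-1])  # drop the label, like forest.columns[:-1]
centers, labels = kmeans(features, k=2)
print(labels)  # rows 0 and 1 share one cluster id, rows 2 and 3 the other
```

Note that the cluster ids themselves are arbitrary; only which rows end up together is meaningful, which is also true of the `prediction` column Spark's model produces.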