Just like with classification or regression models, building clustering models is pretty straightforward in Spark. Here's the code that trains a k-means model with two clusters to find patterns in the census data:
import pyspark.mllib.clustering as clu

model = clu.KMeans.train(
    final_data.map(lambda row: row[1]),  # extract the feature vector from each record
    2,                                   # k: the number of clusters to find
    initializationMode='random',         # random initial centers instead of k-means||
    seed=666
)