How to do it...

We will first read the data into a DataFrame, which is easier to work with; later on, we will convert it into an RDD of labeled points. To read the data, execute the following:

census_path = '../data/census_income.csv'

census = spark.read.csv(
    census_path
    , header=True
    , inferSchema=True
)
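
The two options passed to the reader are worth a closer look: header=True treats the first line of the file as column names (otherwise Spark names the columns _c0, _c1, and so on), and inferSchema=True makes Spark take an extra pass over the data to guess each column's type (otherwise every column is read as a string). The following is a rough pure-Python sketch of that idea, not Spark's actual implementation; the sample rows, the column names, and the helpers infer_type and read_with_schema are hypothetical and only serve as an illustration.

```python
import csv
import io

# Hypothetical sample rows; the real census file's columns are not shown here.
SAMPLE = (
    "age,hours_per_week,occupation\n"
    "39,40.0,Clerical\n"
    "50,13.5,Managerial\n"
)


def infer_type(values):
    """Return the narrowest of int, float, str that fits every value."""
    for caster in (int, float):
        try:
            for value in values:
                caster(value)
            return caster
        except ValueError:
            continue
    return str


def read_with_schema(text):
    rows = list(csv.reader(io.StringIO(text)))
    # Analogous to header=True: the first row holds the column names.
    names, data = rows[0], rows[1:]
    # Analogous to inferSchema=True: scan each column to guess its type.
    types = [infer_type(column) for column in zip(*data)]
    # Cast every field to the type inferred for its column.
    records = [
        tuple(cast(value) for cast, value in zip(types, row))
        for row in data
    ]
    return names, types, records


names, types, records = read_with_schema(SAMPLE)
print(list(zip(names, [t.__name__ for t in types])))
print(records)
```

On the DataFrame itself, census.printSchema() shows the column names and the types that Spark inferred.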