We will first read the data into a DataFrame, which is easier to work with. Later on, we will convert it into an RDD of labeled points. To read the data, execute the following:
census_path = '../data/census_income.csv'

census = spark.read.csv(
    census_path
    , header=True
    , inferSchema=True
)
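To make the two options passed to spark.read.csv more concrete, here is a minimal pure-Python sketch of what header=True and inferSchema=True do conceptually. It does not use Spark and is not how Spark implements inference (Spark recognizes more types and works on distributed data). The sample rows, the column names, and the infer_type helper are hypothetical and exist only for illustration.

```python
import csv
import io

# Hypothetical sample standing in for the first rows of a CSV file.
sample = io.StringIO(
    "age,hours_per_week,occupation\n"
    "39,40.0,Adm-clerical\n"
    "50,13.0,Exec-managerial\n"
)


def infer_type(values):
    """Return the narrowest of int, float, str that parses every value."""
    for candidate in (int, float):
        try:
            for v in values:
                candidate(v)
            return candidate
        except ValueError:
            continue
    return str


reader = csv.reader(sample)
header = next(reader)   # like header=True: the first row supplies column names
rows = list(reader)

# Like inferSchema=True: scan each column and pick a type
# instead of leaving every value as a string.
columns = list(zip(*rows))
schema = {name: infer_type(col) for name, col in zip(header, columns)}

# Cast every value using the inferred type of its column.
typed_rows = [
    [schema[name](value) for name, value in zip(header, row)]
    for row in rows
]

print({name: t.__name__ for name, t in schema.items()})
print(typed_rows)
```

Without inferSchema=True, Spark reads every column as a string; turning it on costs an extra pass over the data, which is usually acceptable for a dataset of this size. After loading, calling census.printSchema() is a quick way to check the inferred types.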