How it works...

First, we specify the path to our dataset. In our case, as with all the other datasets we use in this book, census_income.csv is located in the data folder, accessible from the parent folder.

Next, we use the .read property of SparkSession, which returns a DataFrameReader object. The first parameter to the .csv(...) method specifies the path to the data. Our dataset has the column names in the first row, so we set the header option to instruct the reader to use the first row for column names. The inferSchema parameter instructs the DataFrameReader to automatically detect the datatype of each column, which costs an extra pass over the data.
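The read step described above can be sketched as a small helper. This is only an illustration: the function name load_census and the default path "../data/census_income.csv" are assumptions based on the folder layout described earlier, and spark is assumed to be an already created SparkSession.

```python
def load_census(spark, path="../data/census_income.csv"):
    """Read the census CSV through the session's DataFrameReader.

    `spark` is assumed to be an existing SparkSession. The default path
    is a guess at the layout described in the text (data folder next to
    the parent folder); adjust it to your own setup.
    """
    # header=True: take column names from the first row of the file.
    # inferSchema=True: let the reader detect each column's datatype.
    return spark.read.csv(path, header=True, inferSchema=True)
```

With a live session, census = load_census(spark) should give a DataFrame equivalent to the one used in the rest of this section.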

Let's check whether the datatype inference is correct:

census.printSchema()

The preceding code produces the following output:

As you can see, the datatype of certain columns was detected properly; without the inferSchema parameter, every column would be read as a string.
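If you would rather check the inferred types programmatically than read the printed schema, one possible approach is to group the column names by type using the DataFrame's .dtypes property, which holds (column name, type name) pairs. The helper name columns_by_type is hypothetical and only serves this sketch.

```python
def columns_by_type(df):
    """Group column names by the type name reported in df.dtypes.

    `df` is assumed to be a Spark DataFrame, such as `census`.
    """
    grouped = {}
    for name, dtype in df.dtypes:
        # Collect every column name under its reported type, e.g. "int" or "string".
        grouped.setdefault(dtype, []).append(name)
    return grouped
```

For example, columns_by_type(census).get("string", []) would list the columns that were kept as strings; if the file were read without inferSchema, every column would be expected to appear in that list.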
