Spark SQL can read data from external storage systems such as files, Hive tables, and JDBC databases through the DataFrameReader interface.
The general form of the API call is spark.read.inputtype, where inputtype is one of the supported source formats:
- Parquet
- CSV
- Hive Table
- JDBC
- ORC
- Text
- JSON
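Each of these formats has a corresponding shortcut method on the DataFrameReader; JDBC sources are typically configured through format("jdbc") with connection options. A minimal sketch (the file paths, table name, and connection details below are hypothetical placeholders):

```scala
// Hypothetical paths and names; each call returns a DataFrame.
val parquetDF = spark.read.parquet("data.parquet")
val jsonDF    = spark.read.json("data.json")
val orcDF     = spark.read.orc("data.orc")
val textDF    = spark.read.text("data.txt")
val hiveDF    = spark.read.table("my_hive_table")   // Hive table, referenced by name

val jdbcDF = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://host:5432/db")  // hypothetical connection URL
  .option("dbtable", "schema.tablename")            // hypothetical table
  .option("user", "username")
  .option("password", "password")
  .load()
```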
Let's look at a couple of simple examples of reading CSV files into DataFrames:
scala> val statesPopulationDF = spark.read.option("header", "true").option("inferSchema", "true").option("sep", ",").csv("statesPopulation.csv")
statesPopulationDF: org.apache.spark.sql.DataFrame = [State: string, Year: int ... 1 more field]
scala> val statesTaxRatesDF = spark.read.option("header", "true").option("inferSchema", "true").option("sep", ",").csv("statesTaxRates.csv")
statesTaxRatesDF: org.apache.spark.sql.DataFrame = [State: string, TaxRate: double]
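Once a DataFrame has been loaded, you can confirm that schema inference picked the expected column types and preview the data. A quick check on the DataFrames above might look like this:

```scala
// Print the inferred schema (column names and types).
statesPopulationDF.printSchema()

// Display the first few rows without truncating column values.
statesPopulationDF.show(5, truncate = false)
```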