Summary

In this chapter, we discussed Spark SQL as a one-stop solution for processing large datasets in memory using a mix of SQL-like queries and complex procedural algorithms, producing results in seconds or minutes rather than hours.

We started with the architecture of Spark SQL and its components. We then walked through the complete process of writing Spark SQL jobs in Scala, along with the methods for converting Spark RDDs into DataFrames. Toward the middle of the chapter, we ran examples of Spark SQL against different data formats such as Hive and Parquet, and covered schema evolution and schema merging. Finally, we discussed performance tuning for Spark SQL code and queries.

In the next chapter, we will discuss capturing, processing, and analyzing streaming data using Spark Streaming.
