Table of Contents for Learning Spark
Learning Spark
by Tathagata Das, Brooke Wenig, Denny Lee, Jules Damji
Learning Spark, 2nd Edition
1. Introduction to Apache Spark: A Unified Analytics Engine
The Genesis of Big Data and Distributed Computing at Google
Hadoop at Yahoo!
Spark’s Early Years at AMPLab
What is Apache Spark?
Unified Analytics
Apache Spark Components as a Unified Stack
Apache Spark’s Distributed Execution and Concepts
Developer’s Experience
Who Uses Spark, and for What?
Data Science Tasks
Data Engineering Tasks
Machine Learning or Deep Learning Tasks
Community Adoption and Expansion
2. Downloading Apache Spark and Getting Started
Spark’s Directories and Files
Step 2: Use Scala Shell or PySpark Shell
Using Local Machine
Step 3: Understand Spark Application Concepts
Spark Application and SparkSession
Spark Jobs
Spark Stages
Spark Tasks
Transformations, Actions, and Lazy Evaluation
Spark UI
Databricks Community Edition
First Standalone Application
Using Local Machine
Counting M&Ms for the Cookie Monster
Building Standalone Applications in Scala
Summary
3. Apache Spark’s Structured APIs
Spark: What’s Underneath an RDD?
Structuring Spark
Key Merits and Benefits
Structured APIs: DataFrames and Datasets APIs
DataFrame API
Common DataFrame Operations
Datasets API
DataFrames vs Datasets
What about RDDs?
Spark SQL and the Underlying Engine
Catalyst Optimizer
Summary
4. Spark SQL and DataFrames — Introduction to Built-in Data Sources
Using Spark SQL in Spark Applications
Basic Query Example
SQL Tables and Views
Data Sources for DataFrames and SQL Tables
DataFrameReader
DataFrameWriter
Parquet
JSON
CSV
Avro
ORC
Image
Summary
5. Spark SQL and Datasets
Single API for Java and Scala
Scala Case Classes and JavaBeans for Datasets
Working with Datasets
Creating Sample Data
Transforming Sample Data
Memory Management for Datasets and DataFrames
Dataset Encoders
Spark’s Internal Format vs Java Object Format
Serialization and Deserialization (SerDe)
Costs of Using Datasets
Strategies to Mitigate Costs
Summary
6. Loading and Saving Your Data
Motivation for Data Sources
File Formats: Revisited
Text Files
Organizing Data for Efficient I/O
Partitioning
Bucketing
Compression Schemes
Saving as Parquet Files
Delta Lake Storage Format
Delta Lake Table
Summary
7. Structured Streaming
Evolution of Apache Spark Stream Processing Engine
The Advent of Micro-batch Stream Processing
Lessons Learnt from Spark Streaming (DStreams)
The Philosophy of Structured Streaming
The Programming Model of Structured Streaming
The Fundamentals of a Structured Streaming Query
Five Steps to Define a Streaming Query
Under the Hood of an Active Streaming Query
Recovering from Failures with Exactly-once Guarantees
Monitoring an Active Query
Streaming Data Sources and Sinks
Files
Apache Kafka
Custom Streaming Sources and Sinks
Data Transformations
Incremental Execution and Streaming State
Stateless Transformations
Stateful Transformations
Stateful Streaming Aggregations
Non-time-based Streaming Aggregations
Aggregations with event time windows
Handling late data with watermarks
Supported Output Modes
Streaming Joins
Stream-static Joins
Stream-stream Joins
Arbitrary stateful computations
Modeling Arbitrary Stateful Operation with mapGroupsWithState
Using timeouts to manage inactive groups
Generalization with flatMapGroupsWithState
Performance Tuning
Summary
Learning Spark
Second Edition
Jules Damji, Denny Lee, Brooke Wenig,
and Tathagata Das