In this chapter, we will look at the immutable design of Apache Spark. We will delve into the Spark RDD's parent/child chain and use RDD in an immutable way. We will then use DataFrame operations for transformations to discuss immutability in a highly concurrent environment. By the end of this chapter, we will use the Dataset API in an immutable way.
In this chapter, we will cover the following topics:
- Delving into the Spark RDD's parent/child chain
- Using RDD in an immutable way
- Using DataFrame operations to transform
- Immutability in the highly concurrent environment
- Using the Dataset API in an immutable way