In this chapter, we learned how to load data into Spark RDDs and covered parallelization with Spark RDDs. Before loading the data, we took a brief look at the UCI Machine Learning Repository. We also reviewed the basic RDD operations and checked the corresponding functions in the official documentation.
In the next chapter, we will cover big data cleaning and data wrangling.