Saving Data in the Correct Format

In the previous chapters, we focused on processing and loading data. We learned about transformations, actions, joining, shuffling, and other aspects of Spark.

In this chapter, we will learn how to save data in the correct format, starting with plain text format using Spark's standard API. We will then leverage JSON as a data format and learn how to save it using the standard APIs. Spark also supports the CSV format, and we will leverage that format as well. After that, we will move on to more advanced schema-based formats, for which we need to import third-party dependencies. We will use Avro with Spark, and then learn how to save data in a columnar format known as Parquet. By the end of this chapter, we will also have learned how to retrieve data to validate whether it has been stored correctly.

In this chapter, we will cover the following topics:

Saving data in plain text format
Leveraging JSON as a data format
Tabular formats – CSV
Using Avro with Spark
Columnar formats – Parquet
