Saving Data in the Correct Format

In the previous chapters, we focused on processing and loading data. We learned about transformations, actions, joining, shuffling, and other aspects of Spark.

In this chapter, we will learn how to save data in the correct format. We will start by saving data in plain text format using Spark's standard API. We will then use JSON as a data format and learn how to save JSON with the standard APIs. Spark also supports CSV, and we will use that format as well. Next, we will move on to more advanced schema-based formats, for which support may require importing third-party dependencies: we will use Avro with Spark and learn how to save data in a columnar format known as Parquet. By the end of this chapter, we will also have learned how to read the data back to validate that it was stored properly.

In this chapter, we will cover the following topics:

  • Saving data in plain text format
  • Leveraging JSON as a data format
  • Tabular formats – CSV
  • Using Avro with Spark
  • Columnar formats – Parquet
