Using the CSV module to write CSV data

In this next demonstration, you will read the output data from the previous chapter and convert it to CSV format.

To start off with, I will create a file called json_to_csv.py which will make use of both the csv module and the json module. In the json_to_csv.py file, I will start by importing both the csv module and the json module and reading the JSON data from the scf_extract.json file into a Python list:

import csv
import json

## read in the input json data
fin = open("../data/input_data/scf_extract.json","r")
json_data = json.load(fin)
fin.close()

Writing CSV data using the csv module is a bit like reading data with the CSV module in reverse. When opening a file with write permission, you can use the corresponding file object to create a writer object. This is done using the csv.writer() function. The writerow() function of the writer object will take an array of values and write it to a row in the output file. 

Recall that the extracted data entries from the previous chapter took the form of an array of dictionaries. To demonstrate this visually, the following is the structure of the data:

[
{<key1>:<value1>,<key2>,<value2>,...},
{<key1>:<value1>,<key2>,<value2>,...}
{<key1>:<value1>,<key2>,<value2>,...}
...
]

Converting scf_extract.py to CSV format will involve changing each data entry into a Python list, where the values are in a fixed order. Python dictionaries and JSON key value pairs are not necessarily in a fixed order. In order to keep the values in a fixed order that corresponds to the column headers, the approach used here will be to create an array of the column headers. For each data entry, the array of column headers can be used to extract each value in order and place it in an array of values.

First, in the following continuation of csv_to_json.py, the keys of the first data entry (which correspond to the data variables in the dataset) are placed into a Python list. This Python list will be used as the reference of the column headers of the output file:

....
json_data = json.load(fin)
fin.close()

## get an array of data variables to
## use as column headers
keys = json_data[0].keys()

Next, an output file is opened with write permission and the corresponding file object is used to create a writer object. The writer.writerow() function is used to write the first row (containing the column headers) to the output file:

....
keys = json_data[0].keys()

## open an output file with write permission
## and create a writer object
fout = open("../data/output_data/scf_extract.csv","w")
writer = csv.writer(fout)
writer.writerow(keys)

fout.close()

The last step will be to iterate over each of the entries in the json_data variable. Each data entry from the original dataset should be converted from dictionary to a Python list, and written to the output CSV file.

In the following continuation of json_to_csv.py, a for loop is created to iterate over the entries in the JSON dataset. In a nested for loop, each of the data variables are extracted one by one and placed in a fixed order into a Python list. The resulting Python list is then written to a row in the output CSV file:

....
writer.writerow(keys)

## iterate over the json data extracting
## the data variables into an ordered list
## write each data entry to the output file
for entry in json_data:
row=[]
for key in keys:
row.append(entry[key])
writer.writerow(row)

fout.close()

That's it for the CSV module! If you are interested in learning more, I've made a link to the documentation for the csv module available in the Links and Further Reading document of the external resources. 

The next section will be an introduction to the pandas module which uses a different approach for processing tabular data. In the next section, I will revisit the task of counting the total road length as of 2011 in order to demonstrate some of the differences. 

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset