Using URL parameters to filter the results

In this section, I will retrieve all of the issue reports from the Seeclickfix API that occurred on the first day of January 2017. To start with, I will create a new file called get_scf_date_range.py and import the requests and csv modules as follows:

import requests
import csv

The goal will be to gather all of the issue reports that occur during the first day of January 2017 and to store the results in a CSV file. In order to do this, you will need to make use of URL parameters. URL parameters are name-value pairs appended to the end of a URL that further specify the GET request.

The Seeclickfix API documentation for the issues resource, available at http://dev.seeclickfix.com/v2/issues/, lists a number of URL parameters that can be used. Looking through that documentation, you may identify three parameters that are of particular use. The first two of these are the after and before parameters.

These parameters can be used to specify that the results should be after January 01, 2017 and before January 02, 2017. URL parameters are specified after a base URL using the following syntax:

<url>?<parameter1>=<value1>&<parameter2>=<value2>
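For example, the after and before restrictions used in this section produce the following full request URL:

https://seeclickfix.com/api/v2/issues?after=2017-01-01T00:00:00&before=2017-01-02T00:00:00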

In the following continuation of get_scf_date_range.py, the URL parameters are added to the URL string to restrict the results so that they only contain data for the first day of January 2017:

import requests
import csv

url = "https://seeclickfix.com/api/v2/issues?"

## restrict the results to issues after January 01, 2017
url+="after=2017-01-01T00:00:00"

## restrict the results to issues before January 02, 2017
url+="&before=2017-01-02T00:00:00"

The third parameter that will be of use is the page parameter.

The page parameter allows you to get data beyond the initial set of results. In the following continuation of get_scf_date_range.py, I've added the page parameter to the URL string, but left its value blank:

....
url+="&before=2017-01-02T00:00:00"

## leave the page parameter empty so that it
## can be dynamically changed
url+="&page="

Rather than specifying just one value for the page parameter, the strategy I will use is to make a series of repeated GET requests, each containing the next page number. This way, it is possible to retrieve multiple pages of data instead of just one.

However, before conducting the GET requests, I will do some initial setup, creating the output file and specifying the data fields to be extracted. In the following continuation of get_scf_date_range.py, a list of column headers is created. These column headers will also be used as keys to extract the desired fields from the original data retrieved from the API. A new output file is then opened and used to create a writer object. The column headers are then written to the first row of the output file:

....
url+="&page="

## create a list of field names that should be extracted
fields=["created_at","closed_at","summary","address"]

## open the output file and create a writer
## write the column headers to the output file
fout = open("output_data/scf_date_range_issues.csv","w")
writer=csv.writer(fout)
writer.writerow(fields)

fout.close()
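One small caveat: the csv module's documentation recommends opening files passed to csv.writer with newline="" so that extra blank rows do not appear between records on some platforms (notably Windows). If you run into that, the open call would become:

fout = open("output_data/scf_date_range_issues.csv","w",newline="")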

The next step is to perform the GET requests, page by page. To set up this process, in the following continuation of get_scf_date_range.py, I will first create a variable called page to contain the page number, and a variable called data to contain the extracted data from the HTTP response:

....
writer.writerow(fields)

## initialize the page and data variables
page=1
data=requests.get(url+str(page)).json()["issues"]

fout.close()

The next bit of code will require a Python tool that hasn't been covered yet in this book: the while loop. A while loop works like a for loop, except that it keeps running for as long as a certain condition holds true. The condition is specified in the clause header of the while loop.

You can read about while loops in the Python documentation at the following link:

https://docs.python.org/3/reference/compound_stmts.html#while
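As a quick illustration (this snippet is not part of get_scf_date_range.py), the following while loop keeps doubling a number and runs as long as the value stays below 100:

## repeat the loop body as long as the condition is true
count = 1
while count < 100:
    print(count)
    count *= 2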

Next, in the following continuation of get_scf_date_range.py, a while loop is created to repeatedly increase the page number and submit GET requests until there is no more data left. While there is data, each data entry is converted to a list of field values and written to the output CSV file:

....
data=requests.get(url+str(page)).json()["issues"]

## go page by page until there is no data
while len(data)>0:

    ## if there is data, iterate over the
    ## data entries, writing the result to the output
    for entry in data:
        row=[]
        for field in fields:
            row.append(entry[field])
        writer.writerow(row)

    ## in each iteration of the loop,
    ## increase the page number and get
    ## the data for the subsequent page
    page+=1
    data=requests.get(url+str(page)).json()["issues"]

fout.close()
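One refinement worth considering, although it is not part of the script shown here, is to check each response for HTTP errors before parsing it. The requests library's raise_for_status method makes this a one-line check:

## variant of the request with an error check added
response = requests.get(url+str(page))
response.raise_for_status()  ## raises an exception on a 4xx or 5xx response
data = response.json()["issues"]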

Now, running get_scf_date_range.py will produce a CSV dataset containing information on all of the issue reports from the first day of January 2017.
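To sanity-check the result, you can read the CSV back in and print the first few rows; a minimal sketch, assuming the output path used above:

import csv

## print the header row and the first three data rows
with open("output_data/scf_date_range_issues.csv") as fin:
    reader = csv.reader(fin)
    for i, row in enumerate(reader):
        print(row)
        if i >= 3:
            break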
