Reading data from the database

Before we dive into the nitty-gritty of visualization, let's get our data. Here, we will use the SQLite database file created earlier. First, we'll open a connection to the file:

import sqlite3
import pandas as pd

# open a connection to the SQLite database file
con = sqlite3.connect('../Chapter16/data/311.db')
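
One caveat: if the file does not exist (but its directory does), sqlite3.connect quietly creates a new, empty database, so a wrong path only surfaces later as a missing-table error. Below is a minimal sketch of a sanity check. It uses an in-memory database as a stand-in for the real file, and the has_table helper is hypothetical, not part of the chapter's code:

```python
import sqlite3


def has_table(con, name):
    """Return True if a table or view called `name` exists in the database."""
    row = con.execute(
        "SELECT 1 FROM sqlite_master WHERE type IN ('table', 'view') AND name = ?",
        (name,),
    ).fetchone()
    return row is not None


# In-memory stand-in (assumption); in the chapter this would be the 311.db file.
demo = sqlite3.connect(':memory:')
missing = has_table(demo, 'raw')   # nothing has been created yet

demo.execute('CREATE TABLE raw (created_date TEXT, borough TEXT, complaint_type TEXT)')
present = has_table(demo, 'raw')   # the table now exists

print(missing, present)
```

Running the same check against the real connection right after connecting would catch a mistyped path early.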

Next, we will define a simple query to aggregate raw records into statistics:

Q = '''
SELECT date(created_date) as date, lower(borough) as boro, complaint_type, COUNT(*) as complaints
FROM raw WHERE borough != 'Unspecified' GROUP BY 1,2,3;
'''
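
To see what this query does, here is a small sketch that runs it against a toy table. The in-memory database, the sample rows, and the assumption that created_date is stored as ISO-formatted text are all illustrative; only the query itself comes from the text (with an ORDER BY added so the output order is predictable):

```python
import sqlite3

# Toy stand-in for the raw table (assumed schema, invented rows).
con_demo = sqlite3.connect(':memory:')
con_demo.execute('CREATE TABLE raw (created_date TEXT, borough TEXT, complaint_type TEXT)')
con_demo.executemany(
    'INSERT INTO raw VALUES (?, ?, ?)',
    [
        ('2019-01-01 08:15:00', 'BRONX', 'Noise'),
        ('2019-01-01 21:40:00', 'BRONX', 'Noise'),
        ('2019-01-01 10:00:00', 'QUEENS', 'Heating'),
        ('2019-01-02 09:30:00', 'Unspecified', 'Noise'),  # filtered out by WHERE
    ],
)

Q_demo = '''
SELECT date(created_date) AS date, lower(borough) AS boro, complaint_type, COUNT(*) AS complaints
FROM raw WHERE borough != 'Unspecified' GROUP BY 1, 2, 3
ORDER BY 1, 2, 3;
'''

# GROUP BY 1, 2, 3 refers to the first three result columns by position.
result = con_demo.execute(Q_demo).fetchall()
print(result)
# [('2019-01-01', 'bronx', 'Noise', 2), ('2019-01-01', 'queens', 'Heating', 1)]
```

The two Bronx noise records collapse into one row with a count of 2, and the 'Unspecified' record is dropped.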

Finally, we will pull the data using the pandas read_sql_query function. SQLite has no dedicated date type, so the dates come back as strings and we have to re-parse them in Python:

DATA = pd.read_sql_query(Q, con)
DATA['date'] = pd.to_datetime(DATA['date'])

Alternatively, we could create a VIEW with that query in the same database file. That would allow us to pull the data directly for this and other tasks.
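
The VIEW approach could look roughly like the sketch below. The view name daily_complaints, the in-memory database, and the sample rows are assumptions made for illustration:

```python
import sqlite3

# Toy stand-in for the raw table (assumed schema, invented rows).
con_view = sqlite3.connect(':memory:')
con_view.execute('CREATE TABLE raw (created_date TEXT, borough TEXT, complaint_type TEXT)')
con_view.executemany(
    'INSERT INTO raw VALUES (?, ?, ?)',
    [
        ('2019-01-01 08:15:00', 'BROOKLYN', 'Noise'),
        ('2019-01-01 09:00:00', 'BROOKLYN', 'Noise'),
        ('2019-01-02 12:00:00', 'MANHATTAN', 'Heating'),
    ],
)

# Store the aggregation once as a view; daily_complaints is a hypothetical name.
con_view.execute('''
CREATE VIEW IF NOT EXISTS daily_complaints AS
SELECT date(created_date) AS date, lower(borough) AS boro, complaint_type, COUNT(*) AS complaints
FROM raw WHERE borough != 'Unspecified' GROUP BY 1, 2, 3;
''')

# Any later task can now select from the view as if it were a table.
view_rows = con_view.execute('SELECT * FROM daily_complaints ORDER BY 1, 2').fetchall()
print(view_rows)
# [('2019-01-01', 'brooklyn', 'Noise', 2), ('2019-01-02', 'manhattan', 'Heating', 1)]
```

With such a view in place, the pandas call would shrink to a plain SELECT * FROM daily_complaints, and the view always reflects the current contents of raw.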

For the timeline part of the chart, we could further aggregate our dataset:

>>> boro_total = DATA.groupby(['date', 'boro'])['complaints'].sum().unstack()

>>> boro_total.head(5)
boro        bronx  brooklyn  manhattan  queens  staten island
date
2019-01-01    995      1657        859    1237            249
2019-01-02   1675      2444       1307    1880            649
2019-01-03   1450      2532       1420    1799            484
2019-01-04   1472      2407       1417    1835            425
2019-01-05   1085      1551        954    1250            292
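
To make the groupby/unstack step easier to follow, here is a small sketch on an invented toy frame (the values are made up, not taken from the 311 data). It also shows one point worth keeping in mind: a date/borough pair with no records becomes NaN after unstack, which fill_value can replace:

```python
import pandas as pd

# Invented sample in the same long format as DATA.
toy = pd.DataFrame({
    'date': pd.to_datetime(['2019-01-01', '2019-01-01', '2019-01-01', '2019-01-02']),
    'boro': ['bronx', 'bronx', 'queens', 'bronx'],
    'complaint_type': ['Noise', 'Heating', 'Noise', 'Noise'],
    'complaints': [3, 2, 5, 4],
})

# Sum over complaint types, then pivot boroughs from the index into columns.
toy_total = toy.groupby(['date', 'boro'])['complaints'].sum().unstack()

# Queens has no records on 2019-01-02, so that cell is NaN above;
# fill_value=0 is one way to treat "no records" as zero complaints.
toy_total_filled = toy.groupby(['date', 'boro'])['complaints'].sum().unstack(fill_value=0)

print(toy_total_filled)
```

Whether a missing pair should count as zero or stay missing depends on the chart; for a stacked timeline, zero is usually the more convenient choice.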

Now, our data is ready to be visualized! Let's get to it.
