Reading data from the database

Before we dive into the nitty-gritty of visualization, let's get our data. Here, we will use the SQLite database file created earlier. First, we'll open a connection to the file:

import sqlite3
import pandas as pd

con = sqlite3.connect('../Chapter16/data/311.db')

Next, we will define a simple query that aggregates the raw records into daily complaint counts per borough and complaint type:

Q = '''
SELECT date(created_date) as date, lower(borough) as boro, complaint_type, COUNT(*) as complaints
FROM raw WHERE borough != 'Unspecified' GROUP BY 1,2,3;
'''

Finally, we will pull the data with the pandas read_sql_query function. SQLite has no dedicated date type, so the date column comes back as plain text and we have to re-parse it into datetimes in Python:

DATA = pd.read_sql_query(Q, con)
DATA['date'] = pd.to_datetime(DATA['date'])
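As a side note, read_sql_query also accepts a parse_dates argument, so the conversion can happen while reading. Here is a minimal sketch, assuming a throwaway in-memory database with one made-up row instead of the real 311.db file:

```python
import sqlite3
import pandas as pd

# In-memory stand-in for the real database; the single row is made up.
con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE raw (created_date TEXT)')
con.execute("INSERT INTO raw VALUES ('2019-01-01 10:15:00')")

query = 'SELECT date(created_date) AS date FROM raw'

# Without any hint, the date column arrives as text...
as_text = pd.read_sql_query(query, con)

# ...unless we ask pandas to parse it while reading.
parsed = pd.read_sql_query(query, con, parse_dates=['date'])

print(as_text['date'].dtype, parsed['date'].dtype)
```

Both approaches should end up with the same datetime column; which one to use is mostly a matter of taste.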

Alternatively, we could create a VIEW with that query in the same database file. That would allow us to pull the aggregated data directly, for this and other tasks.
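Here is a rough sketch of what the VIEW approach could look like. It runs against an in-memory database with three made-up rows rather than the real file, and the view name daily_complaints is our own invention, not something defined elsewhere in the book:

```python
import sqlite3

# In-memory stand-in for 311.db; the three rows below are made up.
con = sqlite3.connect(':memory:')
con.execute(
    'CREATE TABLE raw (created_date TEXT, borough TEXT, complaint_type TEXT)'
)
con.executemany(
    'INSERT INTO raw VALUES (?, ?, ?)',
    [
        ('2019-01-01 10:15:00', 'BRONX', 'Noise'),
        ('2019-01-01 11:30:00', 'BRONX', 'Noise'),
        ('2019-01-01 12:00:00', 'Unspecified', 'Noise'),
    ],
)

# Store the aggregation query as a view (daily_complaints is a hypothetical name).
con.execute('''
CREATE VIEW IF NOT EXISTS daily_complaints AS
SELECT date(created_date) AS date, lower(borough) AS boro,
       complaint_type, COUNT(*) AS complaints
FROM raw WHERE borough != 'Unspecified' GROUP BY 1, 2, 3
''')

# From now on, the aggregated data is one short query away.
rows = con.execute('SELECT * FROM daily_complaints').fetchall()
print(rows)  # [('2019-01-01', 'bronx', 'Noise', 2)]
```

With a view like this in place, the pandas call would shrink to a plain SELECT * against the view, and any other tool that reads the file would see the same aggregated table.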

For the timeline part of the chart, we could further aggregate our dataset:

>>> boro_total = DATA.groupby(['date', 'boro'])['complaints'].sum().unstack()

>>> boro_total.head(5)
boro        bronx  brooklyn  manhattan  queens  staten island
date
2019-01-01    995      1657        859    1237            249
2019-01-02   1675      2444       1307    1880            649
2019-01-03   1450      2532       1420    1799            484
2019-01-04   1472      2407       1417    1835            425
2019-01-05   1085      1551        954    1250            292
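To see what the groupby and unstack steps do on their own, here is a small sketch on a toy frame shaped like DATA. The numbers are made up and only serve to show the reshaping:

```python
import pandas as pd

# Toy long-format data in the same shape as DATA; the numbers are made up.
toy = pd.DataFrame({
    'date': pd.to_datetime(
        ['2019-01-01', '2019-01-01', '2019-01-01', '2019-01-02']
    ),
    'boro': ['bronx', 'bronx', 'queens', 'bronx'],
    'complaint_type': ['Noise', 'Heating', 'Noise', 'Noise'],
    'complaints': [3, 2, 4, 1],
})

# Sum over complaint types, giving one value per (date, boro) pair...
per_day = toy.groupby(['date', 'boro'])['complaints'].sum()

# ...then move the inner index level (boro) into columns.
wide = per_day.unstack()
print(wide)
```

Note that a date and borough pair with no records (queens on the second day here) shows up as NaN in the wide table, which is worth keeping in mind before plotting.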

Now, our data is ready to be visualized! Let's get to it.
