18. Visualizing Real-Life Data with Matplotlib and Seaborn

Ashwin Pajankar

Nashik, Maharashtra, India

In the previous chapter, you learned how to visualize data with Seaborn, a data visualization library for scientific Python, and how to create visualizations from data stored in various formats.

In this chapter, you will combine what you learned in the earlier chapters of this book to visualize real-life data: COVID-19 pandemic data and an animal disease dataset, both obtained from the Internet. This chapter covers the following topics:

COVID-19 pandemic data
Fetching the pandemic data programmatically
Preparing the data for visualization
Creating visualizations with Matplotlib and Seaborn
Creating visualizations of animal disease data

After reading this chapter, you will be comfortable working with and creating visualizations of real-life datasets.

COVID-19 Pandemic Data

As of this writing (May 2021), the world is facing the COVID-19 pandemic. COVID-19 is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Its symptoms include common flu-like symptoms and difficulty breathing.

Several organizations collect and share real-time pandemic data. Two of them are Johns Hopkins University (https://coronavirus.jhu.edu/map.html) and Worldometers (https://www.worldometers.info/coronavirus/). Both websites publish COVID-19 data and refresh it frequently. Figure 18-1 shows the Johns Hopkins COVID-19 page.

Figure 18-1. Johns Hopkins COVID-19 home page

Figure 18-2 shows the Worldometers website.

Figure 18-2. Worldometers COVID-19 home page

Because both websites refresh their data frequently, they are reliable sources of up-to-date information.

Fetching the Pandemic Data Programmatically

In this section, you will learn how to fetch both datasets (Johns Hopkins and Worldometers) with Python. To do that, you need to install the covid library. Its home page is https://ahmednafies.github.io/covid/, and its PyPI page is https://pypi.org/project/covid/. Create a new Jupyter notebook for this chapter, and install the library by running the following command in the notebook:

%pip install covid

You can import the library to a notebook or a Python script/program as follows:

from covid import Covid

You can create an object to fetch the data from an online source. By default, the data source is Johns Hopkins:

covid = Covid()

Note that the servers are sometimes unresponsive because of high traffic; I experienced this multiple times.
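Because the servers are sometimes unresponsive, it can help to retry a failed request a few times before giving up. The following is a minimal sketch. fetch_with_retries is a hypothetical helper and is not part of the covid library. I am not assuming which exception the library raises when a server does not respond, so the sketch retries on any Exception.

```python
import time


def fetch_with_retries(fetch, attempts=3, delay=2.0):
    """Call fetch() until it succeeds, waiting longer after each failure.

    fetch is any zero-argument callable, for example a lambda that
    wraps Covid().get_data() (hypothetical usage).
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception as error:  # broad on purpose: library exception types are not assumed
            last_error = error
            if attempt < attempts - 1:
                # Exponential backoff: delay, 2*delay, 4*delay, ...
                time.sleep(delay * 2 ** attempt)
    raise RuntimeError(f"all {attempts} attempts failed") from last_error
```

In a notebook, you might call it as covid_data = fetch_with_retries(lambda: Covid().get_data()). Catching every exception also retries genuine programming errors. Once you know how the library actually fails, narrowing the except clause is a sensible refinement.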

You can also specify the Johns Hopkins data source explicitly, as follows:

covid = Covid(source="john_hopkins")

You can specify Worldometers explicitly as follows:

covid = Covid(source="worldometers")

You can see the source of the data as follows:

covid.source

This returns a string naming the current data source, as shown here:

'john_hopkins'

You can get the status of a country by its name as follows:

covid.get_status_by_country_name("italy")

This returns a dictionary, as follows:

{'id': '86',
 'country': 'Italy',
 'confirmed': 4188190,
 'active': 283744,
 'deaths': 125153,
 'recovered': 3779293,
 'latitude': 41.8719,
 'longitude': 12.5674,
 'last_update': 1621758045000}
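The last_update field is not human-readable. Judging by its magnitude, it appears to be a Unix timestamp in milliseconds. This is an assumption, so check the library documentation before relying on it. The following minimal sketch converts it to a readable UTC date and time with the standard library:

```python
from datetime import datetime, timezone


def to_utc_datetime(last_update_ms):
    """Convert a millisecond Unix timestamp (assumed format) to an aware UTC datetime."""
    return datetime.fromtimestamp(last_update_ms / 1000, tz=timezone.utc)


status = {"country": "Italy", "last_update": 1621758045000}  # value from the output above
print(to_utc_datetime(status["last_update"]).isoformat())  # 2021-05-23T08:20:45+00:00
```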

You can also fetch the status by country ID. Only the Johns Hopkins dataset has country IDs, so this call fails with the Worldometers source.

# Only valid for Johns Hopkins
covid.get_status_by_country_id(115)

The output is similar to the earlier example, as shown here:

{'id': '115',
 'country': 'Mexico',
 'confirmed': 2395330,
 'active': 261043,
 'deaths': 221597,
 'recovered': 1912690,
 'latitude': 23.6345,
 'longitude': -102.5528,
 'last_update': 1621758045000}

You can also fetch the list of countries as follows:

covid.list_countries()

Here is part of the output:

[{'id': '179', 'name': 'US'},
 {'id': '80', 'name': 'India'},
 {'id': '24', 'name': 'Brazil'},
 {'id': '63', 'name': 'France'},
 {'id': '178', 'name': 'Turkey'},
 {'id': '143', 'name': 'Russia'},
 {'id': '183', 'name': 'United Kingdom'},
 ....
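Because get_status_by_country_id() needs a numeric ID, a lookup table from country names to IDs can be convenient. The following minimal sketch builds one from part of the list_countries() output shown above. The IDs appear as strings in that output, so the sketch converts them with int().

```python
countries = [  # a subset of covid.list_countries() from the output above
    {"id": "179", "name": "US"},
    {"id": "80", "name": "India"},
    {"id": "24", "name": "Brazil"},
    {"id": "63", "name": "France"},
]

# Map each country name to its integer ID
country_ids = {c["name"]: int(c["id"]) for c in countries}

print(country_ids["India"])  # 80
# With a live connection: covid.get_status_by_country_id(country_ids["India"])
```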

You will continue using the Johns Hopkins dataset throughout the chapter.

You can get the total number of active cases as follows:

covid.get_total_active_cases()

The output is as follows:

27292520

You can get the total confirmed cases as follows:

covid.get_total_confirmed_cases()

The output is as follows:

166723247

You can get the total recovered cases as follows:

covid.get_total_recovered()

The output is as follows:

103133392

You can get the total number of deaths as follows:

covid.get_total_deaths()

The output is as follows:

3454602
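The four totals can be cross-checked with a little arithmetic. The following worked example uses the values printed above; your numbers will differ because the data is refreshed frequently.

- Deaths divided by confirmed cases comes to about 2.07 percent.
- Active + recovered + deaths falls about 32.8 million short of the confirmed total.

One plausible reason for the gap is that some countries report None for active and recovered cases, as the US record in the get_data() output shows.

```python
confirmed = 166_723_247
active = 27_292_520
recovered = 103_133_392
deaths = 3_454_602

# Deaths as a percentage of confirmed cases
fatality_pct = deaths / confirmed * 100
print(f"{fatality_pct:.2f}%")  # 2.07%

# Confirmed cases not covered by active + recovered + deaths
unaccounted = confirmed - (active + recovered + deaths)
print(unaccounted)  # 32842733
```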

You can fetch all the data with covid.get_data(). This returns a list of dictionaries, each holding the data for one country. Here is part of the output:

[{'id': '179',
  'country': 'US',
  'confirmed': 33104963,
  'active': None,
  'deaths': 589703,
  'recovered': None,
  'latitude': 40.0,
  'longitude': -100.0,
  'last_update': 1621758045000},
 {'id': '80',
  'country': 'India',
  'confirmed': 26530132,
  'active': 2805399,
  'deaths': 299266,
  'recovered': 23425467,
  'latitude': 20.593684,
  'longitude': 78.96288,
  'last_update': 1621758045000},
 ......

Preparing the Data for Visualization

You need to prepare the fetched data for visualization. To do that, convert the list of dictionaries into a pandas DataFrame, as follows:

import pandas as pd
df = pd.DataFrame(covid.get_data())
print(df)

Figure 18-3 shows the output.

Figure 18-3. Pandas dataframe for COVID-19 data
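Entries such as 'active': None for the US become missing values in the dataframe. The following minimal sketch builds a small dataframe offline from two records of the get_data() output and counts those missing values. Whether to drop or fill such rows is a judgment call for your analysis. The dropna() call shows one option.

```python
import pandas as pd

records = [  # two records copied from the covid.get_data() output
    {"id": "179", "country": "US", "confirmed": 33104963, "active": None,
     "deaths": 589703, "recovered": None},
    {"id": "80", "country": "India", "confirmed": 26530132, "active": 2805399,
     "deaths": 299266, "recovered": 23425467},
]

df = pd.DataFrame(records)

# pandas treats None as a missing value, so isna() counts it
print(df[["active", "recovered"]].isna().sum())

# One option: keep only the rows whose active and recovered counts are known
complete = df.dropna(subset=["active", "recovered"])
print(complete["country"].tolist())  # ['India']
```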

Sort the rows by the number of confirmed cases, in descending order. The variable is named sorted_df so that it does not shadow Python's built-in sorted() function:

sorted_df = df.sort_values(by=['confirmed'], ascending=False)

Then exclude the rows for the world and the continents so that only the data for individual countries remains:

excluded = sorted_df[~sorted_df.country.isin(['Europe', 'Asia', 'South America',
                                               'World', 'Africa', 'North America'])]

Now select the top ten countries:

top10 = excluded.head(10)
print(top10)

You can then assign the columns to the individual variables as follows:

x = top10.country
y1 = top10.confirmed
y2 = top10.active
y3 = top10.deaths
y4 = top10.recovered
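The preparation steps can be sketched end to end without a network connection. In the following sketch, the four country rows use confirmed counts printed earlier in this chapter. The 'World' row is artificial: it is simply the sum of those four rows, added only to show that the filter removes it. top_n_countries is an illustrative helper name.

```python
import pandas as pd

AGGREGATES = ['Europe', 'Asia', 'South America', 'World', 'Africa', 'North America']


def top_n_countries(df, n=10, by='confirmed'):
    """Sort by a column (descending), drop aggregate rows, and keep the first n rows."""
    ordered = df.sort_values(by=[by], ascending=False)
    countries_only = ordered[~ordered['country'].isin(AGGREGATES)]
    return countries_only.head(n)


sample = pd.DataFrame({
    'country': ['Italy', 'US', 'Mexico', 'India'],
    'confirmed': [4188190, 33104963, 2395330, 26530132],
})
# Artificial aggregate row (sum of the sample) to show that the filter removes it
world = pd.DataFrame({'country': ['World'], 'confirmed': [sample['confirmed'].sum()]})
df = pd.concat([sample, world], ignore_index=True)

top3 = top_n_countries(df, n=3)
print(top3['country'].tolist())  # ['US', 'India', 'Italy']
```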

Creating Visualizations with Matplotlib and Seaborn

Let’s visualize the data with Matplotlib and Seaborn. First import all the needed libraries, as shown here:

%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns

A simple line plot can be obtained as follows:

plt.plot(x, y1)
plt.xticks(rotation=90)
plt.show()

Figure 18-4 shows the output.

Figure 18-4. Line plot with Matplotlib
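Large counts such as these are easier to read with a title, axis labels, and tick labels in millions. The following minimal sketch uses Matplotlib's object-oriented interface to add them. The four values are confirmed counts printed earlier in this chapter, and millions is an illustrative helper name.

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

countries = ['US', 'India', 'Italy', 'Mexico']
confirmed = [33104963, 26530132, 4188190, 2395330]


def millions(value, position):
    """Tick formatter that shows 25000000 as '25M'."""
    return f'{value / 1e6:.0f}M'


fig, ax = plt.subplots()
ax.plot(countries, confirmed, marker='o')
ax.set_title('Confirmed COVID-19 cases')
ax.set_xlabel('Country')
ax.set_ylabel('Confirmed cases')
ax.yaxis.set_major_formatter(FuncFormatter(millions))
ax.tick_params(axis='x', labelrotation=90)
```

In a notebook, the figure is displayed at the end of the cell, or you can call plt.show() explicitly.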

You can add a title to this plot with plt.title(). You can also create the same plot with the Seaborn library. The following is an example of a line plot with Seaborn:

sns.set_theme(style='whitegrid')
sns.lineplot(x=x, y=y1)
plt.xticks(rotation=90)
plt.show()

This code uses the function set_theme(), which sets the theme for all subsequent Matplotlib and Seaborn visualizations in the notebook. You can pass one of the strings 'darkgrid', 'whitegrid', 'dark', 'white', or 'ticks' as its style argument. Figure 18-5 shows the output.

Figure 18-5. Line plot with Seaborn

You can create a simple bar plot with Matplotlib as follows:

plt.bar(x, y1)
plt.xticks(rotation=45)
plt.show()

Figure 18-6 shows the output.

Figure 18-6. Bar plot with Matplotlib
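Exact values are hard to read from bar heights alone. Matplotlib 3.4 and later provide Axes.bar_label(), which writes a label on each bar. The following minimal sketch uses the death counts printed earlier in this chapter and formats them with thousands separators.

```python
import matplotlib.pyplot as plt

countries = ['US', 'India', 'Mexico', 'Italy']
deaths = [589703, 299266, 221597, 125153]

fig, ax = plt.subplots()
bars = ax.bar(countries, deaths)
# Put each bar's value above it, e.g. 589703 -> '589,703'
texts = ax.bar_label(bars, labels=[f'{d:,}' for d in deaths])
ax.set_title('Deaths')
```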

You can create the same visualization with Seaborn, which produces a more polished bar plot:

sns.barplot(x=x, y=y1)
plt.xticks(rotation=45)
plt.show()

Figure 18-7 shows the output.

Figure 18-7. Bar plot with Seaborn

You can even change the color palette as follows:

sns.barplot(x=x, y=y1, palette="Blues_d")
plt.xticks(rotation=45)
plt.show()

Figure 18-8 shows the output.

Figure 18-8. Bar plot using Seaborn with custom palette

You can create a multiline graph as follows:

labels = ['Confirmed', 'Active', 'Deaths', 'Recovered']
plt.plot(x, y1, x, y2, x, y3, x, y4)
plt.legend(labels, loc='upper right')
plt.xticks(rotation=90)
plt.show()

Figure 18-9 shows the output.

Figure 18-9. Multiline graph
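Passing a list of labels to legend() works only if the labels are in the same order as the plotted lines. An alternative is to pass label= to each plot() call so that legend() picks up the names automatically. The following sketch uses the India, Italy, and Mexico values printed earlier in this chapter. The US is left out because it reports None for active and recovered cases.

```python
import matplotlib.pyplot as plt

countries = ['India', 'Italy', 'Mexico']
series = {  # values from the covid outputs earlier in this chapter
    'Confirmed': [26530132, 4188190, 2395330],
    'Active': [2805399, 283744, 261043],
    'Deaths': [299266, 125153, 221597],
    'Recovered': [23425467, 3779293, 1912690],
}

fig, ax = plt.subplots()
for name, values in series.items():
    # Each line carries its own label, so the legend cannot get out of order
    ax.plot(countries, values, label=name)
ax.legend(loc='upper right')
```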

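You can use the Seaborn library to create the same graph. Here, each line's label is passed to lineplot(), so plt.legend() picks up the labels automatically:

sns.lineplot(x=x, y=y1, label='Confirmed')
sns.lineplot(x=x, y=y2, label='Active')
sns.lineplot(x=x, y=y3, label='Deaths')
sns.lineplot(x=x, y=y4, label='Recovered')
plt.legend(loc='upper right')
plt.xticks(rotation=45)
plt.show()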

Figure 18-10 shows the output.

Figure 18-10. Multiline graph with Seaborn

Next, you will see how to create a multiple-bar graph with pandas, which draws the plot with Matplotlib:

df2 = pd.DataFrame([y1, y2, y3, y4])
df2.plot.bar()
plt.legend(x, loc='best')
plt.xticks(rotation=45)
plt.show()

Figure 18-11 shows the output.

Figure 18-11. Multiple-bar graph
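pd.DataFrame([y1, y2, y3, y4]) produces one row per metric. Its column names are the integer index labels of top10 rather than the country names, which is why the code passes x to plt.legend() separately. The following sketch names the rows and columns up front so that pandas builds the legend itself. The values for India, Italy, and Mexico are the ones printed earlier in this chapter.

```python
import pandas as pd
import matplotlib.pyplot as plt

countries = ['India', 'Italy', 'Mexico']
metrics = pd.DataFrame(
    [[26530132, 4188190, 2395330],   # confirmed
     [2805399, 283744, 261043],      # active
     [299266, 125153, 221597],       # deaths
     [23425467, 3779293, 1912690]],  # recovered
    index=['Confirmed', 'Active', 'Deaths', 'Recovered'],
    columns=countries,
)

# One group of bars per metric (row), one bar per country (column);
# pandas labels the legend with the column names
ax = metrics.plot.bar(rot=45)
```

With the live data, assigning df2.columns = x before calling df2.plot.bar() should have the same effect as the separate plt.legend(x, ...) call.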

You can also draw the bars horizontally, as follows:

df2.plot.barh()
plt.legend(x, loc='best')
plt.xticks(rotation=45)
plt.show()

Figure 18-12 shows the output.

Figure 18-12. Horizontal multiple-bar graph

You can use Seaborn to create a scatter plot as follows:

sns.scatterplot(x=x, y=y1)
sns.scatterplot(x=x, y=y2)
sns.scatterplot(x=x, y=y3)
sns.scatterplot(x=x, y=y4)
plt.legend(labels, loc='best')
plt.xticks(rotation=45)
plt.show()

Figure 18-13 shows the output.

Figure 18-13. Scatter plot with Seaborn

You can even create an area plot with Matplotlib with the following code:

df2.plot.area()
plt.legend(x, loc='best')
plt.xticks(rotation=45)
plt.show()

Figure 18-14 shows the output.

Figure 18-14. Stacked area plot

You can create an unstacked and transparent area plot for the data as follows:

df2.plot.area(stacked=False)
plt.legend(x, loc='best')
plt.xticks(rotation=45)
plt.show()

Figure 18-15 shows the output.

Figure 18-15. Unstacked area plot

You can create a pie chart as follows:

plt.pie(y3, labels=x)
plt.title('Death Toll')
plt.show()

Figure 18-16 shows the output.

Figure 18-16. Pie chart
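A pie chart shows proportions, so it often helps to print the percentage on each wedge. plt.pie() and Axes.pie() accept an autopct format string for this. The following minimal sketch uses the four death counts printed earlier in this chapter, so each share is relative to these four countries only.

```python
import matplotlib.pyplot as plt

countries = ['US', 'India', 'Mexico', 'Italy']
deaths = [589703, 299266, 221597, 125153]

fig, ax = plt.subplots()
# With autopct set, pie() returns the wedges, the label texts, and the percentage texts
wedges, texts, autotexts = ax.pie(deaths, labels=countries, autopct='%1.1f%%')
ax.set_title('Death Toll')

# The same shares computed by hand, as percentages of the four-country total
shares = [100 * d / sum(deaths) for d in deaths]
```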

You can also create a KDE (kernel density estimate) plot combined with a rug plot. With only ten data points, however, the estimated density is not very meaningful for this data.

sns.set_theme(style="ticks")
sns.kdeplot(x=y1)
sns.rugplot(x=y1)
plt.show()

Figure 18-17 shows the output.

Figure 18-17. KDE plot
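A KDE of ten values is rough because of what kdeplot() estimates. The density is a sum of Gaussian bumps, one centered on each observation: f(x) = 1/(n h) · Σ φ((x − x_i)/h). With few observations, the curve mostly reflects the bandwidth h. The following minimal standard-library sketch implements a one-dimensional Gaussian KDE. It uses Scott's rule for the bandwidth (h = s · n^(-1/5), where s is the sample standard deviation), which I understand to be seaborn's default bw_method. Seaborn's curve may still differ in details such as bw_adjust scaling and clipping.

```python
import math
import statistics


def gaussian_kde(samples):
    """Return a function that estimates the density of samples at a point x."""
    n = len(samples)
    # Scott's rule of thumb for the bandwidth: sample std * n^(-1/5)
    h = statistics.stdev(samples) * n ** (-1 / 5)
    norm = 1 / (n * h * math.sqrt(2 * math.pi))

    def density(x):
        # Sum of Gaussian kernels centered on each sample
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in samples)

    return density


# Confirmed cases for four countries, from the outputs earlier in this chapter
values = [33104963, 26530132, 4188190, 2395330]
f = gaussian_kde(values)
```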

Creating Visualizations of Animal Disease Data

You can visualize other real-life datasets too. Let’s visualize animal disease data, starting by reading it from an online repository:

df = pd.read_csv("https://github.com/Kesterchia/Global-animal-diseases/blob/main/Data/Outbreak_240817.csv?raw=True")

Let’s see the top five records.

df.head()

Figure 18-18 shows the output.

Figure 18-18. Animal disease data

Let’s get information about the columns as follows:

df.info()

The output is as follows:

RangeIndex: 17008 entries, 0 to 17007
Data columns (total 24 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   Id                  17008 non-null  int64
 1   source              17008 non-null  object
 2   latitude            17008 non-null  float64
 3   longitude           17008 non-null  float64
 4   region              17008 non-null  object
 5   country             17008 non-null  object
 6   admin1              17008 non-null  object
 7   localityName        17008 non-null  object
 8   localityQuality     17008 non-null  object
 9   observationDate     16506 non-null  object
 10  reportingDate       17008 non-null  object
 11  status              17008 non-null  object
 12  disease             17008 non-null  object
 13  serotypes           10067 non-null  object
 14  speciesDescription  15360 non-null  object
 15  sumAtRisk           9757 non-null   float64
 16  sumCases            14535 non-null  float64
 17  sumDeaths           14168 non-null  float64
 18  sumDestroyed        13005 non-null  float64
 19  sumSlaughtered      12235 non-null  float64
 20  humansGenderDesc    360 non-null    object
 21  humansAge           1068 non-null   float64
 22  humansAffected      1417 non-null   float64
 23  humansDeaths        451 non-null    float64
dtypes: float64(10), int64(1), object(13)
memory usage: 3.1+ MB
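The Non-Null Count column shows that several fields are sparse. For example, humansGenderDesc is filled in for only 360 of 17,008 rows, about 2 percent. The following minimal sketch ranks columns by their share of non-missing values. It is demonstrated on a tiny toy frame with hypothetical values rather than the real CSV, and completeness is an illustrative helper name.

```python
import pandas as pd


def completeness(df):
    """Fraction of non-missing values per column, sparsest column first."""
    return df.notna().mean().sort_values()


# Toy frame standing in for the animal disease data (hypothetical values)
toy = pd.DataFrame({
    'country': ['Italy', 'Iraq', 'Egypt', 'China'],
    'sumCases': [10.0, None, 3.0, 7.0],
    'humansAge': [None, None, None, 42.0],
})
print(completeness(toy))
# With the real data, completeness(df).head() would list the sparsest columns
```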

Let’s group the rows by the country column and compute the total number of cases for each country, as shown here:

df2 = df.groupby('country')[['sumCases']].sum()

Now let’s sort the result and select the ten countries with the most cases:

df3 = df2.sort_values(by='sumCases', ascending=False).head(10)
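The group-by and sort steps can also be written as a single chain. Select the column before summing, then use nlargest() to get the top values without a separate sort. The following minimal sketch runs on a toy frame with hypothetical numbers. Note that sum() skips missing values by default, so countries with many unreported case counts may be understated.

```python
import pandas as pd

# Toy outbreak records (hypothetical numbers) with the same column names as the real data
outbreaks = pd.DataFrame({
    'country': ['Italy', 'Iraq', 'Italy', 'Egypt', 'Iraq', 'Egypt'],
    'sumCases': [100.0, 40.0, 50.0, None, 30.0, 20.0],
})

# Total cases per country, largest first; NaN values are skipped by sum()
top = outbreaks.groupby('country')['sumCases'].sum().nlargest(2)
print(top)
```

Calling .to_frame() on the result gives a one-column dataframe shaped like df3.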

Let’s plot a bar graph, using the following code:

df3.plot.bar()
plt.xticks(rotation=90)
plt.show()

Figure 18-19 shows the output.

Figure 18-19. Bar chart

You can convert the index to a column as follows:

df3.reset_index(level=0, inplace=True)
df3

The output is as follows:

   country                     sumCases
0  Italy                       846756.0
1  Iraq                        590049.0
2  Bulgaria                    453353.0
3  China                       370357.0
4  Taiwan (Province of China)  296268.0
5  Egypt                       284449.0
6  Iran (Islamic Republic of)  225798.0
7  Nigeria                     203688.0
8  Germany                     133425.0
9  Republic of Korea           117018.0

Let’s make a pie chart as follows:

plt.pie(df3['sumCases'], labels=df3['country'])
# The values are case counts, not deaths, so the title says so
plt.title('Total Cases')
plt.show()

Figure 18-20 shows the output.

Figure 18-20. Pie chart

You can create a more aesthetically pleasing bar chart with Seaborn as follows:

sns.barplot(x='country', y='sumCases', data=df3)
plt.xticks(rotation=90)
plt.show()

Figure 18-21 shows the output.

Figure 18-21. Bar chart with Seaborn

You’ve just learned to visualize real-life animal disease data.

Summary

In this chapter, you explored more functionality of the Seaborn data visualization library, which is part of the scientific Python ecosystem. You also learned how to bring real-life data from the Internet into a Jupyter notebook and visualize it with the Matplotlib and Seaborn libraries.

This is the last chapter of the book. Although we explored Matplotlib in great detail, we have only scratched the surface of its vast body of knowledge and programming APIs. You now have the knowledge to explore Matplotlib and other data visualization libraries further on your own. Python has many other data visualization libraries for scientific data, such as Plotly, Altair, and Cartopy. Armed with your knowledge of the basics of data visualization, have fun continuing your journey into data science and visualization!
