Working with geospatial data

For our last case study, let us explore the analysis of geospatial data using an extension to the Pandas library, GeoPandas. You will need to have GeoPandas installed in your IPython environment to follow this example. If it is not already installed, you can add it using easy_install or pip.

Loading geospatial data

In addition to our other dependencies, we will import the GeoPandas library using the command:

>>> import geopandas as geo

As before, we load the dataset for this example, the coordinates of countries in Africa ("Africa." Maplibrary.org. Web. 02 May 2016. http://www.mapmakerdata.co.uk.s3-website-eu-west-1.amazonaws.com/library/stacks/Africa/), which are contained in a shapefile (.shp), into a GeoDataFrame, an extension of the Pandas DataFrame, using:

>>> africa_map = geo.GeoDataFrame.from_file('Africa_SHP/Africa.shp')

We can examine the first few rows using head():

[Figure: first rows of the africa_map GeoDataFrame]

We can see that the data consists of identifier columns, along with a geometry object representing the shape of the country. The GeoDataFrame also has a plot() function, to which we can pass a column argument that gives the field to use for generating the color of each polygon using:

>>> africa_map.plot(column='CODE')

Which gives the following visualization:

[Figure: map of Africa with each country colored by its CODE value]

However, right now this color code is based on the country identifier, so it does not offer much insight about the map. Instead, let us try to color each country based on its population, using information about the population of each country ("Population by Country – Thematic Map – World." Indexmundi.com. Web. 02 May 2016. http://www.indexmundi.com/map/?v=21). First we read in the population using:

>>> africa_populations = pd.read_csv('Africa_populations.tsv', sep='\t')

Note that here we have passed the sep='\t' argument to read_csv(), as the columns in this file are separated by tabs rather than by commas as in the other examples thus far. Now we can join this data to the geographical coordinates using merge:

>>> africa_map = pd.merge(africa_map,africa_populations,left_on='COUNTRY',right_on='Country_Name')

Unlike the example with oil prices and crash fatalities above, here the column we wish to join on has a different name in each dataset, so we must use the left_on and right_on arguments to specify the desired column in each table. We can then plot the map with colors derived from the population data using:

>>> africa_map.plot(column='Population', cmap='hot')

Which gives the new map as follows:

[Figure: map of Africa with each country colored by population using the 'hot' colormap]

Now we can clearly see the most populous countries (Ethiopia, Democratic Republic of Congo, and Egypt) highlighted in white.
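The tabular steps in this example (reading a tab-separated file and joining on columns with different names) can be tried without the original files. The following is a minimal sketch using plain Pandas only. The two small tables, the country name "Atlantis", and the population figures are made-up placeholders, not the real data. The how='left' and indicator=True arguments are additions that are not used in the example above. They are included because the default inner join silently drops any country whose name is spelled differently in the two sources, which would leave that country missing from the map.

```python
import io

import pandas as pd

# Hypothetical stand-in for the attribute columns of the shapefile.
countries = pd.DataFrame({
    "CODE": ["EGY", "ETH", "XXX"],
    "COUNTRY": ["Egypt", "Ethiopia", "Atlantis"],
})

# Hypothetical tab-separated text standing in for Africa_populations.tsv.
# The population values are placeholders, not real figures.
tsv_text = "Country_Name\tPopulation\nEgypt\t100\nEthiopia\t90\n"
populations = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# Join on differently named key columns. A left join keeps every row of
# the left table, and indicator=True adds a _merge column recording
# whether a match was found in the right table.
merged = pd.merge(
    countries,
    populations,
    how="left",
    left_on="COUNTRY",
    right_on="Country_Name",
    indicator=True,
)

# Countries present in the left table with no population record.
unmatched = merged.loc[merged["_merge"] == "left_only", "COUNTRY"].tolist()
print(unmatched)
```

Any names listed in unmatched would need to be reconciled (for example, by renaming them in one of the tables) before plotting, since rows without a population value cannot be colored meaningfully.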

Working in the cloud

In the previous examples, we have assumed you are running the IPython notebook locally on your computer through your web browser. As mentioned, the application can also run on an external server, with the user uploading files through the interface and interacting with them remotely. One convenient form of such an external service is a cloud platform such as Amazon Web Services (AWS), Google Cloud Platform, or Microsoft Azure. Besides offering a hosting platform to run applications like the notebook, these services also offer storage for datasets much larger than what we could store on our personal computers. By running our notebook in the cloud, we can more easily interact with these distributed storage systems using a shared infrastructure for data access and manipulation that also enforces desirable security and data governance. Lastly, the inexpensive computing resources available through these cloud services may also allow us to scale the sorts of computation we describe in later chapters, adding extra servers on the backend to handle commands entered in the notebook.
