Chapter 5. Python and Geographic Information Systems

This chapter focuses on applying Python to functions typically performed by a geographic information system (GIS) such as QGIS or ArcGIS. We will continue to use as few external dependencies as possible outside Python itself, so that your tools are as reusable as possible in different environments. In this book, we separate GIS analysis and remote sensing from a programming perspective, which means that in this chapter, we'll focus mostly on vector data.

As with other chapters in this book, the items presented here are core functions that serve as building blocks which can be recombined to solve challenges you will encounter beyond this book. This chapter includes the following topics:

  • Measuring distance
  • Converting coordinates
  • Reprojecting vector data
  • Editing shapefiles
  • Selecting data from within larger datasets
  • Creating thematic maps
  • Conversion of non-GIS data types
  • Geocoding

This chapter contains many code samples. In addition to the text, code comments are included as guides within the samples.

Measuring distance

The essence of geospatial analysis is discovering the relationships of objects on the Earth. Items which are closer together tend to have a stronger relationship than those which are farther apart. This concept is known as Tobler's First Law of Geography. Therefore, measuring distance is a critical function of geospatial analysis.

As you have learned, every map is a model of the Earth, and they are all wrong to some degree. For this reason, measuring accurate distance between two points on the Earth while sitting in front of a computer is impossible. Even professional land surveyors who go out in the field with both traditional sighting equipment and very precise GPS equipment fail to account for every anomaly on the Earth's surface between point A and point B. So, in order to measure distance, we must look at what we are measuring, how much we are measuring, and how much accuracy we need.

There are three models of the Earth we can use to calculate distance:

  • Flat plane
  • Spherical
  • Ellipsoid

In the flat plane model, standard Euclidean geometry is used. The Earth is considered a flat plane with no curvature as shown in the following figure:

Measuring distance

This model makes math quite simple because you work with straight lines. The most common format for geospatial coordinates is decimal degrees. However, decimal degree coordinates are angular measurements on a sphere: longitude is the angle from the prime meridian, and latitude is the angle from the equator. Furthermore, the lines of longitude converge at the poles, and the circumference of each line of latitude becomes smaller toward the poles as well. These facts mean decimal degrees are not a valid coordinate system for Euclidean geometry, which uses infinite planes.

Map projections attempt to simplify the issues of dealing with a three-dimensional ellipsoid in a two-dimensional plane, which could be either a paper or a computer screen. As discussed in Chapter 1, Learning Geospatial Analysis with Python, map projections flatten a round model of the Earth to a plane and introduce distortion in exchange for the convenience of a map. Once this projection is in place and decimal degrees are traded for a Cartesian coordinate system with x and y coordinates, we can use the simplest forms of Euclidean geometry—the Pythagorean theorem.

At a large enough scale, a sphere or ellipsoid like the Earth appears more like a plane than a sphere. In fact, for centuries, everyone thought that the Earth was flat! If the difference in degrees of longitude is small enough, you can often get away with using Euclidean geometry and then converting the measurements to meters, kilometers, or miles. This method is generally not recommended, but the decision is ultimately up to you and your requirements for accuracy as an analyst.

The spherical model approach tries to better approximate reality by avoiding the problems resulting from smashing the Earth onto a flat surface. As the name suggests, this model uses a perfect sphere for representing the Earth (similar to a physical globe), which allows us to work with degrees directly. This model ignores the fact that the Earth is really more of a slightly flattened ellipsoid with varying degrees of thickness in its crust. But by working with distance on the surface of a sphere, we can begin to measure longer distances with more accuracy. The following figure illustrates this concept:

Measuring distance

Using the ellipsoid model of the Earth, analysts strive for the best model of the Earth's surface. There are several ellipsoid models, which are called datums. A datum is a set of values which define an estimated shape for the Earth, also known as a geodetic system. Like any other georeferencing system, a datum can be optimized for a localized area. The most commonly used datum is called WGS84, which is designed for global use. You should be aware that WGS84 is occasionally updated as assessment techniques and technology improve. The most recent revision occurred in 2004. In North America, the NAD83 datum is used to optimize referencing over the continent. In Europe, the European Terrestrial Reference System 1989 (ETRS89) is used more frequently. ETRS89 is fixed to the stable part of the Eurasian Plate. Maps of Europe based on ETRS89 are therefore unaffected by continental drift, which amounts to as much as 2.5 cm per year as the Earth's crust shifts.

An ellipsoid does not have a constant radius from the center. This fact means the formulas used in the spherical model of the Earth begin to have issues in the ellipsoid model. Though not a perfect approximation, the ellipsoid is much closer to reality than the spherical model. Although we will not use it for these examples, another model is the geoid. The geoid is the most precise and accurate model of the Earth; it represents the shape the Earth's surface would take with no influences except gravity and rotation. The following figure shows a generic ellipsoid, denoted by a black line, contrasted against a red line representing the geoid, along with a spherical model to demonstrate the differences:

Measuring distance
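
To make the point about a non-constant radius concrete, the following sketch computes the distance from the Earth's center to the ellipsoid surface at a few latitudes. It is an illustration rather than part of the original examples: the function name geocentric_radius is hypothetical, and the defaults assume the WGS84 semi-major axis (6378137 meters) and flattening (1/298.257223563).

```python
import math


def geocentric_radius(lat_deg, a=6378137.0, f=1 / 298.257223563):
    """Distance in meters from the Earth's center to the ellipsoid surface.

    lat_deg is the geodetic latitude in degrees. The defaults are assumed
    WGS84 values; pass a different a and f for another ellipsoid.
    """
    b = a * (1 - f)  # semi-minor (polar) axis
    lat = math.radians(lat_deg)
    cos_lat = math.cos(lat)
    sin_lat = math.sin(lat)
    numerator = (a**2 * cos_lat)**2 + (b**2 * sin_lat)**2
    denominator = (a * cos_lat)**2 + (b * sin_lat)**2
    return math.sqrt(numerator / denominator)


# Print the radius at the equator, two mid-latitudes, and the pole
for latitude in (0, 30, 60, 90):
    print(latitude, round(geocentric_radius(latitude), 1))
```

With these parameters, the radius shrinks by roughly 21 kilometers between the equator and the poles, which is why a single sphere radius can only ever be a compromise.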

Pythagorean theorem

Now that we've discussed these different models of the Earth and the issues in measuring them, let's look at some solutions using Python. We'll start measuring with the simplest method using the Pythagorean Theorem, also known as Euclidean distance. If you remember your geometry lessons from school, the Pythagorean theorem asserts the following equation:

a^2 + b^2 = c^2

In this assertion, the variables a, b, and c are the sides of a right triangle, with c as the hypotenuse. You can solve for any one side if you know the other two. In this example, we'll start with two projected points in the Mississippi Transverse Mercator (MSTM) projection. The units of this projection are in meters. The x axis locations are measured from the central meridian defined by the westernmost location in the state. The y axis is defined from the NAD83 horizontal datum. The first point, defined as (x1,y1), represents Jackson, the state capital of Mississippi. The second point, defined as (x2,y2), represents the city of Biloxi, which is a coastal town, as shown in the following figure:

Pythagorean theorem

Tip

In the following example, the double asterisk (**) in Python is the syntax for exponents, which we'll use to square the distances.

We'll import the Python math module for its square root function called sqrt(). Then, we'll calculate the x axis and y axis distances. Finally, we'll use these variables to execute the Euclidean distance formula to get the distance between the two points in meters, the unit used by the MSTM projection:

>>> import math
>>> x1 = 456456.23
>>> y1 = 1279721.064
>>> x2 = 576628.34
>>> y2 = 1071740.33
>>> x_dist = x1 - x2
>>> y_dist = y1 - y2
>>> dist_sq = x_dist**2 + y_dist**2
>>> distance = math.sqrt(dist_sq)
>>> distance
240202.66

So, the distance is approximately 240,202 meters, which is around 240.2 kilometers or 150 miles. This calculation is reasonably accurate because this projection is optimized for measuring distance and area in Mississippi using Cartesian coordinates.
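
The same calculation can be wrapped in a small reusable function. The following is a minimal sketch, assuming points are passed as (x, y) tuples in the same projected units; the name euclidean_distance is hypothetical, and math.hypot() is used here in place of squaring and taking the square root by hand:

```python
import math


def euclidean_distance(p1, p2):
    """Straight-line distance between two (x, y) points in projected units."""
    # math.hypot returns sqrt(dx**2 + dy**2)
    return math.hypot(p2[0] - p1[0], p2[1] - p1[1])


# The MSTM coordinates (in meters) from the example above
jackson = (456456.23, 1279721.064)
biloxi = (576628.34, 1071740.33)

meters = euclidean_distance(jackson, biloxi)
print(meters)             # meters
print(meters / 1000)      # kilometers
print(meters / 1609.344)  # statute miles
```

Keeping the unit conversions outside the function means it works unchanged for any Cartesian projection, whatever its units.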

We can also measure distance using decimal degrees, but we must perform a few additional steps. In order to measure using degrees, we must first convert the angles to radians. An angle in radians multiplied by the radius of a circle gives the arc length along that circle, so we'll multiply our result in radians by the radius of the Earth in meters to get a distance along the curved surface. You can read more about radians at http://en.wikipedia.org/wiki/Radian.

We'll perform this conversion using the Python math.radians() function when we calculate the x and y distances, as shown here:

>>> import math
>>> x1 = -90.21
>>> y1 = 32.31
>>> x2 = -88.95
>>> y2 = 30.43
>>> x_dist = math.radians(x1 - x2)
>>> y_dist = math.radians(y1 - y2)
>>> dist_sq = x_dist**2 + y_dist**2
>>> dist_rad = math.sqrt(dist_sq)
>>> dist_rad * 6371251.46
251664.46

OK, this time we came up with around 251 kilometers, which is 11 kilometers more than our first measurement. So, as you can see, your choice of measurement algorithm and Earth model can have significant consequences. Using the same equation, we come up with noticeably different answers, depending on our choice of coordinate system and Earth model.
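
One reason for the gap is that a degree of longitude covers less ground as you move away from the equator, while the calculation above treats it the same as a degree of latitude. A common workaround, often called the equirectangular approximation, scales the longitude difference by the cosine of the mean latitude before applying the Pythagorean theorem. The following is a minimal sketch of that idea and is not part of the original example; the function name equirectangular_distance is hypothetical, and the radius is the same value used above:

```python
import math

EARTH_RADIUS_M = 6371251.46  # same sphere radius as the example above


def equirectangular_distance(lon1, lat1, lon2, lat2, radius=EARTH_RADIUS_M):
    """Approximate distance in meters between two lon/lat points in degrees."""
    mean_lat = math.radians((lat1 + lat2) / 2)
    # Shrink the east-west angle to account for converging lines of longitude
    x = math.radians(lon2 - lon1) * math.cos(mean_lat)
    y = math.radians(lat2 - lat1)
    return math.hypot(x, y) * radius


print(equirectangular_distance(-90.21, 32.31, -88.95, 30.43))
```

For these two cities, the result should come out at roughly 240.9 kilometers, which is much closer to the projected measurement. The approximation is still a flat-plane shortcut, so it degrades over long distances and near the poles.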

Tip

You can read more about Euclidean distance at http://mathworld.wolfram.com/Distance.html.

Haversine formula

Part of the problem with just plugging unprojected decimal degrees into the Pythagorean theorem is the concept of Great Circle distance. The Great Circle distance is the shortest distance between two points along the surface of a sphere. A Great Circle is also defined by the fact that, if followed all the way around the sphere, it bisects the sphere into two equal halves, as shown in the following Wikipedia figure (Jhbdel, Wikipedia):

Haversine formula

So what is the right way to measure in decimal degrees? The most popular method is the Haversine formula which uses trigonometry to calculate the Great Circle distance using coordinates defined in decimal degrees as input. Once again, we'll convert the axis distances from degrees to radians before we apply the formula, just like the previous example. But this time, we'll also convert the latitude (y axis) coordinates to radians separately, as shown here:

>>> import math
>>> x1 = -90.212452861859035
>>> y1 = 32.316272202663704
>>> x2 = -88.952170968942525
>>> y2 = 30.438559624660321
>>> x_dist = math.radians(x1 - x2)
>>> y_dist = math.radians(y1 - y2)
>>> y1_rad = math.radians(y1)
>>> y2_rad = math.radians(y2)
>>> a = (math.sin(y_dist/2)**2 + math.sin(x_dist/2)**2
...      * math.cos(y1_rad) * math.cos(y2_rad))
>>> c = 2 * math.asin(math.sqrt(a))
>>> distance = c * 6371  # kilometers
>>> print(distance)
240.63

Wow! 240.6 kilometers using the Haversine formula compared to 240.2 kilometers using the optimized and more accurate projection. This difference is less than half a kilometer, which is not bad for a distance calculation of two cities that are about 150 miles apart. The Haversine formula is the most commonly used distance measuring formula because it is relatively lightweight from a coding perspective and reasonably accurate in most cases. Because it treats the Earth as a perfect sphere, its error is typically a few tenths of a percent of the distance measured, which is consistent with the difference we see here.
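
If you need the Haversine formula in more than one place, it can be packaged as a function. The following is a minimal sketch under the same spherical assumption as the example above; the function name haversine and its argument order (longitude first, then latitude) are choices made here for illustration:

```python
import math


def haversine(lon1, lat1, lon2, lat2, radius=6371.0):
    """Great Circle distance between two lon/lat points given in degrees.

    The result is in the same units as radius (kilometers by default).
    """
    lat1_rad = math.radians(lat1)
    lat2_rad = math.radians(lat2)
    d_lat = lat2_rad - lat1_rad
    d_lon = math.radians(lon2 - lon1)
    a = (math.sin(d_lat / 2) ** 2
         + math.cos(lat1_rad) * math.cos(lat2_rad) * math.sin(d_lon / 2) ** 2)
    # Clamp to 1.0 to guard against rounding error for near-antipodal points
    return 2 * radius * math.asin(min(1.0, math.sqrt(a)))


jackson = (-90.212452861859035, 32.316272202663704)
biloxi = (-88.952170968942525, 30.438559624660321)
print(haversine(*jackson, *biloxi))
```

Passing the radius as a parameter lets you switch units, for example by supplying the radius in meters or miles, without touching the formula itself.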

To summarize what you've learned so far, most of the point coordinates you encounter as an analyst are in unprojected decimal degrees. So, these are some of the options for measurement:

  • Reproject to a distance-accurate Cartesian projection and measure
  • Just use the Haversine formula and see how far it takes you for your analysis
  • Use the even more precise Vincenty's formula

That's right! There's another formula which seeks to provide an even better measurement than Haversine.

Vincenty's formula

So we've examined distance measurement using the Pythagorean theorem (flat Earth model) and the Haversine formula (spherical Earth model). Vincenty's formula accounts for the ellipsoid model of the Earth. And if you are using a localized ellipsoid, it can be accurate within far less than a meter. In the following implementation of this formula, you can change the semi-major axis value and flattening ratio to fit the definition of any ellipsoid. Let's see what the distance is when we measure using Vincenty's formula on the NAD83 ellipsoid:

import math
distance = None
x1 = -90.212452861859035
y1 = 32.316272202663704
x2 = -88.952170968942525
y2 = 30.438559624660321
# Ellipsoid Parameters
# Example is NAD83
a = 6378137  # semi-major axis
f = 1/298.257222101  # flattening (1 / inverse flattening)
b = abs((f*a)-a)  # semi-minor axis
L = math.radians(x2-x1)
U1 = math.atan((1-f) * math.tan(math.radians(y1)))
U2 = math.atan((1-f) * math.tan(math.radians(y2)))
sinU1 = math.sin(U1)
cosU1 = math.cos(U1)
sinU2 = math.sin(U2)
cosU2 = math.cos(U2)
lam = L
for i in range(100):
    sinLam = math.sin(lam)
    cosLam = math.cos(lam)
    sinSigma = math.sqrt((cosU2*sinLam)**2 +
        (cosU1*sinU2-sinU1*cosU2*cosLam)**2)
    if (sinSigma == 0):
        distance = 0  # coincident points
        break
    cosSigma = sinU1*sinU2 + cosU1*cosU2*cosLam
    sigma = math.atan2(sinSigma, cosSigma)
    sinAlpha = cosU1 * cosU2 * sinLam / sinSigma
    cosSqAlpha = 1 - sinAlpha**2
    if cosSqAlpha == 0:
        cos2SigmaM = 0  # equatorial line; avoids dividing by zero
    else:
        cos2SigmaM = cosSigma - 2*sinU1*sinU2/cosSqAlpha
    C = f/16*cosSqAlpha*(4+f*(4-3*cosSqAlpha))
    LP = lam
    lam = L + (1-C) * f * sinAlpha * (
        sigma + C*sinSigma*(cos2SigmaM+C*cosSigma *
        (-1+2*cos2SigmaM*cos2SigmaM)))
    # Stop iterating once lambda has converged
    if abs(lam-LP) <= 1e-12:
        break
uSq = cosSqAlpha * (a**2 - b**2) / b**2
A = 1 + uSq/16384*(4096+uSq*(-768+uSq*(320-175*uSq)))
B = uSq/1024 * (256+uSq*(-128+uSq*(74-47*uSq)))
deltaSigma = B*sinSigma*(cos2SigmaM+B/4 *
    (cosSigma*(-1+2*cos2SigmaM*cos2SigmaM) -
    B/6*cos2SigmaM*(-3+4*sinSigma*sinSigma) *
    (-3+4*cos2SigmaM*cos2SigmaM)))
s = b*A*(sigma-deltaSigma)
distance = s
print(distance)
240237.66693880095

Using Vincenty's formula, our measurement came to 240.2 kilometers (240,237 meters), which is only about 35 meters off from our projected measurement using Euclidean distance. That's impressive! While it's many times more mathematically complex than the Haversine formula, you can see that it is also much more accurate.
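
Because an ellipsoid is defined by just its semi-major axis and flattening, the script above can be wrapped in a function that accepts both as parameters. The following is a sketch of that refactoring rather than a separate algorithm; the function name vincenty_distance is hypothetical, the defaults are the NAD83 values from the listing, and the WGS84 flattening used in the second call (1/298.257223563) is a value not given in the text above:

```python
import math


def vincenty_distance(lon1, lat1, lon2, lat2,
                      a=6378137.0, f=1 / 298.257222101,
                      max_iter=100, tol=1e-12):
    """Ellipsoidal distance in meters between two lon/lat points in degrees."""
    b = a * (1 - f)  # semi-minor axis
    L = math.radians(lon2 - lon1)
    U1 = math.atan((1 - f) * math.tan(math.radians(lat1)))
    U2 = math.atan((1 - f) * math.tan(math.radians(lat2)))
    sinU1, cosU1 = math.sin(U1), math.cos(U1)
    sinU2, cosU2 = math.sin(U2), math.cos(U2)
    lam = L
    for _ in range(max_iter):
        sinLam, cosLam = math.sin(lam), math.cos(lam)
        sinSigma = math.hypot(cosU2 * sinLam,
                              cosU1 * sinU2 - sinU1 * cosU2 * cosLam)
        if sinSigma == 0:
            return 0.0  # coincident points
        cosSigma = sinU1 * sinU2 + cosU1 * cosU2 * cosLam
        sigma = math.atan2(sinSigma, cosSigma)
        sinAlpha = cosU1 * cosU2 * sinLam / sinSigma
        cosSqAlpha = 1 - sinAlpha ** 2
        if cosSqAlpha == 0:
            cos2SigmaM = 0.0  # equatorial line
        else:
            cos2SigmaM = cosSigma - 2 * sinU1 * sinU2 / cosSqAlpha
        C = f / 16 * cosSqAlpha * (4 + f * (4 - 3 * cosSqAlpha))
        previous = lam
        lam = L + (1 - C) * f * sinAlpha * (
            sigma + C * sinSigma * (
                cos2SigmaM + C * cosSigma * (-1 + 2 * cos2SigmaM ** 2)))
        if abs(lam - previous) <= tol:
            break
    else:
        # Nearly antipodal points may never converge
        raise ValueError("Vincenty's formula failed to converge")
    uSq = cosSqAlpha * (a ** 2 - b ** 2) / b ** 2
    A = 1 + uSq / 16384 * (4096 + uSq * (-768 + uSq * (320 - 175 * uSq)))
    B = uSq / 1024 * (256 + uSq * (-128 + uSq * (74 - 47 * uSq)))
    deltaSigma = B * sinSigma * (
        cos2SigmaM + B / 4 * (
            cosSigma * (-1 + 2 * cos2SigmaM ** 2)
            - B / 6 * cos2SigmaM * (-3 + 4 * sinSigma ** 2)
            * (-3 + 4 * cos2SigmaM ** 2)))
    return b * A * (sigma - deltaSigma)


x1, y1 = -90.212452861859035, 32.316272202663704
x2, y2 = -88.952170968942525, 30.438559624660321
nad83 = vincenty_distance(x1, y1, x2, y2)
wgs84 = vincenty_distance(x1, y1, x2, y2, f=1 / 298.257223563)
print(nad83)
print(wgs84)
```

The two ellipsoids share the same semi-major axis and differ only very slightly in flattening, so the two results should agree to well under a millimeter over this distance.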

Tip

The pure Python geopy module includes an implementation of Vincenty's formula and can also geocode locations by turning place names into latitude and longitude coordinates. Its documentation is available here:

http://geopy.readthedocs.org/en/latest/

The points used in these examples are reasonably close to the equator. As you move towards the poles or work with larger or extremely small distances, the choices you make become increasingly important. If you're just trying to make a radius around a city to select locations for a marketing campaign promoting a concert, then an error of a few kilometers is probably acceptable. However, if you're trying to estimate the volume of fuel required for an airplane to make a flight between two airports, then you want to be spot on!

If you'd like to learn more about issues with measuring distance and direction, and how to work around them with programming, visit the following site:

http://www.movable-type.co.uk/scripts/latlong.html

On this site, Chris Veness goes into great detail on this topic and provides online calculators as well as examples written in JavaScript, which are easily ported to Python. The implementation of Vincenty's formula that we just saw is ported from the JavaScript on this site.
