Performing selections

The previous subsetting example is one way to select data. There are many other ways to subset data for further analysis. In this section, we'll examine some of them.

Point in polygon formula

We briefly discussed the point in polygon formula in Chapter 1, Learning Geospatial Analysis with Python, as a common type of geospatial operation. It is one of the most useful formulas available, and it is relatively straightforward. The following function performs this check using the ray casting method. This method draws a line from the test point all the way through the polygon and counts the number of times the line crosses the polygon boundary. If the count is even, the point is outside the polygon; if it is odd, the point is inside. This particular implementation also checks whether the point is a vertex of the polygon or lies on a horizontal edge, as shown here:

def point_in_poly(x,y,poly):
    # check if point is a vertex
    if (x,y) in poly: return True
    # check if point is on a boundary
    for i in range(len(poly)):
        p1 = None
        p2 = None
        if i==0:
            p1 = poly[0]
            p2 = poly[1]
        else:
            p1 = poly[i-1]
            p2 = poly[i]
        if p1[1] == p2[1] and p1[1] == y and x > min(p1[0], 
          p2[0]) and x < max(p1[0], p2[0]):
            return True
      
    n = len(poly)
    inside = False

    p1x,p1y = poly[0]
    for i in range(n+1):
        p2x,p2y = poly[i % n]
        if y > min(p1y,p2y):
            if y <= max(p1y,p2y):
                if x <= max(p1x,p2x):
                    if p1y != p2y:
                        xints = (y-p1y)*(p2x-p1x)/(p2y-p1y)+p1x
                    if p1x == p2x or x <= xints:
                        inside = not inside
        p1x,p1y = p2x,p2y

    if inside: return True
    return False

Now, let's use the point_in_poly() function to test a point in Chile:

>>> # Test a point for inclusion
>>> myPolygon = [(-70.593016,-33.416032), (-70.589604,-33.415370),
(-70.589046,-33.417340), (-70.592351,-33.417949),
(-70.593016,-33.416032)]
>>> # Point to test
>>> lon = -70.592000
>>> lat = -33.416000
>>> print(point_in_poly(lon, lat, myPolygon))
True

The point is inside. Let's also verify that a point on the polygon boundary, in this case its first vertex, is detected:

>>> # test a boundary point (the first vertex)
>>> lon = -70.593016
>>> lat = -33.416032
>>> print(point_in_poly(lon, lat, myPolygon))
True

You'll find new uses for this function all the time. It's definitely one to keep in your toolbox.
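
To make the even/odd rule concrete, the following minimal sketch counts the boundary crossings directly. It is not the function shown earlier: count_crossings() and the square polygon are hypothetical names introduced for illustration, and the sketch assumes the test point does not lie on the boundary, so it performs no vertex or edge checks:

```python
def count_crossings(x, y, poly):
    """Count the polygon edges crossed by a ray cast to the right of (x, y)."""
    crossings = 0
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the ray's y value can be crossed.
        # The half-open comparison avoids counting a shared vertex twice.
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge meets the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                crossings += 1
    return crossings

# Hypothetical 4 x 4 square used only for illustration
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
inside_count = count_crossings(2, 2, square)
outside_count = count_crossings(-1, 2, square)
print(inside_count, inside_count % 2 == 1)    # 1 True  (odd, so inside)
print(outside_count, outside_count % 2 == 1)  # 2 False (even, so outside)
```

An odd count means the ray left the polygon one more time than it entered it, which can only happen if it started inside.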

Bounding box selections

We'll go through one more example using a simple bounding box to isolate a complex set of features and save it in a new shapefile. In this example, we'll subset the roads on the island of Puerto Rico from the mainland U.S. Major Roads shapefile. You can download the shapefile from the following link:

https://github.com/GeospatialPython/Learn/raw/master/roads.zip

Floating-point coordinate comparisons can be expensive, but we are using a box and not an irregular polygon, and so this code is efficient enough for most operations:

>>> import shapefile
>>> r = shapefile.Reader("roadtrl020") 
>>> w = shapefile.Writer(r.shapeType)
>>> w.fields = list(r.fields)
>>> xmin = -67.5
>>> xmax = -65.0
>>> ymin = 17.8
>>> ymax = 18.6
>>> for road in r.iterShapeRecords():
...     geom = road.shape
...     rec = road.record
...     sxmin, symin, sxmax, symax = geom.bbox
...     if sxmin < xmin: continue
...     elif sxmax > xmax: continue
...     elif symin < ymin: continue
...     elif symax > ymax: continue
...     w._shapes.append(geom)
...     w.records.append(rec)
```
>>> w.save("Puerto_Rico_Roads")
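
The four comparisons in the loop amount to a single containment test: a feature is kept only if its bounding box lies entirely inside the selection box. A minimal sketch of that test as a reusable helper follows. The name bbox_within() and the two road extents are hypothetical and used only for illustration; both boxes are assumed to be in (xmin, ymin, xmax, ymax) order, as the loop unpacks them:

```python
def bbox_within(inner, outer):
    """Return True if the inner box lies entirely inside the outer box.

    Both boxes are (xmin, ymin, xmax, ymax) sequences.
    """
    ixmin, iymin, ixmax, iymax = inner
    oxmin, oymin, oxmax, oymax = outer
    return (ixmin >= oxmin and iymin >= oymin and
            ixmax <= oxmax and iymax <= oymax)

# Selection box from the example above
puerto_rico = (-67.5, 17.8, -65.0, 18.6)
# Hypothetical road extents, for illustration only
road_a = (-66.2, 18.1, -66.0, 18.3)
road_b = (-90.1, 30.0, -89.9, 30.2)
print(bbox_within(road_a, puerto_rico))  # True
print(bbox_within(road_b, puerto_rico))  # False
```

Note that this test rejects features that only partly overlap the box, which is also how the loop behaves.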

Attribute selections

We've now seen two different ways of subsetting a larger dataset resulting in a smaller one based on spatial relationships. Now, let's examine a quick way to subset vector data using the attribute table. In this example, we'll use a polygon shapefile that has densely populated urban areas within Mississippi. You can download this zipped shapefile, which is available at the following link:

http://git.io/vLbU9

This script is really quite simple. It creates the Reader and Writer objects and copies the dbf fields; then it loops through the records for matching attributes and adds them to Writer. We'll select urban areas with a population of less than 5,000, as shown here:

>>> import shapefile
>>> # Create a reader instance
>>> r = shapefile.Reader("MS_UrbanAnC10")
>>> # Create a writer instance
>>> w = shapefile.Writer(r.shapeType)
>>> # Copy the fields to the writer
>>> w.fields = list(r.fields)
>>> # Grab the geometry and records from all features
>>> # with the correct population
>>> selection = []
>>> for rec in enumerate(r.records()):
...    if rec[1][14] < 5000:
...       selection.append(rec)
>>> # Add the geometry and records to the writer
>>> for rec in selection:
...    w._shapes.append(r.shape(rec[0]))
...    w.records.append(rec[1])
>>> # Save the new shapefile
>>> w.save("MS_Urban_Subset")
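
The script refers to the population column by its position in the record. A minimal sketch of looking the position up by field name instead is shown here. It assumes the field list has the layout pyshp readers expose, where the first entry is the DeletionFlag entry and has no matching value in a record. The helper field_index(), the sample fields, and the sample record are hypothetical and are not taken from the dataset:

```python
def field_index(fields, name):
    """Return the position of a named field within a record.

    Assumes the first entry of `fields` is the DeletionFlag entry,
    which has no corresponding value in a record.
    """
    names = [f[0] for f in fields[1:]]
    return names.index(name)

# Hypothetical field list and record, for illustration only
sample_fields = [("DeletionFlag", "C", 1, 0),
                 ["NAME", "C", 50, 0],
                 ["POP", "N", 10, 0]]
sample_record = ["Exampleville", 4200]

idx = field_index(sample_fields, "POP")
print(idx, sample_record[idx] < 5000)  # 1 True
```

Looking up the index once before the loop keeps the selection readable and guards against a column order that differs from what you expect.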

Attribute selections are typically fast. Spatial selections are computationally expensive because of floating-point calculations. Whenever possible, use attribute selection to subset the data first. The following figure shows the starting shapefile containing all urban areas on the left with a state boundary, and the urban areas with fewer than 5,000 people on the right after the previous attribute selection:

Figure: Attribute selections

Let's see what that same example looks like using fiona, which takes advantage of the OGR library. We'll use nested with statements to reduce the amount of code needed to properly open and close the files, as shown here:

>>> import fiona
>>> with fiona.open("MS_UrbanAnC10.shp") as sf:
...     filtered = filter(lambda f: f['properties']['POP'] < 5000, sf)
...     # Shapefile file format driver
...     drv = sf.driver
...     # Coordinate Reference System
...     crs = sf.crs
...     # Dbf schema
...     schm = sf.schema
...     subset = "MS_Urban_Fiona_Subset.shp"
...     with fiona.open(subset, "w",
...         driver=drv,
...         crs=crs,
...         schema=schm) as w:
...         for rec in filtered:
...             w.write(rec)
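
The filter step tests one record at a time. The following minimal sketch runs the same test on hand-made stand-in records rather than a real file. It assumes each record is a mapping with a 'properties' entry, as the lambda above expects; the names and population values are hypothetical:

```python
# Hypothetical stand-in records, for illustration only
features = [
    {"properties": {"NAME": "A", "POP": 1200}, "geometry": None},
    {"properties": {"NAME": "B", "POP": 48000}, "geometry": None},
    {"properties": {"NAME": "C", "POP": 4999}, "geometry": None},
]

# Lazily keep only the records with a population below 5,000
small = (f for f in features if f["properties"]["POP"] < 5000)
names = [f["properties"]["NAME"] for f in small]
print(names)  # ['A', 'C']
```

Like filter(), the generator expression is lazy, so records are only tested as they are consumed.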