The ability to classify an image leads us to another remote sensing capability. Now that you've worked with shapefiles over the last few chapters, have you ever wondered where they come from? Vector GIS data such as shapefiles are typically extracted from remotely-sensed images like the examples that we've seen so far. Extraction normally involves an analyst clicking around each object in an image and drawing the feature to save it as data. It is also possible with good remotely-sensed data and proper preprocessing to automatically extract features from an image.
For this example, we'll take a subset of our Landsat 8 thermal image to isolate a group of barrier islands, as shown in the following screenshot:
You can download this image here:
Our goal with this example is to automatically extract the three islands in the image as a shapefile. Before we can do this, we need to mask out any data that we aren't interested in. For example, the water has a wide range of pixel values as do the islands themselves. If we want to extract just the islands, we need to push all the pixel values to just two bins to make the image black and white. This technique is called thresholding. The islands in the image have enough contrast with the water in the background such that thresholding should isolate them nicely.
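To make the idea concrete, here is a minimal sketch of two-bin thresholding on a tiny grid. It uses plain Python lists in place of a raster array so the logic stays visible. The function name `threshold_two_bins` and the sample values are hypothetical, chosen only for illustration, and the cutoff is assumed to be the midpoint of the value range, which is where two equal-width bins meet.

```python
def threshold_two_bins(grid):
    """Return a 0/255 copy of grid, split at the midpoint of its value range."""
    values = [v for row in grid for v in row]
    lo, hi = min(values), max(values)
    # Two equal-width bins share one interior edge: the midpoint of the range
    cutoff = (lo + hi) / 2
    return [[255 if v > cutoff else 0 for v in row] for row in grid]


# Hypothetical 4x5 "thermal" grid: cool water (low values), warm land (high values)
sample = [
    [12, 15, 11, 14, 13],
    [13, 88, 92, 16, 12],
    [11, 95, 90, 85, 14],
    [14, 12, 13, 15, 11],
]

binary = threshold_two_bins(sample)
for row in binary:
    print(row)
```

Every pixel ends up as either 0 or 255, so the land stands out as a solid white block against black water. A midpoint cutoff only works when the two groups are well separated, which is why good contrast matters.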
In the following script, we will read the image into an array and then compute a histogram of the image using only two bins. We will then color the two bins black and white. This script is simply a modified version of our classification script with a very limited output:
    from osgeo import gdal_array

    # Input file name (thermal image)
    src = "islands.tif"

    # Output file name
    tgt = "islands_classified.tiff"

    # Load the image into numpy using gdal
    srcArr = gdal_array.LoadFile(src)

    # Split the histogram into 2 bins as our classes
    classes = gdal_array.numpy.histogram(srcArr, bins=2)[1]

    # Color table: one entry per bin edge returned by histogram()
    lut = [[255, 0, 0], [0, 0, 0], [255, 255, 255]]

    # Starting value for classification
    start = 1

    # Set up the output image
    rgb = gdal_array.numpy.zeros((3, srcArr.shape[0], srcArr.shape[1], ),
                                 gdal_array.numpy.float32)

    # Process all classes and assign colors
    for i in range(len(classes)):
        mask = gdal_array.numpy.logical_and(start <= srcArr,
                                            srcArr <= classes[i])
        for j in range(len(lut[i])):
            rgb[j] = gdal_array.numpy.choose(mask, (rgb[j], lut[i][j]))
        start = classes[i]+1

    # Save the image
    gdal_array.SaveArray(rgb.astype(gdal_array.numpy.uint8), tgt,
                         format="GTIFF", prototype=src)
The output looks great, as shown in the following screenshot:
The islands are clearly isolated, so our extraction script will be able to identify them as polygons and save them to a shapefile. The GDAL library has a function called Polygonize() that does exactly this. It groups each set of connected pixels that share a value and saves the groups as features in a vector layer. One interesting technique that we will use in this script is to use our input image as its own mask. Polygonize() accepts an optional mask band in which pixels with a value of 0 (black) are excluded, which prevents the water from being extracted as a polygon, so we'll end up with just the islands. Another point to note in the script is that we copy the spatial reference from our source image to our shapefile in order to geolocate it properly:
    from osgeo import gdal, ogr, osr

    # Thresholded input raster name
    src = "islands_classified.tiff"

    # Output shapefile name
    tgt = "extract.shp"

    # OGR layer name
    tgtLayer = "extract"

    # Open the input raster
    srcDS = gdal.Open(src)

    # Grab the first band
    band = srcDS.GetRasterBand(1)

    # Force gdal to use the band as a mask
    mask = band

    # Set up the output shapefile
    driver = ogr.GetDriverByName("ESRI Shapefile")
    shp = driver.CreateDataSource(tgt)

    # Copy the spatial reference
    srs = osr.SpatialReference()
    srs.ImportFromWkt(srcDS.GetProjectionRef())
    layer = shp.CreateLayer(tgtLayer, srs=srs)

    # Set up the dbf file
    fd = ogr.FieldDefn("DN", ogr.OFTInteger)
    layer.CreateField(fd)
    dst_field = 0

    # Automatically extract features from an image!
    extract = gdal.Polygonize(band, mask, layer, dst_field, [], None)

    # Release the layer and datasets so the features are flushed to disk
    layer = None
    shp = None
    srcDS = None
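To get a feel for what the grouping step does, the following is a conceptual sketch in pure Python. It is not GDAL's implementation. It simply labels groups of connected, equal-valued pixels and skips any pixel whose mask value is 0. The function name `label_regions`, the sample band, and the choice of 4-connectivity (pixels touching along an edge) are assumptions made for illustration.

```python
from collections import deque


def label_regions(grid, mask):
    """Group 4-connected pixels of equal value, skipping pixels where mask is 0."""
    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] == 0 or labels[r][c]:
                continue  # masked out, or already part of a region
            next_label += 1
            labels[r][c] = next_label
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                for nr, nc in ((cr - 1, cc), (cr + 1, cc),
                               (cr, cc - 1), (cr, cc + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and mask[nr][nc] != 0
                            and not labels[nr][nc]
                            and grid[nr][nc] == grid[cr][cc]):
                        labels[nr][nc] = next_label
                        queue.append((nr, nc))
    return labels, next_label


# Hypothetical thresholded band: 255 = land, 0 = water
band = [
    [0, 255, 255, 0, 0],
    [0, 255, 0, 0, 255],
    [0, 0, 0, 0, 255],
    [255, 0, 0, 0, 0],
]

# Using the band as its own mask drops the water, as in the script above
labels, count = label_regions(band, band)
print(count)

# With an all-ones mask the water is grouped as well
everything = [[1] * 5 for _ in range(4)]
_, count_all = label_regions(band, everything)
print(count_all)
```

With the band as its own mask, only the land groups are counted. Without the mask, the surrounding water becomes one more region, which is exactly the polygon we want to avoid in the shapefile.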
The output shapefile is simply called extract.shp. In Chapter 4, Geospatial Python Toolbox, we created a quick pure Python script using PyShp and PNGCanvas to visualize shapefiles. We'll bring that script back here to look at our shapefile, but we'll add something to it. The largest island has a small lagoon that shows up as a hole in the polygon. In order to render it properly, we have to deal with the parts in a shapefile record: each record stores its points as one flat list, plus a list of part indexes marking where each ring begins. The previous example using this script did not do that, so we'll add that piece as we loop through the shapefile features:
    import shapefile
    import pngcanvas

    r = shapefile.Reader("extract.shp")

    # Size of the shapefile extent in map units
    xdist = r.bbox[2] - r.bbox[0]
    ydist = r.bbox[3] - r.bbox[1]

    # Output image size in pixels
    iwidth = 800
    iheight = 600

    # Pixels per map unit
    xratio = iwidth/xdist
    yratio = iheight/ydist

    polygons = []
    for shape in r.shapes():
        # Handle each part (ring) of the shape separately
        for i in range(len(shape.parts)):
            pixels = []
            pt = None
            if i < len(shape.parts)-1:
                pt = shape.points[shape.parts[i]:shape.parts[i+1]]
            else:
                pt = shape.points[shape.parts[i]:]
            # Convert map coordinates to image pixels
            for x, y in pt:
                px = int(iwidth - ((r.bbox[2] - x) * xratio))
                py = int((r.bbox[3] - y) * yratio)
                pixels.append([px, py])
            polygons.append(pixels)

    # Draw each ring as its own polyline
    c = pngcanvas.PNGCanvas(iwidth, iheight)
    for p in polygons:
        c.polyline(p)

    # Save the image
    with open("extract.png", "wb") as f:
        f.write(c.dump())
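The part handling in the loop above can be checked on a toy record without a shapefile. In this minimal sketch, the helper `split_parts` and the sample coordinates are hypothetical. The `points` and `parts` lists mimic the attributes of the same name that PyShp exposes on a shape.

```python
def split_parts(points, parts):
    """Split a flat point list into rings using the part start indexes."""
    rings = []
    for i, start in enumerate(parts):
        # Each part runs up to the start of the next one; the last runs to the end
        end = parts[i + 1] if i < len(parts) - 1 else len(points)
        rings.append(points[start:end])
    return rings


# Hypothetical record: a square outer ring followed by a smaller square hole
points = [
    (0, 0), (0, 10), (10, 10), (10, 0), (0, 0),   # outer ring
    (4, 4), (6, 4), (6, 6), (4, 6), (4, 4),       # hole
]
parts = [0, 5]

rings = split_parts(points, parts)
print(len(rings))
for ring in rings:
    print(ring)
```

Drawing `points` as a single polyline would add a stray segment from the end of the outer ring at (0, 0) to the start of the hole at (4, 4). Drawing each ring separately avoids that artifact.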
The following screenshot shows our automatically extracted island features. Commercial packages that do this kind of work can easily cost tens of thousands of dollars. While these packages are very robust, it is still fun and empowering to see how far you can get with simple Python scripts and a few open source packages. In many cases, you can do everything that you need to do.
The westernmost island contains the polygon hole, as shown in the following screenshot, which is zoomed to this area:
If you want to see what would happen if we didn't deal with the polygon holes, then just run the version of the script from Chapter 4, Geospatial Python Toolbox, on this same shapefile to compare the difference. The lagoon is not easy to see, but you will find it if you use the other script.
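If the lagoon is hard to spot by eye, one way to find hole rings programmatically is to check their winding direction. The sketch below uses the shoelace formula and assumes the file follows the shapefile specification's convention that outer rings are wound clockwise and holes counter-clockwise. The function names and sample rings are hypothetical.

```python
def signed_area(ring):
    """Shoelace formula: positive for counter-clockwise rings, negative for clockwise."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return total / 2.0


def find_holes(rings):
    """Return the rings wound counter-clockwise, the hole convention assumed here."""
    return [ring for ring in rings if signed_area(ring) > 0]


# Hypothetical closed rings: an island outline and a lagoon inside it
outer = [(0, 0), (0, 10), (10, 10), (10, 0), (0, 0)]   # clockwise
lagoon = [(4, 4), (6, 4), (6, 6), (4, 6), (4, 4)]      # counter-clockwise

holes = find_holes([outer, lagoon])
print(len(holes))
```

Only the lagoon ring has a positive signed area, so it is the only ring reported as a hole. Rings read from a real shapefile would first need to be split by part index, as the rendering script does.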
Automated feature extraction is a holy grail in geospatial analysis because of the cost and tedious effort required to extract features manually. The key to feature extraction is proper image classification. Automated feature extraction works well with water bodies, islands, roads, farm fields, buildings, and other features that tend to contrast strongly with their background.