An analyst is rarely interested in an entire satellite scene, which can easily cover hundreds of square miles. Given the size of satellite data, we are highly motivated to reduce an image to our area of interest only. The best way to accomplish this reduction is to clip the image to a boundary that defines our study area. We can use shapefiles (or other vector data) as our boundary definition and discard all the data outside this boundary. The following image contains our stretched.tif image with a county boundary file layered on top, visualized in Quantum GIS (QGIS):
In order to clip the image, our next example executes the following steps:
1. Load the image as an array using gdal_array.
2. Create a shapefile reader using PyShp.
3. Rasterize the shapefile into a georeferenced image (convert from a vector into raster).
4. Turn the shapefile image into a binary mask or filter to grab only the image pixels that we want within the shapefile boundary.
5. Filter the satellite image through the mask, discarding the data outside it.
6. Save the clipped satellite image as clip.tif.

We installed PyShp in Chapter 4, Geospatial Python Toolbox, so you should already have it installed from PyPI. We also add a couple of useful utility functions to this script. The first is world2Pixel(), which uses the GDAL GeoTransform object to convert world coordinates to image coordinates for us. It's still the same process that we've used throughout the book, but it's better integrated with GDAL. We also add the imageToArray() function, which converts a PIL image to a NumPy array. The county boundary shapefile is the hancock.shp boundary that we've used in the previous chapters, but you can also download it here:
We use PIL because it is the easiest way to rasterize our shapefile as a mask image to filter out the pixels beyond the shapefile boundary:
    from osgeo import gdal, gdal_array
    import numpy as np
    import shapefile
    from PIL import Image, ImageDraw

    # Raster image to clip
    raster = "stretched.tif"
    # Polygon shapefile used to clip
    shp = "hancock"
    # Name of clipped raster file(s)
    output = "clip"


    def imageToArray(i):
        """
        Converts a Python Imaging Library (PIL) image
        to a NumPy array.
        """
        a = np.frombuffer(i.tobytes(), dtype=np.uint8)
        return a.reshape(i.size[1], i.size[0])


    def world2Pixel(geoMatrix, x, y):
        """
        Uses a GDAL geomatrix (from GetGeoTransform())
        to calculate the pixel location of a
        geospatial coordinate
        """
        ulX = geoMatrix[0]
        ulY = geoMatrix[3]
        xDist = geoMatrix[1]
        yDist = geoMatrix[5]
        pixel = int((x - ulX) / xDist)
        line = int((ulY - y) / abs(yDist))
        return (pixel, line)


    # Load the source data as a gdal_array array
    srcArray = gdal_array.LoadFile(raster)
    # Also load as a gdal image to get
    # geotransform (world file) info
    srcImage = gdal.Open(raster)
    geoTrans = srcImage.GetGeoTransform()
    # Use pyshp to open the shapefile
    r = shapefile.Reader("{}.shp".format(shp))
    # Convert the layer extent to image pixel coordinates
    minX, minY, maxX, maxY = r.bbox
    ulX, ulY = world2Pixel(geoTrans, minX, maxY)
    lrX, lrY = world2Pixel(geoTrans, maxX, minY)
    # Calculate the pixel size of the new image
    pxWidth = int(lrX - ulX)
    pxHeight = int(lrY - ulY)
    # Crop every band to the bounding box of the shapefile
    clip = srcArray[:, ulY:lrY, ulX:lrX]
    # Create a new geomatrix for the image
    # to contain georeferencing data
    geoTrans = list(geoTrans)
    geoTrans[0] = minX
    geoTrans[3] = maxY
    # Map points to pixels for drawing the county boundary
    # on a blank 8-bit, black and white, mask image.
    pixels = []
    for p in r.shape(0).points:
        pixels.append(world2Pixel(geoTrans, p[0], p[1]))
    # Create a blank image in PIL to draw the polygon.
    # The background is 1 and the polygon is filled with 0.
    rasterPoly = Image.new("L", (pxWidth, pxHeight), 1)
    rasterize = ImageDraw.Draw(rasterPoly)
    rasterize.polygon(pixels, 0)
    # Convert the PIL image to a NumPy array
    mask = imageToArray(rasterPoly)
    # Clip the image using the mask: keep the pixel where
    # the mask is 0, write 0 where the mask is 1
    clip = np.choose(mask, (clip, 0)).astype(np.uint8)
    # Save the clipped image as a GeoTIFF. The prototype
    # copies the georeferencing of the source raster.
    gdal_array.SaveArray(clip, "{}.tif".format(output),
                         format="GTiff", prototype=raster)
    # Replace the copied geotransform with the clipped origin
    out = gdal.Open("{}.tif".format(output), gdal.GA_Update)
    out.SetGeoTransform(geoTrans)
    out = None
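To make the coordinate conversion in world2Pixel() concrete, here is a minimal standalone sketch of the same arithmetic that runs without GDAL. The function name world_to_pixel and the sample geotransform values are hypothetical and chosen only for illustration. The sketch assumes a north-up image, so the two rotation terms of the geotransform are ignored, as they are in the script:

```python
def world_to_pixel(geo_transform, x, y):
    """Convert a world coordinate to a (column, row) pixel location.

    geo_transform follows the GDAL ordering:
    (origin_x, pixel_width, rotation, origin_y, rotation, pixel_height)
    The rotation terms are ignored (north-up image assumed).
    """
    origin_x, pixel_width, _, origin_y, _, pixel_height = geo_transform
    # Distance from the upper-left corner, measured in whole pixels
    column = int((x - origin_x) / pixel_width)
    row = int((origin_y - y) / abs(pixel_height))
    return column, row


# Hypothetical geotransform: upper-left corner at (100.0, 500.0)
# with 10-unit square pixels
sample_transform = (100.0, 10.0, 0.0, 500.0, 0.0, -10.0)
sample_pixel = world_to_pixel(sample_transform, 155.0, 470.0)
print(sample_pixel)  # (5, 3)
```

The point lies 55 units east and 30 units south of the upper-left corner, which is 5.5 and 3.0 pixels. Truncating to integers places it in column 5, row 3.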
This script produces the following clipped image. The areas outside the county boundary are called NoData or fill values; they are displayed in black but ignored by most geospatial software. Because images are rectangles, NoData values are common for data that does not completely fill an image.
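The masking step and the resulting fill values can be tried on a tiny array without any satellite data. The following is a rough sketch, not part of the clipping script: the 2-band, 3x3 array and the mask are made up, and using a NumPy masked array to ignore the fill value is just one possible way to mimic what geospatial software does with NoData:

```python
import numpy as np

# Hypothetical 2-band, 3x3 image standing in for the cropped satellite array
bands = np.arange(1, 19, dtype=np.uint8).reshape(2, 3, 3)

# Mask in the same convention as the script:
# 0 inside the boundary, 1 outside
mask = np.array([[1, 0, 1],
                 [0, 0, 0],
                 [1, 0, 1]], dtype=np.uint8)

# Where the mask is 0 keep the pixel, where it is 1 write the fill value 0.
# The 2D mask is broadcast across both bands.
clipped = np.choose(mask, (bands, 0)).astype(np.uint8)
print(clipped[0])

# Treat the fill value as NoData so that it is left out of statistics
valid = np.ma.masked_equal(clipped, 0)
print(clipped[0].mean())  # average that includes the fill pixels
print(valid[0].mean())    # average of the pixels inside the boundary only
```

Note that this simple approach would also hide genuine pixels whose value happens to be 0, which is one reason a dedicated NoData value is normally chosen outside the valid data range.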
You have now walked through an entire workflow that geospatial analysts around the world use every day to prepare multispectral satellite and aerial images for use in a GIS. Now, let's look at how we can actually analyze images as information.