Clipping images

An analyst is rarely interested in an entire satellite scene, which can easily cover hundreds of square miles. Given the size of satellite data, we are highly motivated to reduce an image to our area of interest only. The best way to accomplish this reduction is to clip the image to a boundary that defines our study area. We can use a shapefile (or other vector data) as the boundary definition and discard all of the data outside it. The following image shows our stretched.tif image with a county boundary file layered on top, visualized in Quantum GIS (QGIS):

[Figure: stretched.tif with the county boundary layered on top, shown in QGIS]

To clip the image, our next example executes the following steps:

  1. Load the image into an array using gdal_array.
  2. Create a shapefile reader using PyShp.
  3. Rasterize the shapefile into a georeferenced image (convert from vector to raster).
  4. Turn the shapefile image into a binary mask, or filter, that selects only the image pixels within the shapefile boundary.
  5. Filter the satellite image through the mask.
  6. Discard satellite image data outside the mask.
  7. Save the clipped satellite image as clip.tif.

We installed PyShp from PyPI in Chapter 4, Geospatial Python Toolbox, so you should already have it. We also add a couple of useful utility functions to this script. The first is world2Pixel(), which uses the GDAL geotransform tuple to convert world coordinates to image coordinates for us. It's the same process that we've used throughout the book, but it's better integrated with GDAL. The second is imageToArray(), which converts a PIL image to a NumPy array. The county boundary shapefile is the hancock.shp boundary that we've used in the previous chapters, but you can also download it here:

http://git.io/vqsRH
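The world-to-pixel conversion can be tried on its own before running the full script. The following is a minimal sketch in plain Python, not part of the original listing: the geotransform values, the bounding box, and the pixel_to_world() helper are all hypothetical and chosen only for illustration. It assumes a north-up image, so the rotation terms of the geotransform are ignored.

```python
def world_to_pixel(geo_transform, x, y):
    """Map a world coordinate to a (column, row) pixel index.

    Assumes a north-up image (rotation terms at index 2 and 4 are 0).
    """
    origin_x, pixel_width, _, origin_y, _, pixel_height = geo_transform
    col = int((x - origin_x) / pixel_width)
    row = int((origin_y - y) / abs(pixel_height))
    return col, row


def pixel_to_world(geo_transform, col, row):
    """Return the world coordinate of a pixel's upper-left corner."""
    origin_x, pixel_width, _, origin_y, _, pixel_height = geo_transform
    return origin_x + col * pixel_width, origin_y + row * pixel_height


# Hypothetical geotransform: origin at (1000, 5000), 10 x 10 unit pixels.
geo_transform = (1000.0, 10.0, 0.0, 5000.0, 0.0, -10.0)
# Hypothetical bounding box (minX, minY, maxX, maxY) of a clipping polygon.
min_x, min_y, max_x, max_y = 1234.0, 4321.0, 1876.0, 4789.0

# The upper-left corner pairs minX with maxY; the lower-right pairs
# maxX with minY, because rows count downward from the top of the image.
ul_col, ul_row = world_to_pixel(geo_transform, min_x, max_y)
lr_col, lr_row = world_to_pixel(geo_transform, max_x, min_y)
width, height = lr_col - ul_col, lr_row - ul_row

# World coordinate of the clip window's first pixel, snapped to the grid.
snapped_origin = pixel_to_world(geo_transform, ul_col, ul_row)
print(ul_col, ul_row, width, height, snapped_origin)
```

Because the pixel indices are truncated to integers, the clip window starts on a pixel edge that can sit slightly outside the bounding box corner, which is why the snapped origin differs from (minX, maxY).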

We use PIL because it is the easiest way to rasterize our shapefile as a mask image to filter out the pixels beyond the shapefile boundary:

from osgeo import gdal, gdal_array
import shapefile
# PIL is provided by the Pillow package
from PIL import Image, ImageDraw

# Raster image to clip
raster = "stretched.tif"
# Polygon shapefile used to clip
shp = "hancock"
# Name of clipped raster file(s)
output = "clip"

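def imageToArray(i):
    """
    Converts a Python Imaging Library image to a NumPy array.
    """
    # Read the raw 8-bit pixel bytes into a flat signed-byte array
    a = gdal_array.numpy.frombuffer(i.tobytes(), 'b')
    # PIL reports size as (width, height); NumPy wants (rows, columns)
    return a.reshape(i.size[1], i.size[0])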

def world2Pixel(geoMatrix, x, y):
    """
    Uses a gdal geomatrix (gdal.GetGeoTransform()) to calculate
    the pixel location of a geospatial coordinate
    """
    ulX = geoMatrix[0]
    ulY = geoMatrix[3]
    xDist = geoMatrix[1]
    yDist = geoMatrix[5]
    # geoMatrix[2] and geoMatrix[4] are rotation terms; this
    # calculation assumes a north-up image, so they are not used
    pixel = int((x - ulX) / xDist)
    line = int((ulY - y) / abs(yDist))
    return (pixel, line)
# Load the source data as a gdal_array array
srcArray = gdal_array.LoadFile(raster)
# Also load as a gdal image to get geotransform (world file) info
srcImage = gdal.Open(raster)
geoTrans = srcImage.GetGeoTransform()
# Use pyshp to open the shapefile
r = shapefile.Reader("{}.shp".format(shp))
# Convert the layer extent to image pixel coordinates
minX, minY, maxX, maxY = r.bbox
ulX, ulY = world2Pixel(geoTrans, minX, maxY)
lrX, lrY = world2Pixel(geoTrans, maxX, minY)
# Calculate the pixel size of the new image
pxWidth = int(lrX - ulX)
pxHeight = int(lrY - ulY)
clip = srcArray[:, ulY:lrY, ulX:lrX]
# Create a new geomatrix for the image
# to contain georeferencing data
geoTrans = list(geoTrans)
geoTrans[0] = minX
geoTrans[3] = maxY
# Map points to pixels for drawing the county boundary
# on a blank 8-bit, black and white, mask image.
pixels = []
for p in r.shape(0).points:
    pixels.append(world2Pixel(geoTrans, p[0], p[1]))
# Create a blank image in PIL filled with 1 (outside the boundary)
rasterPoly = Image.new("L", (pxWidth, pxHeight), 1)
# Draw the polygon filled with 0 (inside the boundary)
rasterize = ImageDraw.Draw(rasterPoly)
rasterize.polygon(pixels, 0)
# Convert the PIL image to a NumPy array
mask = imageToArray(rasterPoly)
# Clip the image using the mask
clip = gdal_array.numpy.choose(mask, (clip, 0)).astype(
                                gdal_array.numpy.uint8)
# Save the clipped image as a GeoTIFF
gdal_array.SaveArray(clip, "{}.tif".format(output),
                      format="GTiff", prototype=raster)
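The masking step is easier to follow on a tiny array. The following sketch uses NumPy directly with a hypothetical 2-band, 3 x 4 image and a hand-written mask that follows the script's convention of 0 inside the polygon and 1 outside. The np.where() line is an alternative formulation added for comparison and is not used in the script.

```python
import numpy as np

# Hypothetical 2-band, 3 x 4 image standing in for the clipped window.
image = np.arange(1, 25, dtype=np.uint8).reshape(2, 3, 4)

# Mask in the script's convention: 0 inside the polygon, 1 outside.
mask = np.array([[1, 0, 0, 1],
                 [0, 0, 0, 0],
                 [1, 1, 0, 1]], dtype=np.int8)

# choose() takes each value from choice 0 (the image) where the mask
# is 0 and from choice 1 (the constant 0) where the mask is 1.
# The 2D mask broadcasts across both bands.
clipped = np.choose(mask, (image, 0)).astype(np.uint8)

# An equivalent result using np.where.
clipped_where = np.where(mask == 0, image, 0).astype(np.uint8)

print(clipped[0])
```

In the main script, the same choose() call runs on the full clipped window, with the mask produced by drawing the county polygon in PIL.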

The clipping script produces the following clipped image. The areas that remain outside the county boundary are called NoData or fill values. They are displayed in black here, but most geospatial software can ignore them. Because images are rectangles, NoData values are common wherever the data does not completely fill the image:

[Figure: the clipped image, clip.tif]
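Fill values matter as soon as you compute statistics, because they are placeholders rather than measurements. The following sketch uses a hypothetical single-band array and NumPy's masked arrays to show the difference. It assumes that 0 is used only as fill; any genuine pixel with a value of 0 would be masked as well.

```python
import numpy as np

NODATA = 0  # Fill value written outside the boundary by the clipping script

# Hypothetical single-band clipped image; zeros are fill, not measurements.
band = np.array([[0, 120, 130, 0],
                 [110, 125, 135, 140],
                 [0, 0, 115, 0]], dtype=np.uint8)

# Hide every pixel equal to the fill value.
valid = np.ma.masked_equal(band, NODATA)

naive_mean = band.mean()      # Fill pixels drag the average down
valid_mean = valid.mean()     # Average of the real pixels only
valid_count = valid.count()   # Number of unmasked pixels

print(naive_mean, valid_mean, valid_count)
```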

You have now walked through an entire workflow that geospatial analysts around the world use every day to prepare multispectral satellite and aerial images for use in a GIS. Now, let's look at how we can actually analyze images as information.
