Raster data

Raster data consists of rows and columns of cells or pixels, with each cell representing a single value. The easiest way to think of raster data is as images, which is how they are typically represented by software. However, raster datasets are not necessarily stored as images. They can also be ASCII text files or Binary Large Objects (BLOBs) in databases.

Another difference between geospatial raster data and regular digital images is resolution. Digital images express resolution as dots per inch when printed at full size, or as the total number of pixels in the image, measured in megapixels. Geospatial raster data, however, expresses resolution as the ground distance that each cell represents. For example, a raster dataset with a two-foot resolution means that a single cell represents two feet on the ground, which also means that only objects larger than two feet can be identified visually in the dataset.
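The arithmetic behind ground resolution can be sketched in a few lines. The raster dimensions below are hypothetical; only the two-foot cell size comes from the example in the text.

```python
# Hypothetical raster: 5,000 columns by 4,000 rows at two-foot resolution.
CELL_SIZE_FT = 2.0
columns, rows = 5000, 4000

# Ground distance covered along each axis.
width_ft = columns * CELL_SIZE_FT
height_ft = rows * CELL_SIZE_FT

# Ground area represented by a single cell.
cell_area_sqft = CELL_SIZE_FT ** 2

print(f"Extent: {width_ft:.0f} ft x {height_ft:.0f} ft")
print(f"Each cell covers {cell_area_sqft:.0f} square feet")
```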

Raster datasets may contain multiple bands, meaning that different wavelengths of light can be collected at the same time over the same area. Often, a dataset has 3 to 7 bands, but hyperspectral systems can collect several hundred. These bands are viewed individually or swapped in and out as the RGB bands of an image. They can also be recombined into a derivative single-band image using mathematics and then recolored using a set number of classes representing like values within the dataset.
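As a rough sketch of that recombination, the following example computes a normalized difference between two bands and then bins the result into classes. The normalized difference (the basis of the widely used NDVI vegetation index) is only one possible band calculation and is chosen here for illustration; the band values and class breaks are invented.

```python
# Two hypothetical 2x3 bands covering the same area (invented values).
red = [[50, 60, 200],
       [40, 80, 180]]
nir = [[150, 180, 210],
       [160, 80, 60]]


def normalized_difference(band_a, band_b):
    """Return (a - b) / (a + b) for each cell, or 0.0 where a + b is zero."""
    result = []
    for row_a, row_b in zip(band_a, band_b):
        out_row = []
        for a, b in zip(row_a, row_b):
            total = a + b
            out_row.append((a - b) / total if total else 0.0)
        result.append(out_row)
    return result


def classify(grid, breaks):
    """Give each cell a class number: the count of breaks it meets or exceeds."""
    return [[sum(1 for b in breaks if value >= b) for value in row]
            for row in grid]


# Derive a single-band image, then reduce it to three classes (0, 1, 2).
index = normalized_difference(nir, red)
classes = classify(index, breaks=[0.0, 0.3])
print(classes)
```

Each class number could then be mapped to a display color to produce the recolored image.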

Another common application of raster data is in the field of scientific computing, which shares many elements of geospatial remote sensing but adds some interesting twists. Scientific computing often uses complex raster formats, including Network Common Data Form (NetCDF), GRIB, and HDF5, which store entire data models. These formats are more like directories in a filesystem and can contain multiple datasets or multiple versions of the same dataset. Oceanography and meteorology are the most common applications of this kind of analysis. An example of a scientific computing dataset is the output of a weather model, where the cells of the raster dataset in different bands may represent the output of different model variables in a time series.

Like vector data, raster data can come in a variety of formats. The open source raster library called Geospatial Data Abstraction Library (GDAL), which actually includes the vector OGR library mentioned earlier, lists over 130 supported raster formats (http://www.gdal.org/formats_list.html). The FME software package supports this many as well. However, just like shapefiles and CAD data, there are a few standout raster formats.

TIFF files

The Tagged Image File Format (TIFF) is the most common geospatial raster format. Its flexible tagging system allows it to store almost any type of data in a single file. TIFFs can contain overview images, multiple bands, integer elevation data, basic metadata, internal compression, and a variety of other data that other formats typically store in additional supporting files. Anyone can extend the TIFF format unofficially by adding tagged data to the file structure. This extensibility has benefits and drawbacks, however. A TIFF file may work fine in one piece of software but fail when accessed in another because the two software packages implement the massive TIFF specification to different degrees. An old joke about TIFFs has a frustrating amount of truth to it: TIFF stands for Thousands of Incompatible File Formats. The GeoTIFF extension defines how geospatial data is stored in TIFF tags. Geospatial rasters stored as TIFF files may have any of the following file extensions: .tiff, .tif, or .gtif.
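To make the tagging system a little more concrete, here is a minimal sketch that lists the tag IDs in the first image file directory (IFD) of a classic TIFF. It assumes a well-formed, non-BigTIFF file and reads only tag numbers, not their values. The function name read_tiff_tags and the tiny in-memory sample are hypothetical, built only for this illustration.

```python
import struct


def read_tiff_tags(data):
    """Return the tag IDs listed in the first image file directory (IFD)."""
    # Bytes 0-1 give the byte order: b"II" little-endian, b"MM" big-endian.
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("Not a TIFF: unknown byte-order mark")
    # Bytes 2-3 hold the magic number 42; bytes 4-7 the offset of the first IFD.
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("Not a classic TIFF: magic number is not 42")
    # The IFD starts with a 2-byte entry count followed by 12-byte entries.
    (count,) = struct.unpack(order + "H", data[ifd_offset:ifd_offset + 2])
    tags = []
    for i in range(count):
        start = ifd_offset + 2 + i * 12
        # The first two bytes of each entry are the tag ID.
        (tag,) = struct.unpack(order + "H", data[start:start + 2])
        tags.append(tag)
    return tags


# Hand-built sample: an 8-byte header followed by a two-entry IFD.
sample = (
    b"II" + struct.pack("<HI", 42, 8)       # byte order, magic, IFD offset
    + struct.pack("<H", 2)                  # number of IFD entries
    + struct.pack("<HHII", 256, 3, 1, 4)    # tag 256 (ImageWidth), SHORT, value 4
    + struct.pack("<HHII", 257, 3, 1, 3)    # tag 257 (ImageLength), SHORT, value 3
    + struct.pack("<I", 0)                  # offset of next IFD (none)
)
tags = read_tiff_tags(sample)
print(tags)
```

In a GeoTIFF, the same loop would typically also turn up georeferencing tags such as 33550 (ModelPixelScaleTag), 33922 (ModelTiepointTag), and 34735 (GeoKeyDirectoryTag). Software that does not understand those tags can still read the pixels but ignores the spatial information.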

JPEG, GIF, BMP, and PNG

JPEG, GIF, BMP, and PNG are common general-purpose image formats, but they can be used for basic geospatial data storage as well. Typically, these formats rely on accompanying text files, such as the WKT, .prj, or world files described in this chapter, to supply the georeferencing information that makes them compatible with GIS software.

The JPEG format is also fairly common for geospatial data. JPEGs have a built-in metadata tagging system, similar to that of TIFFs, called EXIF. JPEGs are commonly used for geotagged photographs in addition to raster GIS layers. Bitmap (BMP) images are used for desktop applications and document graphics. However, JPEG, GIF, and PNG are the formats used in web mapping applications, especially for map tiles pregenerated on the server for quick access via slippy maps.

Compressed formats

As geospatial rasters tend to be very large, they are often stored using advanced compression techniques. The latest open standard is the JPEG 2000 format, a successor to the JPEG format that includes wavelet compression and a few other features such as georeferencing data. Multi-resolution Seamless Image Database (MrSID) (.sid) and Enhanced Compression Wavelet (ECW) (.ecw) are two proprietary wavelet compression formats often seen in geospatial contexts. The TIFF format supports compression, including the Lempel-Ziv-Welch (LZW) algorithm. Note that compressed data is suitable as part of a base map but should generally not be used for remote sensing processing. Lossy compressed images are designed to look visually correct but often alter the original cell values. Lossless compression algorithms, such as LZW, are designed to preserve the source data, but it is still generally considered a bad idea to attempt spectral analysis on data unless you know exactly what compression it has been through. The JPEG format is designed to be a lossy format that sacrifices data for a smaller file size. It is also commonly encountered, so it is important to remember this fact to avoid invalid results.
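The difference between preserving and altering cell values can be illustrated with a small sketch. Here zlib's DEFLATE stands in for a lossless codec, and rounding to the nearest 10 is a deliberately crude stand-in for lossy compression. Neither is how TIFF LZW or JPEG actually works internally, and the cell values are invented.

```python
import struct
import zlib

# Hypothetical 16-bit cell values from a single raster row.
cells = [1021, 1022, 1025, 1030, 1030, 1031, 1040, 1055]
raw = struct.pack(f"<{len(cells)}H", *cells)

# Lossless round trip: compress, decompress, and unpack the values again.
round_trip = zlib.decompress(zlib.compress(raw))
restored = list(struct.unpack(f"<{len(cells)}H", round_trip))

# Crude lossy stand-in: keep each value only to the nearest 10.
lossy = [round(value / 10) * 10 for value in cells]

# Count how many cells no longer hold their original value.
changed = sum(1 for original, new in zip(cells, lossy) if original != new)
print("lossless round trip identical:", restored == cells)
print("cells altered by lossy step:", changed)
```

Any analysis that depends on the exact cell values, such as a band ratio, would give different answers on the altered cells.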

ASCII Grids

Another means of storing raster data, often elevation data, is in ASCII Grid files. This file format was created by Esri but has become an unofficial standard supported by most software packages. An ASCII Grid is a simple text file containing cell values arranged in rows and columns. The spatial information for the raster is contained in a simple header. The format of the file is as follows:

<NCOLS xxx>
<NROWS xxx>
<XLLCENTER xxx | XLLCORNER xxx>
<YLLCENTER xxx | YLLCORNER xxx>
<CELLSIZE xxx>
{NODATA_VALUE xxx}
row 1
row 2
.
.
.
row n

While not the most efficient way to store data, ASCII Grid files are very popular because they don't require any special data libraries to create or access geospatial raster data. These files are often distributed as zip files. The header values in the preceding format contain the following information:

  • NCOLS: The number of columns
  • NROWS: The number of rows
  • XLLCENTER | XLLCORNER: The x-coordinate of either the center or the lower-left corner of the lower-left cell
  • YLLCENTER | YLLCORNER: The y-coordinate of either the center or the lower-left corner of the lower-left cell
  • CELLSIZE: The cell size in mapping units
  • NODATA_VALUE: The no-data value (typically, -9999)
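Because the format is plain text, a reader needs nothing beyond the standard library. The following is a minimal sketch, not a complete parser: it assumes every header line is a keyword followed by one number, and the sample grid is invented for illustration.

```python
def parse_ascii_grid(text):
    """Split ASCII Grid text into a header dict and a list of rows of floats."""
    header = {}
    rows = []
    for line in text.strip().splitlines():
        parts = line.split()
        if not parts:
            continue
        # Header lines start with a keyword; data lines start with a number.
        if parts[0][0].isalpha():
            header[parts[0].lower()] = float(parts[1])
        else:
            rows.append([float(value) for value in parts])
    return header, rows


# Invented 3x2 grid with one no-data cell.
sample = """\
NCOLS 3
NROWS 2
XLLCORNER 100.0
YLLCORNER 200.0
CELLSIZE 10.0
NODATA_VALUE -9999
1 2 3
4 -9999 6
"""

header, rows = parse_ascii_grid(sample)

# Ignore no-data cells when summarizing the values.
valid = [v for row in rows for v in row if v != header["nodata_value"]]
print(header["ncols"], header["nrows"], max(valid))
```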

World files

World files are simple text files that provide external georeferencing information for image formats that typically have no native support for spatial information, including JPEG, GIF, PNG, and BMP. Geospatial software recognizes a world file by its naming convention. The most common way to name a world file is to use the raster file name and then alter the extension by removing its middle letter and adding w at the end. The following table shows some examples of raster images in different formats and the associated world file name based on the convention:

Raster file name    World file name
World.jpg           World.jpw
World.tif           World.tfw
World.bmp           World.bpw
World.png           World.pgw
World.gif           World.gfw

The structure of a world file is very simple. It is a six-line text file as follows:

  • Line 1: The cell size along the x axis in ground units
  • Line 2: The rotation about the y axis
  • Line 3: The rotation about the x axis
  • Line 4: The cell size along the y axis in ground units (typically negative, because rows run from the top of the image downward)
  • Line 5: The center x-coordinate of the upper-left cell
  • Line 6: The center y-coordinate of the upper-left cell

The following is an example of world file values:

15.0
0.0
0.0
-15.0
-89.38
45.0

The (x, y) coordinates and the (x, y) cell size contained in lines 1, 4, 5, and 6 allow you to calculate the coordinate of any cell or the distance across a set of cells. The rotation values are important for geospatial software because remotely sensed images are often rotated due to the orientation of the data collection platform. Physically rotating an image requires resampling the data, which risks data loss, so the rotation values let the software account for the rotation instead. The surrounding pixels outside the image are typically assigned a no-data value and represented as the color black. The following image, courtesy of the U.S. Geological Survey (USGS), demonstrates image rotation, where the satellite collection path is oriented from southeast to northeast but the underlying base map is oriented north:

[Image: World files]

World files are a great tool when working with raster data in Python. Most geospatial software and data libraries support world files, so they are usually a good choice for georeferencing.
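The cell coordinate calculation described earlier can be sketched as follows. The six values are modeled on the world file example, and the letter labels A, D, B, E, C, and F follow a commonly used convention for the six lines rather than anything stated in the file itself. The function names are hypothetical.

```python
def read_world_file(text):
    """Parse the six world file lines into named affine terms."""
    a, d, b, e, c, f = (float(line) for line in text.split())
    return {"A": a, "D": d, "B": b, "E": e, "C": c, "F": f}


def pixel_to_world(params, col, row):
    """Return the ground (x, y) of the center of the cell at (col, row)."""
    x = params["A"] * col + params["B"] * row + params["C"]
    y = params["D"] * col + params["E"] * row + params["F"]
    return x, y


# Lines 1-6: x cell size, y rotation, x rotation, y cell size, x, y.
sample = "15.0\n0.0\n0.0\n-15.0\n-89.38\n45.0\n"
params = read_world_file(sample)

# The upper-left cell is column 0, row 0.
print(pixel_to_world(params, 0, 0))
# Ten columns to the right and twenty rows down.
print(pixel_to_world(params, 10, 20))
```

With both rotation terms at zero, the calculation reduces to multiplying the column and row by the cell sizes and adding the upper-left coordinate. Nonzero rotation terms let the same two formulas describe a rotated image without resampling it.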

Tip

You'll find that world files are very useful, but because you use them infrequently, it is easy to forget what the unlabeled contents represent. A quick reference for world files is available at http://kralidis.ca/gis/worldfile.htm.
