Chapter 3. The Geospatial Technology Landscape

The geospatial technology ecosystem consists of hundreds of software libraries and packages. This vast array of choices is overwhelming for newcomers to geospatial analysis. The secret to learning geospatial analysis quickly is to understand the handful of libraries and packages that really matter. Most software, both commercial and open source, is derived from these critical packages. Understanding the ecosystem of geospatial software and how it's used allows you to quickly comprehend and evaluate any geospatial tool.

Geospatial libraries can be assigned to one or more of the following high-level core capabilities, which they implement to some degree:

  • Data access
  • Computational geometry (including data reprojection)
  • Visualization
  • Metadata tools

Another important category is image processing for remote sensing; however, this category is very fragmented, containing dozens of software packages which are rarely integrated into derivative software. Most image processing software for remote sensing is based on the same data access libraries with custom image processing algorithms implemented on top of them. Take a look at the following examples of this type of software, which include both open source and commercial packages:

  • Open Source Software Image Map (OSSIM)
  • Geographic Resources Analysis Support System (GRASS)
  • Orfeo ToolBox (OTB)
  • ERDAS IMAGINE
  • ENVI

Data access libraries such as GDAL and OGR are mostly written in C or C++ for speed and cross-platform compatibility. Speed matters because geospatial datasets are commonly very large. However, you will also see many packages written in Java. Well-written pure Java can approach speeds that are acceptable for processing large vector or raster datasets, and it is usually fast enough for most applications.

The following concept map shows the major geospatial software libraries and packages and how they are related. The libraries in bold represent root libraries that are actively maintained and not significantly derived from any other libraries. These root libraries implement geospatial operations that are difficult enough to get right that the vast majority of people choose to use one of them rather than create a competing library. As you can see, a handful of libraries make up a disproportionate amount of geospatial analysis software, and the diagram is by no means exhaustive. In this book, we'll discuss only the most commonly used packages:

[Figure: The Geospatial Technology Landscape]

The libraries GDAL, OGR, GEOS, and PROJ.4 are the heart and soul of the geospatial analysis community on both the commercial and open source sides. It is important to note that these libraries are all written in C or C++. There is also significant work done in Java in the form of the GeoTools and JTS core libraries, which are used across a range of desktop, server, and mobile software. Given that there are hundreds of geospatial packages available and nearly all of them rely on these libraries to do anything meaningful, you begin to get an idea of the complexity of geospatial data access and computational geometry. Compare this software domain to that of text editors, a search for which returns over 5,000 options on the open source project site http://sourceforge.net/.

Geospatial analysis is a truly worldwide community with significant contributions to the field coming from every corner of the globe. But as you learn more about the heavy-hitting packages at the center of the software landscape, you'll see that these programs tend to come from Canada or have received heavy contributions from Canadian developers. Canada is credited as the birthplace of modern GIS, and geospatial analysis is a matter of national pride there. Also, the Canadian government and the public-private GeoConnections program have invested heavily in research and companies, both to fuel the industry for economic reasons and out of necessity, to better manage the country's vast natural resources and the needs of its population.

In this chapter, we examine the packages which have had the largest impact on geospatial analysis and also those that you are likely to frequently encounter. However, as with any filtering of information, you are encouraged to do your own research and draw your own conclusions. The following websites offer more information on software not included in this chapter:

Data access

As described in Chapter 2, Geospatial Data, geospatial datasets are typically large, complex, and varied. This challenge makes libraries that efficiently read, and in some cases write, this data essential to geospatial analysis. These libraries are also the most important ones: without access to data, geospatial analysis cannot begin. Furthermore, accuracy and precision are key factors in geospatial analysis. An image library that resamples data without permission, or a computational geometry library that rounds a coordinate by even a couple of decimal places, can adversely affect the quality of analysis. These libraries must also manage memory efficiently. A complex geospatial process can last for hours or even days. If a data access library has a memory fault, it can delay an entire project or even an entire workflow involving dozens of people who depend on the output of that analysis.
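To put the rounding concern in concrete terms, the following minimal sketch estimates how far a longitude/latitude coordinate moves on the ground when it is rounded. It is only an illustration: the spherical Earth radius, the equirectangular distance approximation, the sample coordinate, and the function name rounding_error_m are all assumptions made for this example, not part of any library discussed in this chapter.

```python
import math

# Mean Earth radius in metres. This is a spherical approximation assumed
# for illustration; real libraries work with an ellipsoid such as WGS 84.
EARTH_RADIUS_M = 6_371_000.0


def rounding_error_m(lat, lon, decimals):
    """Approximate ground distance between a coordinate and its rounded copy."""
    lat_r = round(lat, decimals)
    lon_r = round(lon, decimals)
    # Equirectangular approximation: adequate for very short distances.
    dlat = math.radians(lat_r - lat)
    dlon = math.radians(lon_r - lon) * math.cos(math.radians(lat))
    return EARTH_RADIUS_M * math.hypot(dlat, dlon)


# Hypothetical sample coordinate in decimal degrees.
lat, lon = 30.123456, -89.654321
for decimals in (5, 3, 2):
    print(decimals, round(rounding_error_m(lat, lon, decimals), 1))
```

For this sample point, keeping five decimal places moves the coordinate by less than a metre, while keeping only two moves it by several hundred metres, which is enough to place a feature on the wrong parcel or the wrong side of a river.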

GDAL

The Geospatial Data Abstraction Library (GDAL) does the most heavy lifting in the geospatial industry. The GDAL website lists over 80 pieces of software using the library, and this list is by no means complete. Many of these packages are industry-leading tools, both open source and commercial. The list doesn't include the hundreds of smaller projects and individual analysts using the library for geospatial analysis. GDAL also includes a set of command-line tools that can perform a variety of operations without any programming.

Tip

A list of projects using GDAL can be found at the following URL:

http://trac.osgeo.org/gdal/wiki/SoftwareUsingGdal

GDAL provides a single, abstract data model for the vast array of raster data types found in the geospatial industry. It consolidates unique data access libraries for different formats and provides a common API for reading and writing data. Before developer Frank Warmerdam created GDAL in the late 1990s, each data format required a separate data access library with a different API to read data or, worse, developers wrote their own custom data access routines.
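The idea of one data model sitting on top of format-specific readers can be sketched in a few lines of pure Python. This is a toy illustration of the driver pattern only, not GDAL's API or implementation: the Raster class, the DRIVERS registry, the open_raster function, and the driver labels are hypothetical names, and the two parsers handle just enough of the Esri ASCII grid and plain-text PGM formats to read the sample data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Raster:
    """Common in-memory model shared by every format (hypothetical)."""
    width: int
    height: int
    rows: List[List[float]]
    driver: str


def read_ascii_grid(text: str) -> Raster:
    """Parse an Esri ASCII grid: 'key value' header lines, then cell rows."""
    header, rows = {}, []
    for line in text.strip().splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0][0].isalpha():
            header[parts[0].lower()] = float(parts[1])
        else:
            rows.append([float(v) for v in parts])
    return Raster(int(header["ncols"]), int(header["nrows"]), rows, "ascii-grid")


def read_pgm(text: str) -> Raster:
    """Parse a plain-text (P2) PGM image; comment lines are not handled."""
    tokens = text.split()
    if tokens[0] != "P2":
        raise ValueError("not a plain-text PGM file")
    width, height = int(tokens[1]), int(tokens[2])
    values = [float(v) for v in tokens[4:]]  # tokens[3] is the maximum value
    rows = [values[r * width:(r + 1) * width] for r in range(height)]
    return Raster(width, height, rows, "pgm")


# Driver registry: maps a file extension to the routine that understands it.
DRIVERS: Dict[str, Callable[[str], Raster]] = {
    ".asc": read_ascii_grid,
    ".pgm": read_pgm,
}


def open_raster(name: str, text: str) -> Raster:
    """Single entry point: callers never touch the format-specific parsers."""
    for ext, reader in DRIVERS.items():
        if name.lower().endswith(ext):
            return reader(text)
    raise ValueError(f"no driver for {name}")


# The same 2 x 2 grid stored in two different formats.
asc = "ncols 2\nnrows 2\nxllcorner 0\nyllcorner 0\ncellsize 1\n1 2\n3 4\n"
pgm = "P2\n2 2\n255\n1 2\n3 4\n"
for name, text in (("dem.asc", asc), ("dem.pgm", pgm)):
    raster = open_raster(name, text)
    print(raster.driver, raster.width, raster.height, raster.rows)
```

Code that consumes a Raster does not need to know which parser produced it, and supporting another format means registering one more reader rather than changing every caller. That is the benefit the paragraph above attributes to a common API.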

The following diagram provides a visual description of how GDAL abstracts raster data:

[Figure: GDAL]

In the software concept map earlier in this chapter, you can see that GDAL has had the greatest impact of any single piece of geospatial software. Combine GDAL with its sister library OGR for vector data and the impact almost doubles. The PROJ.4 library has also had tremendous impact, but it is usually accessed via OGR or GDAL.

The GDAL homepage can be found at http://www.gdal.org/.

OGR

The OGR Simple Features Library is the vector data companion of GDAL. OGR lists at least partial support for over 70 vector data formats. OGR originally stood for OpenGIS Simple Features Reference Implementation; however, it did not evolve into a reference implementation for the Simple Features standard, even though the name stuck.

OGR serves the same purpose for vector data as GDAL does for raster data. It is also almost equally prolific in the geospatial industry. Part of the success of the GDAL/OGR package is its X11/MIT open source license, which is friendly to both commercial and open source use. The GDAL/OGR library can be included in proprietary software without revealing proprietary source code to users.

OGR has the following capabilities:

  • Uniform vector data and modeling abstraction
  • Vector data re-projection
  • Vector data format conversion
  • Attribute data filtering
  • Basic geometry filtering including clipping and point-in-polygon testing
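To make the last two capabilities concrete, here is a minimal pure-Python sketch of what an attribute filter combined with a point-in-polygon test computes. It uses the well-known ray-casting technique and is not OGR's implementation or API; the function name point_in_polygon, the sample square, and the cities records are hypothetical and exist only for this illustration.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test; ring is a list of (x, y) vertices of a simple polygon."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips the state
    return inside


# Hypothetical study area and point features with attributes.
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
cities = [
    {"name": "A", "population": 50_000, "xy": (1, 1)},
    {"name": "B", "population": 5_000, "xy": (2, 3)},
    {"name": "C", "population": 80_000, "xy": (6, 1)},
]

# Attribute filter (population) combined with a geometry filter (inside square).
selected = [
    c["name"]
    for c in cities
    if c["population"] > 10_000 and point_in_polygon(*c["xy"], square)
]
print(selected)
```

A production library also has to deal with points lying exactly on an edge, polygons with holes, and multipolygons, none of which this sketch handles.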


Like GDAL, OGR has several command-line utility programs that demonstrate its capabilities. The same capabilities can also be accessed through its programming API. The following diagram outlines the OGR architecture:

[Figure: OGR]

The OGR architecture is fairly concise, considering that this model is able to represent over 70 different data formats. The Geometry object represents the OGC Simple Features Specification data model for points, linestrings, polygons, geometrycollections, multipolygons, multipoints, and multilinestrings. The Feature Definition object contains the attribute definitions of a group of related features. The Feature object ties the Geometry and Feature Definition information together. The Spatial Reference object contains an OGC Spatial Reference definition. The Layer object represents features grouped as layers within a data source. The Data Source is the file or database object accessed by OGR. The Driver object contains the translators for the 70-plus data formats available to OGR.

This architecture works smoothly, with one minor quirk. The Layer concept is used even for data formats that only contain a single layer. For example, a shapefile can only represent a single layer. But when you open a shapefile as a data source using OGR, you must still request a Layer object using the base name of the shapefile without its file extension. This design feature is only a minor inconvenience, heavily outweighed by the power that OGR provides.
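The object chain and the layer quirk can be sketched with the OGR Python bindings as follows. This is a hedged illustration rather than a recommended workflow: it assumes the GDAL/OGR Python bindings (the osgeo package) are installed, the helper names layer_name_for and summarize_shapefile are hypothetical, and the file path is a placeholder. The osgeo import is deferred until a shapefile is actually opened, so the name-handling part runs on its own.

```python
import os


def layer_name_for(shapefile_path):
    """Return the layer name OGR expects: the file's base name, no extension."""
    return os.path.splitext(os.path.basename(shapefile_path))[0]


def summarize_shapefile(shapefile_path):
    """Walk data source -> layer -> feature definition -> feature -> geometry."""
    # Imported lazily; requires the GDAL/OGR Python bindings.
    from osgeo import ogr

    data_source = ogr.Open(shapefile_path)
    if data_source is None:
        raise IOError(f"OGR could not open {shapefile_path}")

    # Even a single-layer format such as a shapefile is reached through a Layer.
    layer = data_source.GetLayerByName(layer_name_for(shapefile_path))
    definition = layer.GetLayerDefn()
    fields = [
        definition.GetFieldDefn(i).GetName()
        for i in range(definition.GetFieldCount())
    ]
    feature = layer.GetNextFeature()
    geometry = feature.GetGeometryRef() if feature is not None else None
    return {
        "driver": data_source.GetDriver().GetName(),
        "layer": layer.GetName(),
        "feature_count": layer.GetFeatureCount(),
        "fields": fields,
        "first_geometry": (
            geometry.GetGeometryName() if geometry is not None else None
        ),
    }


# Placeholder path used only to show how the layer name is derived.
print(layer_name_for("/data/hypothetical/cities.shp"))
```

Calling summarize_shapefile on a real shapefile would return the driver name, layer name, feature count, attribute field names, and the geometry type of the first feature, touching each object described in the architecture above.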
