Computational geometry

Computational geometry encompasses the algorithms needed to perform operations on vector data. The field is very old in computer science; however, most of the libraries used for geospatial operations are separate from computer graphics libraries because of geospatial coordinate systems. As described near the end of Chapter 1, Learning Geospatial Analysis with Python, computer screen coordinates are almost always expressed in positive numbers, while geospatial coordinate systems often use negative numbers when they're moving west and south.

Several different geospatial libraries fit into the category but serve a wide range of uses from spatial selection to rendering. It should be noted that some features of the OGR Simple Features Library described previously move it beyond the category of data access and into the realm of computational geometry. But, it was included in the prior category because that is its primary purpose.

Computational geometry is a fascinating subject. While writing a simple script to automate a geospatial operation, you inevitably need some spatial algorithm. The question then arises that "Do you try to implement this algorithm yourself or go through the overhead of using a third-party library?" The choice is always deceptive because some tasks are visually easy to understand and implement; some look complex but turn out to be easy, and some are trivial to comprehend but are extraordinarily difficult. One such example is a geospatial buffer operation. The concept is easy enough, but the algorithm turns out to be quite difficult. The following libraries in this section are the major packages for computational geometry algorithms.

The PROJ.4 projection library

U.S. Geological Survey (USGS) analyst Jerry Evenden created what is now the PROJ.4 projection library in the mid 1990s while working at the USGS. Since then, it has become a project of the Open Source Geospatial Foundation with contributions from many other developers. PROJ.4 accomplishes the herculean task of transforming data among thousands of coordinate systems. The math to convert points among so many coordinate systems is extremely complex. No other library comes close to the capability PROJ.4. That fact and the routine needed by applications to convert data sets from different sources to a common projection make PROJ.4 the undisputed leader in this area.

The following plot is an example of how specific the projections supported by PROJ.4 can be. This image from OSGeo.org represents the Line/Station coordinate system of the California Cooperative Oceanic Fisheries Investigations (CalCOFI) program pseudo-projection used only by NOAA, the University of California Scripps Institution of Oceanography and the California Department of Fish and Wildlife to collect oceanographic and fisheries data over the last 60 years along the California coastline:

The PROJ.4 projection library

PROJ.4 uses a simple syntax capable of describing any projection including custom, localized ones as shown in the previous diagram. PROJ.4 can be found in virtually every major GIS package, which provides reprojection support and has its own command-line tools as well. It is available through both GDAL and OGR for vector and raster data. However, it is often useful to access the library directly because it gives you the ability to reproject individual points. Most of the libraries that incorporate PROJ.4 only let you reproject entire data sets.

For more information on PROJ.4 visit https://github.com/OSGeo/proj.4/wiki.

CGAL

The Computational Geometry Algorithms Library (CGAL), originally released in the late 1990s, is a robust and well-established open source computational geometry library. It is not specifically designed for geospatial analysis but is commonly used in the field.

CGAL is often referenced as a source for reliable geometry processing algorithms. The following image from the CGAL User and Reference Manual provides a visualization of one of the often referenced algorithms from CGAL called a polygon straight skeleton needed to accurately grow or shrink a polygon:

CGAL

The straight skeleton algorithm is complex and important because shrinking or growing a polygon isn't just a matter of making it bigger or smaller. The polygon actually changes shape. As a polygon shrinks, non-adjacent edges collide and eliminate connecting edges. As a polygon grows, adjacent edges separate and new edges are formed to connect them. This process is key to buffering geospatial polygons. The following image, also from the CGAL User and Reference Manual, shows this effect using insets on the preceding polygon:

CGAL

CGAL can be found online at http://www.cgal.org/.

JTS

The Java Topology Suite (JTS) is a geospatial computational geometry library written in 100 percent pure Java. JTS separates itself from other computational geometry libraries by implementing the Open Geospatial Consortium (OGC) Simple Features Specification for SQL. Interestingly, other developers have ported JTS to other languages including C++, Microsoft .NET, and even JavaScript.

JTS includes a fantastic test program called the JTS Test Builder, which provides a GUI to test functions without setting up an entire program. One of the most frustrating aspects of geospatial analysis concerns bizarre geometry shapes that break algorithms, which work most of the time. Another common issue is unexpected results due to tiny errors in data such as polygons that intersect themselves in very small areas that are not easily visible. The JTS Test Builder lets you interactively test JTS algorithms to verify data or just visually understand a process, as shown here:

JTS

This tool is handy even if you aren't using JTS but one of the several ports to another language. It should be noted that Vivid Solutions, the maintainer of JTS, hasn't released a new version since JTS Version 1.8 in December 2006. The package is quite stable and still in active use. The JTS homepage is available at http://www.vividsolutions.com/jts/JTSHome.htm.

GEOS

GEOS, which stands for Geometry Engine – Open Source, is the C++ port of the JTS library explained previously. It is mentioned here because this port has had a much larger impact on the geospatial analysis than the original JTS. The C++ version can be compiled on many platforms as it avoids any platform-specific dependencies. Another factor responsible for the popularity of GEOS is that a fair amount of infrastructure exists to create automated or semi-automated bindings to various scripting languages including Python. One more factor is that the majority of geospatial analysis software is written in C or C++. The most common use of GEOS is through other APIs, which include this.

GEOS provides the following capabilities:

  • OGC Simple Features
  • Geospatial predicate functions
  • Intersects
  • Touches
  • Disjoint
  • Crosses
  • Within
  • Contains
  • Overlaps
  • Equals
  • Covers
  • Geospatial operations
  • Union
  • Distance
  • Intersection
  • Symmetric difference
  • Convex hull
  • Envelope
  • Buffer
  • Simplify
  • Polygon assembly
  • Polygon validation
  • Area
  • Length
  • Spatial indexing
  • OGC well-known text (WKT) and well-known binary (WKB) input/output
  • C and C++ API
  • Thread safety

GEOS can be compiled with GDAL to give OGR all of its capability. GEOS can be found online at https://trac.osgeo.org/geos/.

PostGIS

As far as open source geospatial databases go, PostGIS is the most commonly used spatial database. PostGIS is essentially a module on top of the well-known PostgreSQL relational database. Much of the power of PostGIS comes from the GEOS library mentioned earlier. Like JTS, it also implements the OGC Simple Features Specification for SQL. The combination of computational geometry ability in a geospatial context sets PostGIS in a category on its own.

PostGIS allows you to execute both attribute and spatial queries against a dataset. Recall the point from Chapter 2, Geospatial Data, that a typical spatial data set is comprised of multiple data types including geometry, attributes (one or more columns of data in a row), and in most cases, indexing data. In PostGIS, you can query attribute data as you would do to any database table using SQL. This capability is not surprising as attribute data is stored in a traditional database structure. However, you can also query geometry using SQL syntax. Spatial operations are available through SQL functions, which you include as part of queries. The following sample PostGIS SQL statement creates a 14.5 kilometer buffer around the state of Florida:

SELECT ST_Buffer(the_geom, 14500)
FROM usa_states
WHERE state = 'Florida'

The FROM clause designates the usa_states layer as the location of the query. We filter that layer by isolating Florida in the WHERE clause. Florida is a value in the column state of the usa_states layer. The SELECT clause performs the actual spatial selection on the geometry of Florida normally contained in the column the_geom using the PostGIS ST_Buffer() function. The column the_geom is the geometry column for the PostGIS layer in this instance. The ST in the function name stands for Spatial Type. The ST_Buffer() function accepts a column containing spatial geometries and a distance in the map units of the underlying layer. The map units in usa_states layer are expressed in meters, so 14.5 km would be 14,500 meters in the preceding example. Recall the point from Chapter 1, Learning Geospatial Analysis with Python that buffers like this query are used for proximity analysis. It just so happens that the State of Florida water boundary expands 9 nautical miles or approximately 14.5 km into the Gulf of Mexico from the state's western and northwestern coastlines.

The following image shows the official Florida state water boundary as a dotted line, which is labeled on the map:

PostGIS

After applying the 9 nautical mile buffer, you can see in the following image that the result, highlighted in orange, is quite close to the official legal boundary based on detailed ground surveys:

PostGIS

Tip

The website GISTutor.com has an excellent interactive online tutorial, which allows you to execute spatial queries against a continental U.S. map and see the result immediately on a web map. A solid written introductory PostGIS tutorial can be found at the following URL:

http://workshops.boundlessgeo.com/postgis-intro/

Currently, PostGIS maintains the following feature set:

  • Geospatial geometry types including points, linestrings, polygons, multipoints, multilinestrings, multipolygons, and geometry collections, which can store different types of geometries including other collectionsSpatial functions for testing geometric relationships (for example, point-in-polygon or unions)
  • Spatial functions for deriving new geometries (for example, buffers and intersects)
  • Spatial measurements including perimeter, length, and area
  • Spatial indexing using an R-Tree algorithm
  • A basic geospatial raster data type
  • Topology data types
  • US Geocoder based on TIGER census data
  • A new JSONB data type which allows indexing and querying of JSON and GeoJSON

The PostGIS feature set is competitive among all geodatabases and the most extensive among any open source or free geodatabases. The active momentum of the PostGIS development community is another reason of this system being the best of breed. PostGIS is maintained at http://postgis.net.

Other spatially-enabled databases

PostGIS is the gold standard among free and open source geospatial databases. However, there are several other systems that you should be aware of as a geospatial analyst. This list includes both commercial and open source systems with varying degrees of geospatial support.

Geodatabases have evolved in parallel to geospatial software, standards, and the Web. The Internet has driven the need for large, multiuser geospatial database servers able to serve large amounts of data. The following image, courtesy of www.OSGeo.org, shows how geospatial architectures have evolved with a significant portion of this evolution happening at the database level:

Other spatially-enabled databases

Oracle spatial and graph

The Oracle relational database is a widely used database system typically used by very large organizations because of its cost and large scalability. It is also extremely stable and fast. It runs some of the largest and most complicated databases in the world. It is often found in hospitals, banks, and government agencies managing millions of critical records.

Geospatial data capability first appeared at Oracle Version 4 as a modification by the Canadian Hydrographic Service (CHS). CHS also implemented Oracle's first spatial index in the form of an unusual but efficient three-dimensional helical spiral. Oracle subsequently incorporated the modification and released the Oracle Spatial Database Option (SDO) at Version 7 of the main database. The SDO system became Oracle Spatial at Oracle Version 8. The database schema of Oracle Spatial still has the SDO prefix on some column and table names similar to how PostGIS uses the OGC convention ST (spatial type) to separate spatial information from traditional relational database tables and functions at the schema level.

As of 2012, Oracle began calling the package Oracle Spatial and Graph to emphasize the network data module. This module is used for analyzing networked datasets such as transportation or utilities. However, the module can also be used against abstract networks such as social networks. The analysis of social network data is a common target for big data analysis, which is now a growing trend. Big data social network analysis is likely the reason Oracle changed the name of the product.

As a spatial, Oracle Spatial has the following capabilities:

  • A geospatial data schema
  • A spatial indexing system which is now based on an R-tree index
  • An SQL API for performing geometric operations
  • A spatial data tuning API to optimize a particular dataset
  • A topology data model
  • A network data model
  • A GeoRaster data type to store, index, query, and retrieve raster data
  • Three-dimensional data types including Triangulated Irregular Networks (TINs) and LIDAR point clouds
  • A geocoder to search location names and return coordinates
  • A routing engine for driving direction-type queries
  • Open Geospatial Consortium-compliance

Oracle Spatial and PostGIS are reasonably comparable and are both commonly used. You will see these two systems sooner or later as data sources while performing geospatial analysis.

Tip

Oracle Spatial and Graph is sold separately from Oracle itself. A little-known fact is that the SDO data type is native to the main Oracle database. If you have a simple application which just inputs points and retrieves them, you can use the main Oracle API to add, update, and retrieve SDOs without Oracle Spatial and Graph.

The U.S. Bureau of Ocean Energy, Management, Regulation, and Enforcement (BOEMRE) uses Oracle to manage environmental, business, and geospatial data for billions of dollars' worth of oil, gas, and mineral rights in one of the largest geospatial systems in the world. The following map is courtesy of BOEMRE:

Oracle spatial and graph

Oracle Spatial and Graph can be found online at the following URL: http://www.oracle.com/us/products/database/options/spatial/overview.

ArcSDE

ArcSDE is Esri's Spatial Data Engine (SDE). It is now rolled into Esri's ArcGIS Server product after over a decade of being a standalone product. What makes ArcSDE interesting is that the engine is mostly database independent supporting multiple database backends. ArcSDE supports IBM DB2, Informix, Microsoft SQL Server, Oracle and PostgreSQL as data storage systems. While ArcSDE has the ability to create and manage a spatial schema from scratch on systems such as Microsoft SQL Server and Oracle, it uses native spatial engines if available. This arrangement is the case for IBM DB2, Oracle, and PostGreSQL. For Oracle, ArcSDE manages the table structure but can rely on the Oracle SDO data type for feature storage.

Like the previously mentioned geodatabases, ArcSDE also has a rich spatial selection API and can handle raster data. However, ArcSDE does not have as rich a SQL spatial API as Oracle and PostGIS. Esri technically supports basic SQL functionality related to ArcSDE, but it encourages users and developers to use Esri software or programming APIs to manipulate data stored through ArcSDE as it is designed to be a datasource for Esri software. Esri does provide software libraries for developers to build applications outside of Esri software using ArcSDE or Esri's file-based geodatabase called a personal geodatabase. But, these libraries are black boxes and the communication protocol ArcSDE uses has never been reverse engineered. Typically, interaction happens between ArcSDE and third-party applications at the web services level using the ArcGIS Server API, which supports OGC services to some degree and a fairly straightforward REST API service that returns GeoJSON.

The following screenshot is of the U.S. federal site http://catalog.data.gov/dataset, a very large geospatial data catalog based on ArcSDE, which in turn networks U.S. federal data including other ArcSDE installations from other federal agencies:

ArcSDE

ArcSDE is integrated into ArcGIS Server; however, information on it remains at http://www.esri.com/software/arcgis/arcsde.

Microsoft SQL Server

Microsoft added spatial data support to its flagship database product in Microsoft SQL Server 2008. It has gradually improved since that version, but still is nowhere near as sophisticated as Oracle Spatial or PostGIS. Microsoft supports the same data types as PostGIS with slightly different naming conventions with the exception of rasters, which are not directly supported. It also supports output to WKT and WKB formats.

It offers some very basic support for spatial selection, but it is obviously not a priority for Microsoft at the moment. This limited support is likely the case because it is all that can be used for Microsoft software mapping components and several third party engines can provide spatial support on top of SQL Server.

Microsoft's support for spatial data in SQL Server is documented at the following link:

http://msdn.microsoft.com/en-us/library/bb933790.aspx

MySQL

MySQL, another highly popular free database, provides nearly the same support as Microsoft SQL Server. The OGC Geometry types are supported with basic spatial relationship functions. Through a series of buyouts, MySQL has become the property of Oracle. While Oracle currently remains committed to MySQL as an open source database, this purchase has brought the ultimate future of the world's most popular open source database into question. But as far as geospatial analysis is concerned, MySQL is barely a contender and unlikely to be the first choice for any project.

For more information on MySQL spatial support visit the following link:

http://dev.mysql.com/doc/refman/5.6/en/spatial-extensions.html

SpatiaLite

SpatiaLite is an extension for the open source SQLite database engine. SQLite uses a file database and is designed to be integrated into applications instead of the typical client server model used by most relational database servers. SQLite has spatial data types and spatial indexing already, but SpatiaLite adds support for the OGC Simple Features Specification as well as map projections.

SpatiaLite can be found at http://www.gaia-gis.it/gaia-sins/.

Routing

Routing is a very niche area of computational geometry. It is also a very rich field of study that goes far beyond the familiar driving directions use case. The requirements for a routing algorithm are simply a networked dataset and impedance values that affect the speed of travel on that network. Typically, the dataset is vector-based but raster data can also be used for certain applications. The two major contenders in this area are Esri's Network Analyst and the open source pgRouting engine for PostGIS. The most common routing problem is the most efficient way to visit a number of point locations. This problem is called the travelling salesman problem (TSP). The TSP is one of the most intensely studied problems in computational geometry. It is often considered the benchmark for any routing algorithm. More information on the TSP can be found at http://en.wikipedia.org/wiki/Travelling_salesman_problem.

Esri Network Analyst and Spatial Analyst

Esri's entry into the routing arena, Network Analyst, is a truly generic routing engine, which can tackle most routing applications regardless of context. Spatial Analyst is another Esri extension that is raster focused and can perform least cost path analysis on raster terrain data.

The ArcGIS Network Analyst product page is located on Esri's website at http://www.esri.com/software/arcgis/extensions/networkanalyst.

pgRouting

The pgRouting extension for PostGIS adds routing functionality to the geodatabase. It is oriented towards road networks but can be adapted to work with other types of networked data. The following image shows a driving distance radius calculation output by pgRouting and displayed in QGIS. The points are color-coded from green to red based on proximity to the starting location. As shown in the following diagram, the points are nodes in the network dataset, courtesy of QGIS.org, which in this case are roads:

pgRouting

The pgRouting PostGIS extension is maintained at http://pgrouting.org/.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset