Least cost path analysis

Calculating driving directions is the most commonly used geospatial function in the world. Typically, these algorithms calculate the shortest path between points A and B, or they may take into account the speed limit of the road or even current traffic conditions to choose a route by drive time.

But what if your job is to build a new road? Or what if you are in charge of deciding where to run power transmission lines or water lines across a remote area? In a terrain-based setting, the shortest path might be to cross a difficult mountain or run through a lake. In this case, we need to account for obstacles and avoid them if possible. However, if avoiding a minor obstacle takes us too far out of our way, the cost of implementing that route may be more expensive than just going over a mountain.

This type of advanced analysis is called Least cost path analysis. We search an area for the route that is the best compromise of distance versus the cost of following the route. The algorithm we use for this process is called the A-star or A* algorithm. The oldest routing method is called the Dijkstra's algorithm, which calculates the shortest path in a network such as a road network. The A* method can do that as well, but it is also better suited for traversing a grid like a DEM. You can find out more about these algorithms on the following web pages:

This example is the most complex in this chapter. To better understand it, we have a simple version of the program, which is text-based and operates on a 5x5 grid with randomly generated values. You can actually see how this program follows the algorithm before trying it on an elevation grid with thousands of values.

This program executes the following steps:

  1. Create a simple grid with randomly-generated pseudo-elevation values between 1 and 16.
  2. Define a start location in the lower-left corner of the grid.
  3. Define the end point as the upper-right corner of the grid.
  4. Create a cost grid that has the elevation of each cell in addition to the cell's distance to the finish.
  5. Examine each neighboring cell from the start and choose the one with the lowest cost.
  6. Repeat the evaluation using the chosen cell until we get to the end.
  7. Return the set of chosen cells as the least-cost path.

Setting up the test grid

You simply run this program from the command line and view its output. The first section of this script sets up our artificial terrain grid as a randomly-generated NumPy array with notional elevation values between 1 and 16. We also create a distance grid, which calculates the distance between each cell to the destination cell. This value is the cost of each cell, as you can see here:

import numpy as np

# Width and height
# of grids
w = 5
h = 5

# Start location:
# Lower left of grid
start = (h-1, 0)

# End location:
# Top right of grid
dx = w-1
dy = 0

# Blank grid
blank = np.zeros((w, h))

# Distance grid
dist = np.zeros(blank.shape, dtype=np.int8)

# Calculate distance for all cells
for y, x in np.ndindex(blank.shape):
    dist[y][x] = abs((dx-x)+(dy-y))

# "Terrain" is a random value between 1-16.
# Add to the distance grid to calculate
# The cost of moving to a cell
cost = np.random.randint(1, 16, (w, h)) + dist

print("COST GRID (Value + Distance)")
print(cost)
print()

The simple A* algorithm

The A* search algorithm implemented here crawls the grid in a fashion similar to our flood fill algorithm in the previous example. Once again we use sets to avoid using recursion and to avoid duplication of cell checks. But this time, instead of checking elevation, we check the distance cost of routing through a cell in question. If the move raises the cost of getting to the end, we go with a lower-cost option, as you can see in the following script:

# Our A* search algorithm
def astar(start, end, h, g):
    closed_set = set()
    open_set = set()
    path = set()
    open_set.add(start)
    while open_set:
        cur = open_set.pop()
        if cur == end:
            return path
        closed_set.add(cur)
        path.add(cur)
        options = []
        y1 = cur[0]
        x1 = cur[1]
        if y1 > 0:
            options.append((y1-1, x1))
        if y1 < h.shape[0]-1:
            options.append((y1+1, x1))
        if x1 > 0:
            options.append((y1, x1-1))
        if x1 < h.shape[1]-1:
            options.append((y1, x1+1))
        if end in options:
            return path
        best = options[0]
        cset.add(options[0])
        for i in range(1, len(options)):
            option = options[i]
            if option in closed_set:
                continue
            elif h[option] <= h[best]:
                best = option
                closed_set.add(option)
            elif g[option] < g[best]:
                best = option
                closed_set.add(option)
            else:
                closed_set.add(option)
        print(best, ", ", h[best], ", ", g[best])
        open_set.add(best)
    return []

Generating the test path

Finally, we output the least cost path on a grid. The path is represented by 1 values and all other cells as 0 values, as shown in the following lines of command:

print("(Y, X), HEURISTIC, DISTANCE")

# Find the path
path = astar(start, (dy, dx), cost, dist)
print()

# Create and populate the path grid
path_grid = np.zeros(cost.shape, dtype=np.uint8)
for y, x in path:
    path_grid[y][x] = 1
path_grid[dy][dx] = 1

print("PATH GRID: 1=path")
print(path_grid)

Viewing the test output

When you run this program, you get the following sample output (however your's will be different as it is constructed randomly):

COST GRID (Value + Distance)
[[13 10  5 15  9]
 [15 13 16  5 16]
 [17  8  9  9 17]
 [ 4  1 11  6 12]
 [ 2  7  7 11  8]]

(Y,X), HEURISTIC, DISTANCE
(3, 0) ,  4 ,  1
(3, 1) ,  1 ,  0
(2, 1) ,  8 ,  1
(2, 2) ,  9 ,  0
(2, 3) ,  9 ,  1
(1, 3) ,  5 ,  0
(0, 3) ,  15 ,  1

PATH GRID: 1=path
[[0 0 0 1 1]
 [0 0 0 1 0]
 [0 1 1 1 0]
 [1 1 0 0 0]
 [1 0 0 0 0]]

The grid is small enough such that you can easily trace the algorithm's steps manually. This implementation uses Manhattan distance, which means the distance does not use diagonal lines but only uses left, right, up, and down measurements. The search also does not move diagonally to keep things simple.

The real-world example

Now that we have a basic understanding of the A* algorithm, let's move to a more complex example. We'll use the same DEM located near Vancouver, British Columbia Canada that we used in Creating a shaded relief section of Chapter 7, Python and Elevation Data, for the relief example. The spatial reference for this grid is EPSG:26910 - NAD83 / UTM zone 10N. You can download the DEM, relief, as well as start and end points of the shapefile as a zipped package from the following URL:

http://git.io/v3fpL

We'll actually use the shaded relief for visualization. Our goal in this exercise will be to move from the start to the finish point in the lowest-cost way possible.

The real-world example

Just looking at the terrain, you can find that there are two paths that follow low-elevation routes without much change in direction. These two routes are illustrated in the following screenshot:

The real-world example

So we would expect that when we used the A* algorithm, it would be close. Keep in mind that the algorithm is only looking in the immediate vicinity, so it can't look at the whole image like we can and make adjustments early in the route based on a known obstacle ahead.

We will expand this implementation from our simple example and use Euclidian Distance or as the crow flies measurements; we'll also allow the search to look in eight directions instead of four. We will prioritize terrain as the primary decision point. We will use distance, both to the finish and from the start, as lower priorities to make sure we are moving forward towards the goal and not getting too far off track. Other than those differences, the steps are identical to the simple example. The output will be a raster with the path values set to 1 and the other values set to 0.

Loading the grid

The script starts out simply enough. We load the grid into a NumPy array from an ASCII Grid. We name our output path grid and then we define the starting cell and end cell, as you can see here:

import numpy as np
import math
from linecache import getline

# Our terrain data
source = "dem.asc"

# Output file name
# for the path raster
target = "path.asc"

print("Opening %s..." % source)
cost = np.loadtxt(source, skiprows=6)
print("Opened %s." % source)

# Parse the header
hdr = [getline(source, i) for i in range(1, 7)]
values = [float(ln.split(" ")[-1].strip()) for ln in hdr]
cols, rows, lx, ly, cell, nd = values

# Starting column, row
sx = 1006
sy = 954

# Ending column, row
dx = 303
dy = 109

Defining the helper functions

We need three functions to route over terrain. One is the A* algorithm and the other two assist the algorithm in choosing the next step. We'll briefly discuss these helper functions. First, we have a simple Euclidian Distance function named e_dist, which returns the straight-line distance between two points as map units. Next, we have an important function called weighted_score, which returns a score for a neighboring cell based on the elevation change between the neighbor and the current cell as well as the distance to the destination.

This function is better than distance or elevation alone because it reduces the chance of there being a tie between two cells, thus making it easier to avoid back-tracking. This scoring formula is loosely based on a concept called the Nilsson Score, commonly used in these types of algorithms and referenced in the Wikipedia articles mentioned earlier in this chapter. What's great about this function is that it can score the neighboring cell with any values you wish. You might also use a real-time feed to look at the current weather in the neighboring cell and avoid cells with rain or snow, as shown here:

def e_dist(p1, p2):
    """
    Takes two points and returns
    the euclidian distance
    """
    x1, y1 = p1
    x2, y2 = p2
    distance = math.sqrt((x1-x2)**2+(y1-y2)**2)
    return int(distance)


def weighted_score(cur, node, h, start, end):
    """
    Provides a weighted score by comparing the
    current node with a neighboring node. Loosely
    based on on the Nisson score concept: f=g+h
    In this case, the "h" value, or "heuristic",
    is the elevation value of each node.
    """
    score = 0
    # current node elevation
    cur_h = h[cur]
    # current node distance from end
    cur_g = e_dist(cur, end)
    # current node distance from
    cur_d = e_dist(cur, start)
    # neighbor node elevation
    node_h = h[node]
    # neighbor node distance from end
    node_g = e_dist(node, end)
    # neighbor node distance from start
    node_d = e_dist(node, start)
    # Compare values with the highest
    # weight given to terrain followed
    # by progress towards the goal.
    if node_h < cur_h:
        score += cur_h-node_h
    if node_g < cur_g:
        score += 10
    if node_d > cur_d:
        score += 10
    return score

The real-world A* algorithm

This algorithm is more involved than the simple version in our previous example. We use sets to avoid redundancy. It also implements our more advanced scoring algorithm and checks to make sure we aren't at the end of the path before performing additional calculations. Unlike our last example, this more advanced version also checks cells in eight directions, so the path can move diagonally. There is a print statement at the end of this function that is commented out. You can uncomment it to watch the search crawl through the grid, as shown in the following script:

def astar(start, end, h):
    """
    A-Star (or A*) search algorithm.
    Moves through nodes in a network
    (or grid), scores each node's
    neighbors, and goes to the node
    with the best score until it finds
    the end.  A* is an evolved Dijkstra
    algorithm.
    """
    # Closed set of nodes to avoid
    closed_set = set()
    # Open set of nodes to evaluate
    open_set = set()
    # Output set of path nodes
    path = set()
    # Add the starting point to
    # to begin processing
    open_set.add(start)
    while open_set:
        # Grab the next node
        cur = open_set.pop()
        # Return if we're at the end
        if cur == end:
            return path
        # Close off this node to future
        # processing
        closed_set.add(cur)
        # The current node is always
        # a path node by definition
        path.add(cur)
        # List to hold neighboring
        # nodes for processing
        options = []
        # Grab all of the neighbors
        y1 = cur[0]
        x1 = cur[1]
        if y1 > 0:
            options.append((y1-1, x1))
        if y1 < h.shape[0]-1:
            options.append((y1+1, x1))
        if x1 > 0:
            options.append((y1, x1-1))
        if x1 < h.shape[1]-1:
            options.append((y1, x1+1))
        if x1 > 0 and y1 > 0:
            options.append((y1-1, x1-1))
        if y1 < h.shape[0]-1 and x1 < h.shape[1]-1:
            options.append((y1+1, x1+1))
        if y1 < h.shape[0]-1 and x1 > 0:
            options.append((y1+1, x1-1))
        if y1 > 0 and x1 < h.shape[1]-1:
            options.append((y1-1, x1+1))
        # If the end is a neighbor, return
        if end in options:
            return path
        # Store the best known node
        best = options[0]
        # Begin scoring neighbors
        best_score = weighted_score(cur, best, h, start, end)
        # process the other 7 neighbors
        for i in range(1, len(options)):
            option = options[i]
            # Make sure the node is new
            if option in closed_set:
                continue
            else:
            # Score the option and compare it to the best known
                option_score = weighted_score(cur, option, h, start, end)
                if option_score > best_score:
                    best = option
                    best_score = option_score
                else:
                    # If the node isn't better seal it off
                    closed_set.add(option)
                # Uncomment this print statement to watch
                # the path develop in real time:
                # print(best, e_dist(best, end))
        # Add the best node to the open set
        open_set.add(best)
    return []

Generating a real-world path

Finally, we create our real-world path as a chain of ones in a grid of zeros. This raster can then be brought into an application such as QGIS and visualized over the terrain grid, as shown in the following lines of code:

print("Searching for path...")
p = astar((sy, sx), (dy, dx), cost)
print("Path found.")
print("Creating path grid...")
path = np.zeros(cost.shape)
print("Plotting path...")
for y, x in p:
    path[y][x] = 1
path[dy][dx] = 1
print("Path plotted.")

print("Saving %s..." % target)
header = ""
for i in range(6):
    header += hdr[i]

# Open the output file, add the hdr, save the array
with open(target, "wb") as f:
    f.write(bytes(header, 'UTF-8'))
    np.savetxt(f, path, fmt="%4i")

print("Done!")

Here is the output route of our search:

Generating a real-world path

As you can see, the A* search came very close to one of our manually selected routes. In a couple of cases, the algorithm chose to tackle some terrain instead of trying to go around it. Sometimes the slight terrain is deemed less of a cost than the distance to go around it. You can see examples of that choice in this zoomed-in portion of the upper-right section of the route:

Generating a real-world path

We only used two values: terrain and distance. But you could also add hundreds of factors such as soil type, water bodies, and existing roads. All of these items could serve as an impedance or an outright wall. You would just modify the scoring function in the example to account for any additional factors. Keep in mind that the more factors you add, the more difficult it is to trace what the A* implementation was thinking when it chose the route.

An obvious future direction for this analysis would be to create a vector version of this route as a line. The process would include mapping each cell to a point and then using the nearest neighbor analysis to order the points properly before saving as a shapefile or GeoJSON.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset