How to do it...

Let's start with some simple count queries to determine the number of airports (the nodes, or vertices) and the number of flights (the edges). Calling count() works just as it does on a DataFrame, except that you first specify whether you are counting vertices or edges:

# Parenthesized print works in both Python 2 and Python 3
print("Airports count: %d" % graph.vertices.count())
print("Trips count: %d" % graph.edges.count())

The output should resemble the following, showing 279 vertices (airports) and more than 1.3 million edges (flights):

# Output
Airports count: 279
Trips count: 1361141

As with DataFrames, you can also apply filter and groupBy to the vertices and edges, for example to better understand flight delays. Early or on-time flights are those with delay <= 0; delayed flights are those with delay > 0:

# Parenthesized print works in both Python 2 and Python 3
print("Early or on-time: %d" % graph.edges.filter("delay <= 0").count())
print("Delayed: %d" % graph.edges.filter("delay > 0").count())

# Output
Early or on-time: 780469
Delayed: 580672
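The two filters are complementary, so their counts should add up to the total number of edges, provided no row has a null delay (in Spark SQL, a null value satisfies neither predicate). The following plain-Python sketch illustrates the idea on a made-up edge list; the rows and the names `edges`, `early_or_on_time`, and `delayed` are illustrative assumptions, not part of the flight dataset:

```python
# Toy edge list standing in for graph.edges; the rows are invented for illustration.
edges = [
    {"src": "SFO", "dst": "JFK", "delay": 25},
    {"src": "SFO", "dst": "SEA", "delay": -5},
    {"src": "SEA", "dst": "SFO", "delay": 0},
    {"src": "SEA", "dst": "LAX", "delay": 130},
]

# Same predicates as the two filter() calls above.
early_or_on_time = sum(1 for e in edges if e["delay"] <= 0)
delayed = sum(1 for e in edges if e["delay"] > 0)

# Every row with a non-null delay matches exactly one predicate.
assert early_or_on_time + delayed == len(edges)

# The same identity holds for the counts reported above.
assert 780469 + 580672 == 1361141
```

Running this kind of check against the real graph is a cheap way to spot rows with missing delay values.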

Diving further, you can filter for delayed flights (delay > 0) departing from San Francisco (src = 'SFO'), group them by destination airport, and sort the groups by average delay in descending order (desc("avg(delay)")):

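from pyspark.sql.functions import desc

# display() is provided by Databricks notebooks
display(
    graph
    .edges
    .filter("src = 'SFO' and delay > 0")   # delayed flights out of SFO
    .groupBy("src", "dst")                 # one group per route
    .avg("delay")                          # produces a column named avg(delay)
    .sort(desc("avg(delay)"))              # longest average delay first
)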
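To make the semantics of this filter, groupBy, avg, and sort chain concrete, here is a minimal plain-Python sketch of the same computation on a made-up edge list. The function name `avg_delay_by_route` and the sample rows are hypothetical; this is not how Spark executes the query, only what it computes:

```python
from collections import defaultdict

# Toy (src, dst, delay) rows standing in for graph.edges; invented for illustration.
edges = [
    ("SFO", "JFK", 30),
    ("SFO", "JFK", 50),
    ("SFO", "SEA", 10),
    ("SFO", "SEA", -5),   # dropped by the filter: not delayed
    ("SEA", "SFO", 90),   # dropped by the filter: wrong origin
]

def avg_delay_by_route(edges, origin):
    """Average positive delay per (src, dst) route from origin, longest first."""
    delays = defaultdict(list)
    for src, dst, delay in edges:
        if src == origin and delay > 0:          # filter("src = ... and delay > 0")
            delays[(src, dst)].append(delay)     # groupBy("src", "dst")
    averages = [
        (src, dst, float(sum(values)) / len(values))   # avg("delay")
        for (src, dst), values in delays.items()
    ]
    # sort(desc("avg(delay)"))
    return sorted(averages, key=lambda row: row[2], reverse=True)

result = avg_delay_by_route(edges, "SFO")
```

Note that the filter runs before the grouping, so early flights do not pull the route averages down.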

If you are using Databricks notebooks, you can visualize the results of GraphFrame queries. For example, the following query selects the flights departing from Seattle with a delay of more than 100 minutes, which the notebook can then plot by destination state:

# States with the longest cumulative delays (with individual delays > 100 minutes) 
# origin: Seattle
display(graph.edges.filter("src = 'SEA' and delay > 100"))

In a Databricks notebook, the preceding query can be rendered as a map in which darker shades of blue indicate longer delays. The map shows that most of the delayed flights departing Seattle are bound for California.
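The aggregation behind such a map can be sketched in plain Python as a per-state sum of delays. This is only an illustration under stated assumptions: the `state_dst` field, the function name `cumulative_delay_by_state`, and the sample rows are hypothetical, and the actual edge schema may name the destination state differently:

```python
from collections import Counter

# Toy rows standing in for graph.edges; state_dst is an assumed column name.
edges = [
    {"src": "SEA", "state_dst": "CA", "delay": 120},
    {"src": "SEA", "state_dst": "CA", "delay": 150},
    {"src": "SEA", "state_dst": "TX", "delay": 110},
    {"src": "SEA", "state_dst": "NY", "delay": 40},    # at or below the threshold
    {"src": "SFO", "state_dst": "WA", "delay": 300},   # wrong origin
]

def cumulative_delay_by_state(edges, origin, threshold):
    """Sum delays above threshold per destination state, largest total first."""
    totals = Counter()
    for e in edges:
        if e["src"] == origin and e["delay"] > threshold:
            totals[e["state_dst"]] += e["delay"]
    return totals.most_common()

totals = cumulative_delay_by_state(edges, "SEA", 100)
```

The state with the largest total would receive the darkest shade on the map.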
