To understand the complex relationships between city airports and the flights connecting them, we can use motifs to find patterns of airports (that is, vertices) connected by flights (that is, edges). The result is a DataFrame in which the column names are given by the motif keys. Note that motif finding is one of the new graph algorithms supported as part of GraphFrames.
For example, let's determine the delays that are due to San Francisco International Airport (SFO):

    # Generate motifs
    motifs = (
        tripGraphPrime
        .find("(a)-[ab]->(b); (b)-[bc]->(c)")
        .filter("(b.id = 'SFO') and (ab.delay > 500 or bc.delay > 500) and bc.tripid > ab.tripid and bc.tripid < ab.tripid + 10000")
    )

    # Display motifs
    display(motifs)

Breaking down the preceding query, (x) represents a vertex (that is, an airport), while [xy] represents an edge (that is, a flight between airports). Therefore, to determine the delays that are due to SFO, we use the following:

- (b) represents the airport in the middle (that is, SFO)
- (a) represents the origin airport (within the dataset)
- (c) represents the destination airport (within the dataset)
- [ab] represents the flight between (a) (that is, origin) and (b) (that is, SFO)
- [bc] represents the flight between (b) (that is, SFO) and (c) (that is, destination)

Within the filter statement, we put in some rudimentary constraints (note that this is an oversimplified representation of flight paths):

- b.id = 'SFO' denotes that the middle vertex (b) is limited to just SFO airport
- (ab.delay > 500 or bc.delay > 500) denotes that at least one of the two flights must have a delay greater than 500 minutes
- (bc.tripid > ab.tripid and bc.tripid < ab.tripid + 10000) denotes that the [ab] flight must be before the [bc] flight and within the same day. The tripid was derived from the date time, which is why the constraint can be simplified this way

The output of this query is noted in the following figure:
The following is a simplified, abridged subset of the results of this query, where the columns are the respective motif keys:
a | ab | b | bc | c |
---|---|---|---|---|
 | | | | |
 | | | | |
Referring to the TUS > SFO > JFK flight, you will notice that while the flight from Tucson to San Francisco departed 5 minutes early, the flight from San Francisco to New York JFK was delayed by 536 minutes.
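To make the mechanics of the query concrete, the following is a minimal plain-Python sketch of what the two-hop motif and its filter compute on a toy edge list. It is not how GraphFrames evaluates the query (GraphFrames works on distributed DataFrames); it only mirrors the logic. The `Edge` tuple and the helpers `find_two_hop` and `keep` are hypothetical names introduced here. The delays for TUS > SFO (-5) and SFO > JFK (536) come from the text above, while every `tripid` and all other rows are made up for illustration.

```python
from collections import namedtuple
from itertools import product

# Hypothetical edge record: one flight between two airports.
Edge = namedtuple("Edge", ["src", "dst", "delay", "tripid"])

# Toy data. Only the two delays -5 and 536 are taken from the text;
# the tripid values and the remaining flights are invented.
edges = [
    Edge("TUS", "SFO", -5, 1011635),
    Edge("SFO", "JFK", 536, 1011810),
    Edge("SEA", "SFO", 10, 1011200),
    Edge("SFO", "LAX", 20, 1011500),
    Edge("SFO", "JFK", 700, 1032000),  # outside the tripid window of both inbound flights
]

def find_two_hop(edges):
    """Yield (ab, bc) pairs matching the pattern (a)-[ab]->(b); (b)-[bc]->(c)."""
    for ab, bc in product(edges, repeat=2):
        if ab.dst == bc.src:  # both edges share the middle vertex b
            yield ab, bc

def keep(ab, bc):
    """Mirror the filter expression used in the query."""
    return (
        ab.dst == "SFO"                            # b.id = 'SFO'
        and (ab.delay > 500 or bc.delay > 500)     # at least one long delay
        and ab.tripid < bc.tripid < ab.tripid + 10000  # ordered, within the window
    )

# Each result row lists a, b, c and the two delays.
matches = [
    (ab.src, ab.dst, bc.dst, ab.delay, bc.delay)
    for ab, bc in find_two_hop(edges)
    if keep(ab, bc)
]

for row in matches:
    print(row)
```

On this toy data, the pairs that survive are the two inbound flights to SFO that connect to the heavily delayed SFO > JFK flight; the later SFO > JFK flight is dropped because its `tripid` falls outside the window.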
By using motif finding, you can easily search for structural patterns in your graph; by using GraphFrames, you are using the power and speed of DataFrames to distribute and perform your query.
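The `tripid` window used in the filter can be checked with simple arithmetic. The sketch below assumes that `tripid` is an integer of the form MMDDHHMM (month, day, hour, minute); the text only says that it was derived from the date time, so this layout is an assumption, and `decode_tripid` and `within_window` are hypothetical helpers introduced for illustration.

```python
def decode_tripid(tripid):
    """Split a tripid into (month, day, hour, minute).

    ASSUMPTION: tripid is an MMDDHHMM integer; leading zeros are restored
    by padding to eight digits.
    """
    s = f"{tripid:08d}"
    return int(s[0:2]), int(s[2:4]), int(s[4:6]), int(s[6:8])

def within_window(ab_tripid, bc_tripid, window=10000):
    """Same comparison as the filter: bc is after ab and less than `window` later."""
    return ab_tripid < bc_tripid < ab_tripid + window

ab = 1011635  # hypothetical value: January 1, 16:35 under the assumed layout

print(decode_tripid(ab))           # (1, 1, 16, 35)
print(decode_tripid(ab + 10000))   # (1, 2, 16, 35)
print(within_window(ab, 1011810))  # True: later on the same day
print(within_window(ab, 1021700))  # False: past the upper bound
```

Under this assumed layout, adding 10000 advances the day field by one, so the comparison acts as a cheap "less than about a day later" check without parsing any dates. It is an approximation: a flight early the next morning would still pass the test.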