How it works...

As we're using a Databricks notebook, even though its default language is Python, we can switch a cell to Scala by placing %scala on its first line. The first code snippet defines the package d3a, which contains the JavaScript calls that render our airport visualization. As you dive into the code, you'll notice that it is a force-directed graph visualization (def force) whose showGraph function draws the map of the US and the locations of the airports (blue bubbles).

The force function has the following definition:

def force(clicks: Dataset[Edge], height: Int = 100, width: Int = 960): Unit = {
...
showGraph(height, width, Seq(Graph(nodes, links)).toDF().toJSON.first())
}
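The body of force is elided above, so the following is only a minimal sketch of how a collection of edges could be turned into the nodes and links that are passed to Graph. It uses plain Scala collections rather than a Dataset, and the ForceSketch object, the Node and Link case classes, the field names on Edge, and the toGraph helper are all assumptions made for illustration; they are not taken from the d3a package.

```scala
// Hypothetical sketch: the real d3a force body is elided in the text.
object ForceSketch {
  // Assumed to mirror the columns selected by the query: src, dest, count.
  case class Edge(src: String, dest: String, count: Long)

  // Assumed shapes for the payload handed to the visualization.
  case class Node(name: String)
  case class Link(source: Int, target: Int, value: Long)
  case class Graph(nodes: Seq[Node], links: Seq[Link])

  def toGraph(edges: Seq[Edge]): Graph = {
    // Every distinct airport code on either end of an edge becomes a node.
    val names = edges.flatMap(e => Seq(e.src, e.dest)).distinct
    // Links refer to nodes by their position in the node list.
    val index = names.zipWithIndex.toMap
    val links = edges.map(e => Link(index(e.src), index(e.dest), e.count))
    Graph(names.map(Node(_)), links)
  }
}
```

For example, ForceSketch.toGraph(Seq(ForceSketch.Edge("SEA", "SFO", 10L), ForceSketch.Edge("SFO", "LAX", 5L))) produces three nodes (SEA, SFO, LAX) and two links that point at them by index. Referring to nodes by position is a common convention for force-directed layouts, which is why the sketch builds the index map first.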

Recall that we call this function in the next cell using the following code snippet:

%scala
// On-time and Early Arrivals
import d3a._
graphs.force(
  height = 800,
  width = 1200,
  clicks = sql("""select src, dst as dest, count(1) as count from deptsDelays_GEO where delay <= 0 group by src, dst""").as[Edge])

The height and width parameters are self-explanatory; the key point is that we use a Spark SQL query against deptsDelays_GEO to define the edges (that is, the source and destination IATA codes, together with the number of on-time or early flights between them). The query aliases dst as dest, presumably so that the result columns line up with the fields that .as[Edge] expects. As the IATA codes are already defined within the calls inside showGraph, we already have the vertices of our visualization. Note that even though deptsDelays_GEO was created using PySpark, the Scala cell can query it by name, because both languages run within the same Databricks notebook and share the same Spark session.
