How it works...

You can install a package such as GraphFrames by building it from the GraphFrames GitHub repository at https://github.com/graphframes/graphframes, but an easier way is to use the GraphFrames Spark package, which is available at https://spark-packages.org/package/graphframes/graphframes. Spark Packages is a repository that contains an index of third-party packages for Apache Spark. When you pass a package to PySpark with the --packages option, Spark resolves the version you specify, downloads the prebuilt JAR together with its dependencies, and adds them to the classpath of your Spark job.
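The same resolution can also be requested from inside a standalone script through the spark.jars.packages setting, which takes a comma-separated list of package coordinates. The following is a minimal sketch, not the recipe's own method: the helper names packages_conf and build_session are hypothetical, and it assumes a Spark installation that matches the package version and has network access.

```python
def packages_conf(*coordinates):
    """Return the Spark setting that mirrors the --packages flag."""
    # Spark expects a comma-separated list of group:artifact:version strings.
    return {"spark.jars.packages": ",".join(coordinates)}


def build_session(app_name, *coordinates):
    """Create a local SparkSession that resolves the given packages at startup."""
    # Imported here so packages_conf() stays usable without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.master("local").appName(app_name)
    for key, value in packages_conf(*coordinates).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


# Usage (hypothetical application name; needs Spark and network access):
# spark = build_session("graphframes-demo",
#                       "graphframes:graphframes:0.5.0-spark2.1-s_2.11")
```

The setting only takes effect if it is supplied before the JVM starts, so it will not pull in a package for a session that is already running (for example, inside an interactive pyspark shell).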

When you include the GraphFrames package using the following command, notice the graphframes entries in the console output, which show the package being pulled in from the spark-packages repository:

$ ./bin/pyspark --master local --packages graphframes:graphframes:0.5.0-spark2.1-s_2.11
...
graphframes#graphframes added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
found graphframes#graphframes;0.5.0-spark2.1-s_2.11 in spark-packages
found com.typesafe.scala-logging#scala-logging-api_2.11;2.1.2 in central
found com.typesafe.scala-logging#scala-logging-slf4j_2.11;2.1.2 in central
found org.scala-lang#scala-reflect;2.11.0 in central
found org.slf4j#slf4j-api;1.7.7 in central
downloading http://dl.bintray.com/spark-packages/maven/graphframes/graphframes/0.5.0-spark2.1-s_2.11/graphframes-0.5.0-spark2.1-s_2.11.jar ...
[SUCCESSFUL ] graphframes#graphframes;0.5.0-spark2.1-s_2.11!graphframes.jar (600ms)
:: resolution report :: resolve 1503ms :: artifacts dl 608ms
:: modules in use:
com.typesafe.scala-logging#scala-logging-api_2.11;2.1.2 from central in [default]
com.typesafe.scala-logging#scala-logging-slf4j_2.11;2.1.2 from central in [default]
graphframes#graphframes;0.5.0-spark2.1-s_2.11 from spark-packages in [default]
org.scala-lang#scala-reflect;2.11.0 from central in [default]
org.slf4j#slf4j-api;1.7.7 from central in [default]
---------------------------------------------------------------------
|                  |            modules            ||   artifacts   |
|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
|      default     |   5   |   1   |   1   |   0   ||   5   |   1   |
---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
confs: [default]
1 artifacts copied, 4 already retrieved (323kB/9ms)
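
The value passed to --packages is a coordinate of the form group:artifact:version, and the download line in the output follows the standard Maven repository layout derived from it. The sketch below illustrates that mapping under stated assumptions: parse_coordinate and jar_url are hypothetical helper names, and the repository root is copied from the log above, so the host may no longer be reachable.

```python
from collections import namedtuple

Coordinate = namedtuple("Coordinate", ["group", "artifact", "version"])


def parse_coordinate(text):
    """Split 'group:artifact:version' into its three parts."""
    parts = text.split(":")
    if len(parts) != 3:
        raise ValueError("expected group:artifact:version, got %r" % text)
    return Coordinate(*parts)


def jar_url(repo_root, coord):
    """Build the jar location using the standard Maven repository layout."""
    # Dots in the group become path separators; 'graphframes' has none.
    group_path = coord.group.replace(".", "/")
    file_name = "%s-%s.jar" % (coord.artifact, coord.version)
    return "/".join(
        [repo_root.rstrip("/"), group_path, coord.artifact, coord.version, file_name]
    )


coord = parse_coordinate("graphframes:graphframes:0.5.0-spark2.1-s_2.11")
# Repository root taken from the 'downloading' line in the console output.
print(jar_url("http://dl.bintray.com/spark-packages/maven", coord))
```

The version part also records what the package was built against: here, GraphFrames 0.5.0 for Spark 2.1 and Scala 2.11, so it should match the Spark installation you launch.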