How to do it...

If you are running your job from a Spark CLI (for example, spark-shell, pyspark, spark-sql, or spark-submit), you can use the --packages option, which will download the GraphFrames package and its dependencies and make them available to your job.

For example, to use the latest GraphFrames package (which, at the time of writing this book, is version 0.5) with Spark 2.1 and Scala 2.11 with pyspark, the command is:

$SPARK_HOME/bin/pyspark --packages graphframes:graphframes:0.5.0-spark2.1-s_2.11
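
The string passed to --packages is a Maven-style coordinate that encodes the GraphFrames, Spark, and Scala versions. The following is a minimal sketch, not part of the recipe itself, of how that coordinate could be assembled and handed to Spark from a standalone Python script through the spark.jars.packages configuration property instead of the command line. The helper names graphframes_coordinate and build_session are hypothetical, and the sketch assumes no SparkContext is already running when build_session is called.

```python
def graphframes_coordinate(graphframes_version, spark_version, scala_version):
    """Return a coordinate such as graphframes:graphframes:0.5.0-spark2.1-s_2.11."""
    return "graphframes:graphframes:{}-spark{}-s_{}".format(
        graphframes_version, spark_version, scala_version)


def build_session(coordinate, app_name="graphframes-example"):
    """Create a SparkSession that resolves the given package at startup.

    Hypothetical helper: assumes PySpark is installed and that no
    SparkContext exists yet, since the property is read at launch time.
    """
    # Imported lazily so graphframes_coordinate works without PySpark.
    from pyspark.sql import SparkSession
    return (SparkSession.builder
            .appName(app_name)
            .config("spark.jars.packages", coordinate)
            .getOrCreate())


coordinate = graphframes_coordinate("0.5.0", "2.1", "2.11")
print(coordinate)  # graphframes:graphframes:0.5.0-spark2.1-s_2.11
```

Calling build_session(coordinate) would then play the same role as launching pyspark with the --packages option.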

However, to use GraphFrames with Spark 2.3, you need to build the package from source.

Check out the steps outlined here: https://github.com/graphframes/graphframes/issues/267.

If you are using a service such as Databricks, you will need to create a library with GraphFrames. For more information, please refer to how to create a library in Databricks at https://docs.databricks.com/user-guide/libraries.html, and how to install a GraphFrames Spark package at https://cdn2.hubspot.net/hubfs/438089/notebooks/help/Setup_graphframes_package.html.
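
Whichever route you take, it can help to confirm that the package is visible from Python before going further. The following is a minimal sketch, assuming the graphframes Python module is exposed under that name once the package or library is attached. The helpers graphframes_available and tiny_graph are hypothetical names used only for illustration, and tiny_graph needs a running SparkSession passed in as spark.

```python
import importlib.util


def graphframes_available():
    """Return True if the graphframes Python module can be located."""
    return importlib.util.find_spec("graphframes") is not None


def tiny_graph(spark):
    """Build a two-vertex GraphFrame from an existing SparkSession."""
    # Imported lazily so the availability check works without GraphFrames.
    from graphframes import GraphFrame
    # GraphFrames expects an "id" column on vertices and "src"/"dst" on edges.
    vertices = spark.createDataFrame(
        [("a", "Alice"), ("b", "Bob")], ["id", "name"])
    edges = spark.createDataFrame(
        [("a", "b", "follows")], ["src", "dst", "relationship"])
    return GraphFrame(vertices, edges)


if graphframes_available():
    print("graphframes module found")
else:
    print("graphframes module not found; check the package or library setup")
```

In a pyspark shell or a notebook where spark is already defined, tiny_graph(spark).inDegrees.show() is a quick way to check that the JVM side of the package loaded as well.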
