How to do it...

There are really only two main requirements for installing PySpark: Java and Python. You can also install Scala and R if you want to use those languages. We will also check for Maven, which we will use to compile the Spark sources.

To check for all of these requirements, we will use the checkRequirements.sh script, which is located in the Chapter01 folder of the GitHub repository.

The following code block shows the high-level portions of the script found in the Chapter01/checkRequirements.sh file. Note that some portions of the code were omitted here for brevity:

#!/bin/bash

# Shell script for checking the dependencies
#
# PySpark Cookbook
# Author: Tomasz Drabas, Denny Lee
# Version: 0.1
# Date: 12/2/2017

_java_required=1.8
_python_required=3.4
_r_required=3.1
_scala_required=2.11
_mvn_required=3.3.9

# parse command line arguments
_args_len="$#"
...

printHeader
checkJava
checkPython

if [ "${_check_R_req}" = "true" ]; then
  checkR
fi

if [ "${_check_Scala_req}" = "true" ]; then
  checkScala
fi

if [ "${_check_Maven_req}" = "true" ]; then
  checkMaven
fi
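
The bodies of the check functions are not shown in this excerpt. As a rough sketch of the kind of comparison such a check has to make, and not the authors' implementation, the following snippet compares an installed version against a required minimum. The helper names version_at_least and extract_version are hypothetical, the snippet assumes a sort that supports the -V option (as GNU sort does), and the "Python 3.6.3" string is sample text rather than the result of probing a real interpreter:

```shell
#!/bin/bash

# Hypothetical helper (not taken from checkRequirements.sh):
# succeeds when version $1 is greater than or equal to version $2.
version_at_least() {
    local installed="$1" required="$2"
    # sort -V orders version strings field by field (assumes GNU sort);
    # if the required version sorts first, the installed one is new enough.
    [ "$(printf '%s\n%s\n' "$required" "$installed" | sort -V | head -n 1)" = "$required" ]
}

# Hypothetical helper: pull the first dotted version number out of a line
# of text, such as the line printed by `python3 --version`.
extract_version() {
    printf '%s\n' "$1" | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1
}

_python_required=3.4
# Sample text for illustration; a real check would capture the tool's output.
_python_installed="$(extract_version "Python 3.6.3")"

if version_at_least "${_python_installed}" "${_python_required}"; then
    echo "Python ${_python_installed} meets the ${_python_required} requirement"
else
    echo "Python ${_python_installed} is older than ${_python_required}"
fi
```

Comparing the versions field by field matters here: a plain string comparison would rank 3.10 below 3.4, whereas a version-aware sort treats 3.10 as the newer release.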