What you need for this book

This book is based on open source software. First, you need Java, which you can download from Oracle's Java Download page. You have to accept the license and choose the appropriate image for your platform. Don't use OpenJDK, as it has a few problems with Hadoop/Spark.

Second, you need Scala. If you are using a Mac, I recommend installing Homebrew:

$ ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

Homebrew also makes many other open source packages available to you. To install Scala, run brew install scala. On a Linux platform, download the appropriate Debian or RPM package from the http://www.scala-lang.org/download/ site. We will use the latest version at the time of writing, that is, 2.11.7.
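Once Java and Scala are installed, it can help to confirm that both are visible on your PATH before moving on. The following is a minimal sketch of such a check. The check_tool function is a hypothetical helper written for this illustration and is not part of Java, Scala, or Homebrew.

```shell
#!/bin/sh
# check_tool is a hypothetical helper for illustration only.
# It prints "<name>: found" if the command is on the PATH,
# and "<name>: missing" otherwise.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

# Check the two prerequisites discussed so far.
for tool in java scala; do
  check_tool "$tool"
done
```

If either tool is reported as missing, revisit the corresponding installation step. Running java -version and scala -version afterwards shows which versions were picked up.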

Spark distributions can be downloaded from http://spark.apache.org/downloads.html. We use the image pre-built for Hadoop 2.6 and later. As Spark runs on Java, you just need to unpack the package and start using the scripts from the bin subdirectory.

R and Python packages are available at the http://cran.r-project.org/bin and http://python.org/ftp/python/$PYTHON_VERSION/Python-$PYTHON_VERSION.tar.xz sites, respectively. The text has specific instructions on how to configure them. Although our use of the packages should be version agnostic, I used R version 3.2.3 and Python version 2.7.11 in this book.
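The Python address above is a template in which $PYTHON_VERSION stands for the release you want. As a small sketch of how it expands, the following builds the address for the version used in this book. The python_source_url function is a hypothetical helper introduced here for illustration; it only builds the string and does not check that the file is still hosted at that location.

```shell
#!/bin/sh
# python_source_url is a hypothetical helper for illustration only.
# It substitutes a version number into the URL template quoted in the text.
python_source_url() {
  PYTHON_VERSION="$1"
  echo "http://python.org/ftp/python/$PYTHON_VERSION/Python-$PYTHON_VERSION.tar.xz"
}

# Version 2.7.11 is the one used in this book.
python_source_url 2.7.11
```

The resulting address can be passed to a download tool of your choice, such as curl.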
