This Appendix will show you how to launch H2O-3 and Sparkling Water clusters on your local machine so that you can run the code samples in this book. We will also show you how to launch H2O-3 clusters in the 90-day free trial environment for the H2O AI Cloud. This trial environment includes Enterprise Steam to launch and manage H2O clusters on Kubernetes infrastructure.
Note on Environments
Architecture: As introduced in Chapter 2, Platform Components and Key Concepts, you will use a client environment (with the H2O-3 or Sparkling Water libraries installed) to run commands against a remote H2O-3 or Sparkling Water architecture distributed across multiple server nodes on a Kubernetes or Hadoop cluster. For small datasets, however, the architecture can be launched locally as a single process on the same machine as the client.
Versions: The functionality and code samples in this book use the following versions: H2O-3 version 3.34.0.7, and Sparkling Water version 3.34.0.7-1-3.2 to run on Spark 3.2. In this appendix, you will set up your environment with the latest stable versions. These run the same code samples from this book and also include capabilities that were added to H2O-3 and Sparkling Water after the book was written.
Languages: You can set up your client environment in Python, R, or Java/Scala. We will use Python in this book. Your Python client can be a Jupyter notebook, PyCharm, or another Python IDE.
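Because you will install the latest stable release rather than the exact version used for the book, it can be useful to confirm that your client is at least the book's H2O-3 version. The following is a minimal sketch that assumes release strings of the dotted-integer form shown above (for example 3.34.0.7); the helper names are hypothetical, and the installed version is read from the h2o package's __version__ attribute.

```python
# H2O-3 version used for this book's code samples (from the Versions note).
BOOK_H2O_VERSION = "3.34.0.7"


def version_tuple(version):
    """Turn a dotted release string such as '3.34.0.7' into (3, 34, 0, 7)."""
    return tuple(int(part) for part in version.split("."))


def installed_h2o_version():
    """Return the installed h2o client version, or None if h2o is not installed."""
    try:
        import h2o
    except ImportError:
        return None
    return h2o.__version__


def meets_book_version(installed, required=BOOK_H2O_VERSION):
    """True if the installed release is the book's release or a newer one."""
    return version_tuple(installed) >= version_tuple(required)


# Usage (after `pip install h2o`):
# v = installed_h2o_version()
# print(v, v is not None and meets_book_version(v))
```

Tuples compare element by element, so 3.34.0.10 correctly ranks above 3.34.0.7, which a plain string comparison would get wrong.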
Let's learn how to run H2O-3 entirely in your local environment.
This is the easiest method to run H2O-3 and is suitable for the small datasets used in code samples in this book. It launches H2O-3 on your local machine (versus an enterprise cluster environment) and does not involve H2O Enterprise Steam.
First, we will perform a one-time setup of our H2O-3 Python environment.
To set up your H2O-3 Python client, install three module dependencies in your Python environment and then the h2o Python module. You must use Python 2.7.x, 3.5.x, 3.6.x, or 3.7.x.
Run the following commands:
pip install requests
pip install tabulate
pip install future
pip install h2o
To install H2O-3 with Conda instead, see the INSTALL IN PYTHON tab at http://h2o-release.s3.amazonaws.com/h2o/rel-zumbo/1/index.html.
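Before moving on, you can check that the interpreter and modules match the requirements above. The following is a minimal sketch that uses only the standard library. It assumes the Python versions listed in this appendix, so adjust SUPPORTED_PYTHON if the release you install documents a different list. The function name is hypothetical. Because it relies on importlib.util.find_spec, it needs Python 3.4 or later and will not run under the Python 2.7 option.

```python
import importlib.util
import sys

# Python versions this appendix lists as supported, as (major, minor) pairs.
SUPPORTED_PYTHON = {(2, 7), (3, 5), (3, 6), (3, 7)}

# Import names of the modules installed by the pip commands above.
REQUIRED_MODULES = ["requests", "tabulate", "future", "h2o"]


def check_client_env(version_info=sys.version_info, modules=REQUIRED_MODULES):
    """Return (python_ok, missing_modules) for the given interpreter version."""
    python_ok = (version_info[0], version_info[1]) in SUPPORTED_PYTHON
    # find_spec returns None when a top-level module cannot be found.
    missing = [name for name in modules if importlib.util.find_spec(name) is None]
    return python_ok, missing


# Usage:
# ok, missing = check_client_env()
# print("Python version supported:", ok, "| missing modules:", missing)
```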
You are now ready to run H2O-3 locally. Let's see how to do that.
To start a local single-node H2O-3 cluster, simply run the following in your Python IDE:
import h2o
h2o.init()
# write h2o-3 code, including code samples in this book
You can now write your H2O-3 code, including all samples from this book. See Chapter 2, Platform Components and Key Concepts, for a Hello World code sample and an explanation of what happens under the surface.
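By default, h2o.init() first tries to connect to an H2O-3 instance at localhost on port 54321 and starts a new local instance only if that fails. The following stdlib sketch approximates that first step with a plain TCP probe, which tells you in advance whether something is already listening there. It is only an approximation: h2o.init() talks to H2O's REST API, whereas this probe merely opens a socket. The function name is hypothetical.

```python
import socket

# Defaults used by h2o.init(): ip="localhost", port=54321.
H2O_HOST = "localhost"
H2O_PORT = 54321


def h2o_port_open(host=H2O_HOST, port=H2O_PORT, timeout=1.0):
    """Return True if some process accepts TCP connections at host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, or host unreachable
        return False


# Usage: True means h2o.init() will try to attach to whatever is listening
# on that port instead of starting a fresh Java process.
# print(h2o_port_open())
```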
Java Dependency – Only When Running Locally
The H2O-3 cluster (not the Python client) runs on Java. Because you are running the cluster on your local machine here (representing a single-node cluster), you must have Java installed. This is not required when you use your Python client to connect to a remote H2O cluster in your enterprise Kubernetes or Hadoop environment.
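Since the local cluster needs Java, a quick pre-flight check can save a failed h2o.init(). The following is a minimal sketch that uses only the standard library (Python 3.7 or later, for capture_output). It looks for a java executable on your PATH and parses the major version from `java -version`. The function names are hypothetical. The h2o module may also look for Java in other places, such as JAVA_HOME, so treat a None result as a hint rather than proof. The sketch only reports the version; check the H2O-3 documentation for the Java versions your release supports.

```python
import re
import shutil
import subprocess


def java_major_version(version_text):
    """Parse the major Java version from `java -version` output text."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_text)
    if match is None:
        return None
    major, minor = int(match.group(1)), match.group(2)
    # Java 8 and earlier report themselves as "1.8.0_xxx", "1.7.0_xxx", ...
    if major == 1 and minor is not None:
        return int(minor)
    return major


def local_java_version():
    """Return the major version of the `java` on PATH, or None if absent."""
    java = shutil.which("java")
    if java is None:
        return None
    # `java -version` writes its banner to stderr, not stdout.
    result = subprocess.run([java, "-version"], capture_output=True,
                            text=True, timeout=30)
    return java_major_version(result.stderr)


# Usage:
# print(local_java_version())  # e.g. 8, 11, 17, or None if java is not found
```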
Now, let's see how we can set up our environment to write Sparkling Water code on our local machine.
Running Sparkling Water locally is similar to running H2O-3 locally, but with Spark dependencies. See this link for a full explanation of the Spark, Python, and H2O components involved: https://docs.h2o.ai/sparkling-water/3.2/latest-stable/doc/pysparkling.html.
We will be using Spark 3.2 here. To use a different version of Spark, go to the Sparkling Water section of the H2O downloads page at the following link: https://h2o.ai/resources/download/.
For your Sparkling Water Python client, you must use Python 2.7.x, 3.5.x, 3.6.x, or 3.7.x. We will be running Sparkling Water from a Jupyter notebook here.
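Follow these steps to install Spark locally: download Spark 3.2 from the Apache Spark downloads page and unpack it, then point SPARK_HOME at the unpacked folder and set MASTER to local[*] so that Spark runs on your machine using all available cores: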
export SPARK_HOME="/path/to/spark/folder"
export MASTER="local[*]"
Now, let's install the Sparkling Water library in our Python environment.
Install the following modules:
pip install requests
pip install tabulate
pip install future
pip install h2o_pysparkling_3.2
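Next, let's install an interactive shell.
To run Sparkling Water locally, we need the pysparkling interactive shell to launch the Sparkling Water cluster on Spark. (This is only required when running Sparkling Water locally; Enterprise Steam takes care of this when running on your enterprise cluster.) To get it, download the Sparkling Water distribution for Spark 3.2 from the Sparkling Water section of the H2O downloads page (https://h2o.ai/resources/download/), unzip it, and change into the unzipped folder, which contains the bin/pysparkling shell.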
Now, let's launch a Sparkling Water cluster and access it from a Jupyter notebook.
We assume you have Jupyter Notebook installed in the same Python environment as the modules you installed earlier. To launch a Jupyter notebook, run the following commands from your Sparkling Water folder.
On macOS or Linux:
export PYSPARK_DRIVER_PYTHON="ipython"
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
bin/pysparkling
On Windows:
SET PYSPARK_DRIVER_PYTHON=ipython
SET PYSPARK_DRIVER_PYTHON_OPTS=notebook
bin\pysparkling
Your Jupyter notebook should launch in your browser.
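The shell commands above only set four environment variables and start the pysparkling launcher. If you prefer to launch from a Python script, the following sketch builds the same environment and command. The helper name and its arguments are hypothetical and are not part of Sparkling Water. The Windows launcher name bin\pysparkling.cmd is an assumption, so check the bin folder of your download.

```python
import os
import sys


def pysparkling_notebook_launch(sw_home, spark_home, platform=None, base_env=None):
    """Hypothetical helper: return (command, env) for starting pysparkling
    with a Jupyter notebook as the Spark driver."""
    platform = sys.platform if platform is None else platform
    env = dict(os.environ if base_env is None else base_env)
    env.update({
        "SPARK_HOME": spark_home,                  # unpacked Spark 3.2 folder
        "MASTER": "local[*]",                      # run Spark locally on all cores
        "PYSPARK_DRIVER_PYTHON": "ipython",        # driver runs under IPython...
        "PYSPARK_DRIVER_PYTHON_OPTS": "notebook",  # ...in notebook mode
    })
    # Assumption: the Windows launcher in the distribution is bin\pysparkling.cmd.
    script = "pysparkling.cmd" if platform.startswith("win") else "pysparkling"
    return [os.path.join(sw_home, "bin", script)], env


# Usage (paths are placeholders):
# import subprocess
# cmd, env = pysparkling_notebook_launch("/path/to/sparkling-water",
#                                        "/path/to/spark/folder")
# subprocess.run(cmd, env=env, cwd="/path/to/sparkling-water")
```

Copying the parent environment first keeps PATH and similar variables intact, so the launcher can still find Python and Java.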
Now, let's write Sparkling Water code.
In your Jupyter notebook, type the following code to get you started:
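from pyspark.sql import SparkSession
from pysparkling import H2OContext  # Sparkling Water's entry point
import h2o

# Reuse the SparkSession created by the pysparkling shell (or create one)
spark = SparkSession.builder.getOrCreate()

# Start H2O on the Spark cluster, or reuse a running H2O context
hc = H2OContext.getOrCreate()
hc  # displays the H2O cluster details in the notebook

# Read the CSV with its header row and inferred column types
localdata = "/path/to/my/csv"
mysparkdata = spark.read.csv(localdata, header=True, inferSchema=True)
# Convert the Spark DataFrame into an H2OFrame
myH2Odata = hc.asH2OFrame(mysparkdata)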
You are now ready to build models using both H2O and Spark code.
To run H2O-3 in the H2O AI Cloud, you interact with Enterprise Steam, which launches and manages H2O-3 clusters for you. In this case, you install the h2osteam module in your Python client environment in addition to the h2o module that we installed when running H2O-3 locally.
Get your trial access to H2O AI Cloud here: https://h2o.ai/freetrial.
Once you have completed the sign-up steps and can log in to H2O AI Cloud, you can start running H2O-3 clusters as part of the H2O AI Cloud platform. The next steps are as follows.
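To set up your Python client environment, first download the Enterprise Steam Python client (the h2osteam module, distributed as a .whl file) from Enterprise Steam and install it:
pip install /path/to/download.whl
Replace /path/to/download.whl with the actual path to the downloaded file. Then install the h2o module and its dependencies: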
pip install requests
pip install tabulate
pip install future
pip install h2o
Now, let's use Steam to start an H2O cluster and then write H2O code in Python.
Follow these steps to launch your H2O cluster, which runs on a Kubernetes server cluster:
We can now write code (for example, in Jupyter) to build models on the H2O-3 cluster we just launched. After opening your Python client, log in to Enterprise Steam:
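import h2o
import h2osteam
from h2osteam.clients import H2oKubernetesClient

# Log in to Enterprise Steam (replace the placeholder values with your own)
conn = h2osteam.login(
    url="https://SteamURL",
    verify_ssl=False,  # skips TLS certificate verification
    username="yourH2OAICloudUserName",
    password="yourH2OAICloudPassword")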
Important Note
At the time of this writing, the URL for the 90-day H2O AI Cloud trial is https://steam.cloud.h2o.ai.
For the password, you can use either your login password to the H2O AI Cloud trial environment or a temporary personal access token generated on the Enterprise Steam Configurations page.
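# Look up the cluster you launched in Steam by its name and owner
cluster = H2oKubernetesClient().get_cluster(
    name="yourClusterName",
    created_by="yourH2OAICloudUserName")
# Connect the h2o client to that cluster
cluster.connect()
# you are now ready to write code to run on this H2O cluster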
You can now write your H2O-3 code, including all samples from this book.
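The login call above hardcodes the password in the notebook, which is easy to leak when you share or commit it. The following is a minimal sketch of one alternative: read the values that h2osteam.login() receives in the example above from environment variables. The variable names (STEAM_URL, STEAM_USERNAME, STEAM_PASSWORD, STEAM_VERIFY_SSL) and the helper name are hypothetical choices. Note that this sketch keeps TLS verification on unless you explicitly turn it off, unlike the example above.

```python
import os

# Hypothetical environment variable names; choose your own.
_REQUIRED = ("STEAM_URL", "STEAM_USERNAME", "STEAM_PASSWORD")


def steam_login_kwargs(env=None):
    """Build keyword arguments for h2osteam.login() from environment variables."""
    env = os.environ if env is None else env
    missing = [name for name in _REQUIRED if not env.get(name)]
    if missing:
        raise KeyError("set these environment variables: " + ", ".join(missing))
    return {
        "url": env["STEAM_URL"],
        "username": env["STEAM_USERNAME"],
        # Your H2O AI Cloud password or a personal access token.
        "password": env["STEAM_PASSWORD"],
        # TLS verification stays on unless STEAM_VERIFY_SSL is "false".
        "verify_ssl": env.get("STEAM_VERIFY_SSL", "true").lower() != "false",
    }


# Usage (requires the h2osteam module installed from your Steam instance):
# import h2osteam
# conn = h2osteam.login(**steam_login_kwargs())
```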