H2O

Before we dive into the examples, let's spend some time justifying our decision to use H2O as our deep learning framework for anomaly detection.

H2O is not just a library or package to install. It is an open source, rich analytics platform that provides both machine learning algorithms and high-performance parallel computing abstractions.

H2O's core technology is built around a Java Virtual Machine optimized for in-memory processing of distributed data collections.

The platform is usable via a web-based UI or programmatically in many languages, such as Python, R, Java, and Scala, as well as through a JSON-based REST API.

Data can be loaded from many common data sources, such as HDFS, S3, most of the popular RDBMSes, and a few NoSQL databases.

After loading, data is represented as an H2OFrame, making it familiar to people used to working with data frames in R, Spark, and Python's pandas.

The backend can then be switched among different engines: it can run locally on your machine, or it can be deployed in a cluster on top of Spark or Hadoop MapReduce.

H2O automatically handles memory allocation and optimizes the execution plan for most data operations and model training.

It provides very fast scoring of data points against a trained model; it is advertised to run in nanoseconds.

In addition to traditional data analysis and machine learning algorithms, it features a few very robust implementations of deep learning models.

The general API for building models is via the H2OEstimator. A dedicated H2ODeepLearningEstimator class can be used to build feed-forward multilayer artificial neural networks.

One of the main reasons we chose H2O for anomaly detection is that it provides a built-in class that is very useful for our purposes: the H2OAutoEncoderEstimator.

As you will see in the following examples, building an auto-encoder network requires only a few parameters to be specified; the estimator will self-tune the rest.

The output of an estimator is a model which, depending on the problem to be solved, can be a classification, regression, or clustering model, or, in our case, an auto-encoder.

H2O's deep learning support is not exhaustive, but it is quite simple and straightforward. It features automatic adaptive weight initialization, adaptive learning rates, various regularization techniques, performance tuning, grid search, and n-fold cross-validation, to name just a few. We will explore these advanced features in Chapter 10, Building a Production-Ready Intrusion Detection System.

We also hope to see RNNs and other more advanced deep learning architectures implemented in the framework soon.

The key strengths of H2O are scalability, reliability, and ease of use. It is a good fit for enterprise environments that care about production aspects. Its simplicity and built-in functionality also make it well suited for research tasks and for curious users who want to learn and experiment with deep learning.

Getting started with H2O

H2O in local mode can simply be installed as a dependency using pip. Follow the instructions at http://www.h2o.ai/download/h2o/python.

A local instance will be spun up automatically at your first initialization.

Open a Jupyter notebook and create an h2o instance:

import h2o
h2o.init()

If the initialization was successful, it should print something like "Checking whether there is an H2O instance running at http://localhost:54321. connected.".

You are now ready to import data and start building deep learning networks.
