We use standard machine learning and data science packages: numpy, pandas, and sklearn for data handling and modeling, and matplotlib for visualization:
from time import time
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import cross_val_score
To implement the isolation forest, we use the IsolationForest class from the sklearn.ensemble module:
from sklearn.ensemble import IsolationForest
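Before applying the class to the KDD data, it may help to see its basic calls on a toy problem. The following is a minimal sketch, not the setup used later in this chapter: the synthetic points, the variable names (X_demo, forest), and the parameter values (n_estimators=100, contamination=0.05) are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic data (assumed for illustration): a dense cluster plus a few far-away points
rng = np.random.RandomState(42)
X_inliers = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
X_outliers = rng.uniform(low=6.0, high=8.0, size=(10, 2))
X_demo = np.vstack([X_inliers, X_outliers])

# Parameter values are illustrative, not tuned
forest = IsolationForest(n_estimators=100, contamination=0.05, random_state=42)
forest.fit(X_demo)

# predict returns +1 for inliers and -1 for outliers
labels = forest.predict(X_demo)
# decision_function returns a score per sample; lower means more anomalous
scores = forest.decision_function(X_demo)

print("points flagged as outliers:", int((labels == -1).sum()))
```

The points drawn far from the cluster should receive lower scores on average than the clustered points, which is the property the anomaly detector relies on.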
To measure performance, we use the ROC curve and the AUC, both of which are discussed in detail later in this chapter.
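As a quick preview of how these two metrics are computed with sklearn, here is a small sketch on hand-made values. The arrays y_true and y_score are invented for illustration and have nothing to do with the KDD data.

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

# Toy ground truth (1 = positive class) and toy scores (higher = more likely positive)
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# roc_curve sweeps the decision threshold and returns the false and true positive rates
fpr, tpr, thresholds = roc_curve(y_true, y_score)

# auc integrates the curve with the trapezoidal rule
roc_auc = auc(fpr, tpr)
print("AUC:", roc_auc)  # 0.75 for these toy values
```

An AUC of 0.75 here means that three of the four positive/negative pairs are ranked correctly by the scores.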
The following code imports the metric functions and the dataset loader, and then loads the 10% subset of the KDD Cup 1999 data:
from sklearn.metrics import roc_curve, auc
from sklearn.datasets import fetch_kddcup99
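# IPython/Jupyter magic for inline plots; omit it when running as a plain script
%matplotlib inline
# Load the 10% sample of the KDD Cup 1999 data (downloaded on first use)
# Task description: http://www.kdd.org/kdd-cup/view/kdd-cup-1999/Tasks
dataset = fetch_kddcup99(subset=None, shuffle=True, percent10=True)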
X = dataset.data
y = dataset.target
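The target array holds byte-string labels such as b'normal.' for regular traffic and attack names such as b'smurf.' for intrusions. For anomaly detection it is often convenient to collapse these into a binary flag. The helper below is a hypothetical sketch (the name to_binary_labels is not part of sklearn), shown on a small stand-in array so that it runs without downloading the dataset.

```python
import numpy as np

def to_binary_labels(y):
    """Return 1 for attack records and 0 for normal traffic.

    Assumes the labels are byte strings and that normal traffic
    is marked b"normal.", as in the KDD Cup 1999 target array.
    """
    return (np.asarray(y) != b"normal.").astype(int)

# Stand-in labels for illustration; in the chapter's code this would be y
y_demo = np.array([b"normal.", b"smurf.", b"normal."], dtype=object)
y_binary = to_binary_labels(y_demo)
print(y_binary)  # [0 1 0]
```

Applied to the loaded data, the call would be to_binary_labels(y), giving a vector that can be compared against anomaly scores when computing the ROC curve.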