We will use the intrusion detection problem again, this time to detect anomalies. First, we import pandas:
import pandas as pd
The feature names come from the dataset's documentation at http://icsdweb.aegean.gr/awid/features.html. We include them as a Python list:
features = ['frame.interface_id',
'frame.dlt',
'frame.offset_shift',
'frame.time_epoch',
'frame.time_delta',
'frame.time_delta_displayed',
'frame.time_relative',
'frame.len',
'frame.cap_len',
'frame.marked',
'frame.ignored',
'radiotap.version',
'radiotap.pad',
'radiotap.length',
'radiotap.present.tsft',
'radiotap.present.flags',
'radiotap.present.rate',
'radiotap.present.channel',
'radiotap.present.fhss',
'radiotap.present.dbm_antsignal',
...
The full list (truncated above) contains all 155 features in the AWID dataset. We import the training set and check the number of rows and columns:
awid = pd.read_csv("../data/AWID-CLS-R-Trn.csv", header=None, names=features)
# see the number of rows/columns
awid.shape
We can ignore the warning raised during the import. The output of shape is a tuple giving the number of rows and columns of the 155-feature training set:
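As a minimal sketch of why header=None and names matter, here is the same pattern on an in-memory CSV (with hypothetical column names) in place of the AWID file:

```python
import io

import pandas as pd

# a tiny headerless CSV, standing in for the AWID training file
raw = io.StringIO("1,2,normal\n3,4,injection\n")

# header=None tells pandas the first row is data, not column names;
# names supplies the column labels explicitly
df = pd.read_csv(raw, header=None, names=['a', 'b', 'class'])

print(df.shape)
```

Without header=None, pandas would consume the first data row as column names.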
We will eventually have to deal with the null values; the dataset marks them with ?:
# they use ? as a null attribute.
awid.head()
The preceding code will produce a table of 5 rows × 155 columns as an output.
We check the class distribution:
awid['class'].value_counts(normalize=True)
normal 0.909564
injection 0.036411
impersonation 0.027023
flooding 0.027002
Name: class, dtype: float64
We check for NAs:
# claims there are no null values because of the ?s
awid.isna().sum()
The output looks like this:
We replace all ? marks with None:
# replace the ? marks with None
awid.replace({"?": None}, inplace=True)
The sum shows a large amount of missing data:
# Many missing pieces of data!
awid.isna().sum()
Here is what the output looks like:
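As a minimal sketch of this pattern (on a toy frame, not the AWID data): before the replacement, pandas treats ? as an ordinary string, so isna() counts nothing; after it, the missing values become countable:

```python
import pandas as pd

df = pd.DataFrame({'a': ['1', '?', '3'], 'b': ['?', '?', '6']})

# before replacement, pandas sees '?' as an ordinary string
assert df.isna().sum().sum() == 0

# after replacement, the missing values are countable
df.replace({'?': None}, inplace=True)
missing = df.isna().sum()
print(missing)
```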
Here, we identify columns that have over 50% of their data missing:
columns_with_mostly_null_data = awid.columns[awid.isnull().mean() >= 0.5]
# 72 columns are going to be affected!
columns_with_mostly_null_data.shape
(72,)
We drop the columns with over 50% of their data missing:
awid.drop(columns_with_mostly_null_data, axis=1, inplace=True)
We confirm the new shape:
awid.shape
(1795575, 83)
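The column-dropping pattern can be sketched on a toy frame (hypothetical column names, chosen only to illustrate the threshold):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'mostly_missing': [np.nan, np.nan, np.nan, 1.0],   # 75% null
    'mostly_present': [1.0, 2.0, np.nan, 4.0],         # 25% null
})

# isnull().mean() gives the fraction of nulls per column
null_fraction = df.isnull().mean()
bad_cols = df.columns[null_fraction >= 0.5]

df = df.drop(bad_cols, axis=1)
print(list(df.columns))
```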
Now, drop the rows that have missing values:
awid.dropna(inplace=True)  # drop rows with null data
We lost 456,169 rows:
awid.shape
(1339406, 83)
However, it doesn't affect our distribution too much:
# 0.878763 is our null accuracy. Our model must be better than this number to be a contender
awid['class'].value_counts(normalize=True)
normal 0.878763
injection 0.048812
impersonation 0.036227
flooding 0.036198
Name: class, dtype: float64
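The comment above refers to the null accuracy: the accuracy a model would achieve by always predicting the majority class. A minimal sketch of how it is computed (on made-up labels):

```python
import pandas as pd

# made-up labels: 88 normal, 7 injection, 5 flooding
labels = pd.Series(['normal'] * 88 + ['injection'] * 7 + ['flooding'] * 5)

# the largest normalized class frequency is the null accuracy
null_accuracy = labels.value_counts(normalize=True).max()
print(null_accuracy)
```

Any model worth keeping must beat this number on the test set.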
Our ML algorithms need numerical columns. Selecting them shows fewer columns than expected, because many numeric columns were imported as strings:
awid.select_dtypes(['number']).shape
(1339406, 45)
for col in awid.columns:
    awid[col] = pd.to_numeric(awid[col], errors='ignore')
# that makes more sense
awid.select_dtypes(['number']).shape
The output can be seen here:
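The errors='ignore' option (deprecated in recent pandas versions) tries to parse each column and leaves it untouched on failure. A version-safe sketch of the same idea, on a toy frame with hypothetical column names:

```python
import pandas as pd

df = pd.DataFrame({'num_as_str': ['1', '2', '3'], 'text': ['a', 'b', 'c']})

# equivalent to the errors='ignore' loop: try to parse each column,
# and leave it unchanged if parsing fails
for col in df.columns:
    try:
        df[col] = pd.to_numeric(df[col])
    except (ValueError, TypeError):
        pass

print(df.dtypes)
```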
We derive basic descriptive statistics:
awid.describe()
Executing the preceding code produces a table of 8 rows × 74 columns. We separate the numerical features from the class labels:
X, y = awid.select_dtypes(['number']), awid['class']
We fit a basic Gaussian Naive Bayes model to the data:
from sklearn.naive_bayes import GaussianNB
nb = GaussianNB()
nb.fit(X, y)
The fitted Gaussian Naive Bayes estimator is displayed as follows:
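As a minimal sketch of the fit/predict cycle (on synthetic one-dimensional points, not the AWID features):

```python
from sklearn.naive_bayes import GaussianNB

# two well-separated 1-D clusters with made-up class labels
X_toy = [[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]]
y_toy = ['normal', 'normal', 'normal', 'injection', 'injection', 'injection']

clf = GaussianNB()
clf.fit(X_toy, y_toy)

# points near each cluster are assigned that cluster's class
pred = clf.predict([[0.05], [10.05]])
print(pred)
```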
We read in the test data and do the same transformations to it, to match the training data:
awid_test = pd.read_csv("../data/AWID-CLS-R-Tst.csv", header=None, names=features)
# drop the problematic columns
awid_test.drop(columns_with_mostly_null_data, axis=1, inplace=True)
# replace ? with None
awid_test.replace({"?": None}, inplace=True)
# drop the rows with null data
awid_test.dropna(inplace=True) # drop rows with null data
# convert columns to numerical values
for col in awid_test.columns:
    awid_test[col] = pd.to_numeric(awid_test[col], errors='ignore')
awid_test.shape
The output is as follows:
We compute the basic metric, accuracy:
from sklearn.metrics import accuracy_score
We define a simple function to test the accuracy of a model fitted on training data by using our testing data:
X_test = awid_test.select_dtypes(['number'])
y_test = awid_test['class']
def get_test_accuracy_of(model):
    y_preds = model.predict(X_test)
    return accuracy_score(y_preds, y_test)
# naive bayes does very poorly on its own!
get_test_accuracy_of(nb)
The output can be seen here:
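accuracy_score simply computes the fraction of matching labels; a quick sketch on made-up predictions:

```python
from sklearn.metrics import accuracy_score

y_true = ['normal', 'normal', 'injection', 'flooding']
y_pred = ['normal', 'normal', 'injection', 'normal']

# 3 of 4 labels match
acc = accuracy_score(y_true, y_pred)
print(acc)
```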
We perform logistic regression, but it performs even worse:
from sklearn.linear_model import LogisticRegression
lr = LogisticRegression()
lr.fit(X, y)
# Logistic regression does even worse
get_test_accuracy_of(lr)
We can ignore this warning:
The following shows the output:
We test with DecisionTreeClassifier as shown here:
from sklearn.tree import DecisionTreeClassifier
tree = DecisionTreeClassifier()
tree.fit(X, y)
# Tree does very well!
get_test_accuracy_of(tree)
The output can be seen as follows:
We test the Gini scores of the decision tree features as follows:
pd.DataFrame({'feature': awid.select_dtypes(['number']).columns,
              'importance': tree.feature_importances_}).sort_values('importance', ascending=False).head(10)
The output of the preceding code gives the following table:
We import RandomForestClassifier as shown here:
from sklearn.ensemble import RandomForestClassifier
forest = RandomForestClassifier()
forest.fit(X, y)
# Random Forest does slightly worse
get_test_accuracy_of(forest)
We can ignore this warning:
The following is the output:
We create a pipeline that will scale the numerical data and then feed the resulting data into a decision tree:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV
preprocessing = Pipeline([
    ("scale", StandardScaler()),
])
pipeline = Pipeline([
    ("preprocessing", preprocessing),
    ("classifier", DecisionTreeClassifier())
])
# try varying levels of depth
params = {
    "classifier__max_depth": [None, 3, 5, 10],
}
# instantiate a gridsearch module
grid = GridSearchCV(pipeline, params)
# fit the module
grid.fit(X, y)
# test the best model
get_test_accuracy_of(grid.best_estimator_)
We can ignore this warning:
The output is as follows:
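After fitting, the grid object exposes the winning configuration via best_params_ alongside the refit best_estimator_. A minimal sketch on a bundled toy dataset (iris here, purely for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X_toy, y_toy = load_iris(return_X_y=True)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("classifier", DecisionTreeClassifier(random_state=0)),
])

# search the same max_depth grid as above
grid = GridSearchCV(pipe, {"classifier__max_depth": [None, 3, 5, 10]})
grid.fit(X_toy, y_toy)

# the best hyperparameters found by cross-validation
print(grid.best_params_)
```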
We try the same thing with a random forest:
preprocessing = Pipeline([
    ("scale", StandardScaler()),
])
pipeline = Pipeline([
    ("preprocessing", preprocessing),
    ("classifier", RandomForestClassifier())
])
# try varying levels of depth
params = {
    "classifier__max_depth": [None, 3, 5, 10],
}
grid = GridSearchCV(pipeline, params)
grid.fit(X, y)
# best accuracy so far!
get_test_accuracy_of(grid.best_estimator_)
The following shows the output:
0.8893431144571348
We import LabelEncoder:
from sklearn.preprocessing import LabelEncoder
encoder = LabelEncoder()
encoded_y = encoder.fit_transform(y)
encoded_y.shape
The output is as follows:
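To see what LabelEncoder does, here is a minimal sketch on a few made-up labels; classes_ is sorted alphabetically, and each code is an index into it:

```python
from sklearn.preprocessing import LabelEncoder

enc = LabelEncoder()
codes = enc.fit_transform(['normal', 'injection', 'normal', 'flooding'])

# classes_ is sorted alphabetically; the codes index into it
print(enc.classes_)
print(codes)
```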
We import LabelBinarizer to one-hot encode the class labels:
from sklearn.preprocessing import LabelBinarizer
binarizer = LabelBinarizer()
binarized_y = binarizer.fit_transform(encoded_y)
binarized_y.shape
We will get the following output:
Now, execute the following code:
binarized_y[:5,]
And the output will be as follows:
Run the y.head() command:
y.head()
The output is as follows:
Now run the following code:
print(encoder.classes_)
print(binarizer.classes_)
The output can be seen as follows:
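To see what LabelBinarizer produces, here is a minimal sketch on toy integer labels: each row becomes a one-hot vector with one column per class:

```python
from sklearn.preprocessing import LabelBinarizer

binarizer_toy = LabelBinarizer()
onehot = binarizer_toy.fit_transform([0, 1, 2, 3, 1])

# one column per class; exactly one 1 per row
print(onehot.shape)
print(onehot[0])
```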
Import the following packages:
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
We build a baseline model for the neural network. We choose a hidden layer of 10 neurons. A lower number of neurons helps to eliminate the redundancies in the data and select the most important features:
def create_baseline_model(n, input_dim):
    # create model
    model = Sequential()
    model.add(Dense(n, input_dim=input_dim, kernel_initializer='normal', activation='relu'))
    model.add(Dense(4, kernel_initializer='normal', activation='sigmoid'))
    # Compile model. We use the logarithmic loss function and the Adam gradient optimizer.
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
KerasClassifier(build_fn=create_baseline_model, epochs=100, batch_size=5, verbose=0, n=20)
We can see the following output:
Run the following code:
# use the KerasClassifier
from sklearn.model_selection import cross_val_score

preprocessing = Pipeline([
    ("scale", StandardScaler()),
])
pipeline = Pipeline([
    ("preprocessing", preprocessing),
    ("classifier", KerasClassifier(build_fn=create_baseline_model, epochs=2, batch_size=128,
                                   verbose=1, n=10, input_dim=74))
])
cross_val_score(pipeline, X, binarized_y)
The epoch progress can be seen as follows:
The output for the preceding code is as follows:
# notice the LARGE variance in scores of a neural network. This is due to the high-variance nature of how networks fit
# using stochastic gradient descent
pipeline.fit(X, binarized_y)
We will get the following output:
Now execute the following code:
# re-encode the test labels
encoded_y_test = encoder.transform(y_test)
def get_network_test_accuracy_of(model):
    y_preds = model.predict(X_test)
    return accuracy_score(y_preds, encoded_y_test)
# not the best accuracy
get_network_test_accuracy_of(pipeline)
389185/389185 [==============================] - 3s 7us/step
The following is the output of the preceding input:
By fitting again, we get a different test accuracy. This also highlights the variance of the network:
pipeline.fit(X, binarized_y)
get_network_test_accuracy_of(pipeline)
We will get the following output:
We add some more epochs to learn more:
preprocessing = Pipeline([
    ("scale", StandardScaler()),
])
pipeline = Pipeline([
    ("preprocessing", preprocessing),
    ("classifier", KerasClassifier(build_fn=create_baseline_model, epochs=10, batch_size=128,
                                   verbose=1, n=10, input_dim=74))
])
cross_val_score(pipeline, X, binarized_y)
We get output as follows:
By fitting again, we get a different test accuracy. This also highlights the variance of the network:
pipeline.fit(X, binarized_y)
get_network_test_accuracy_of(pipeline)
The output of the preceding code is as follows:
This took much longer and still didn't increase the accuracy. We change our function to have multiple hidden layers in our network:
def network_builder(hidden_dimensions, input_dim):
    # create model
    model = Sequential()
    model.add(Dense(hidden_dimensions[0], input_dim=input_dim, kernel_initializer='normal', activation='relu'))
    # add multiple hidden layers
    for dimension in hidden_dimensions[1:]:
        model.add(Dense(dimension, kernel_initializer='normal', activation='relu'))
    model.add(Dense(4, kernel_initializer='normal', activation='sigmoid'))
    # Compile model. We use the logarithmic loss function and the Adam gradient optimizer.
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
We add some more hidden layers to learn more:
preprocessing = Pipeline([
    ("scale", StandardScaler()),
])
pipeline = Pipeline([
    ("preprocessing", preprocessing),
    ("classifier", KerasClassifier(build_fn=network_builder, epochs=10, batch_size=128,
                                   verbose=1, hidden_dimensions=(60,30,10), input_dim=74))
])
cross_val_score(pipeline, X, binarized_y)
We get the output as follows:
We fit on the full training data and evaluate on the test set:
pipeline.fit(X, binarized_y)
get_network_test_accuracy_of(pipeline)
We get the epoch output as follows:
We got a small bump by increasing the hidden layers. Adding some more hidden layers to learn more, we get the following:
preprocessing = Pipeline([
    ("scale", StandardScaler()),
])
pipeline = Pipeline([
    ("preprocessing", preprocessing),
    ("classifier", KerasClassifier(build_fn=network_builder, epochs=10, batch_size=128,
                                   verbose=1, hidden_dimensions=(30,30,30,10), input_dim=74))
])
cross_val_score(pipeline, X, binarized_y)
The Epoch output is as shown here:
The output can be seen as follows:
Execute the pipeline.fit() command:
pipeline.fit(X, binarized_y)
get_network_test_accuracy_of(pipeline)
By executing the preceding code, we will get the following output:
The best result so far comes from using deep learning. However, deep learning isn't the best choice for all datasets.