Using TensorFlow for intrusion detection

We return to the intrusion detection problem to detect anomalies. First, we import pandas, as shown:

import pandas as pd

We get the names of the features from the dataset at this link:

We define the list of feature names as shown here:

features = ['frame.interface_id',

The preceding list contains all 155 features in the AWID dataset. We import the training set and see the number of rows and columns:

awid = pd.read_csv("../data/AWID-CLS-R-Trn.csv", header=None, names=features)

# see the number of rows/columns
awid.shape

We can ignore the warning:

/Users/sinanozdemir/Desktop/cyber/env/lib/python2.7/site-packages/IPython/core/ DtypeWarning: Columns (37,38,39,40,41,42,43,44,45,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,74,88) have mixed types. Specify dtype option on import or set low_memory=False.
  interactivity=interactivity, compiler=compiler, result=result)

The shape is a tuple of the number of rows and columns: 1,795,575 training records, each with 155 features:

(1795575, 155)

The dataset marks missing values with ?, which we will eventually have to replace. We look at the first few rows:

# they use ? as a null attribute.
awid.head()

The preceding code will produce a table of 5 rows × 155 columns as an output.

We look at the distribution of the response variable:

awid['class'].value_counts(normalize=True)

normal 0.909564
injection 0.036411
impersonation 0.027023
flooding 0.027002
Name: class, dtype: float64

We check for null values:

# claims there are no null values because of the ?'s
awid.isnull().sum()

The output looks like this:

frame.interface_id 0
frame.dlt 1795575
frame.offset_shift 0
frame.time_epoch 0
frame.time_delta 0
frame.time_delta_displayed 0
frame.time_relative 0
frame.len 0
frame.cap_len 0
frame.marked 0
frame.ignored 0
radiotap.version 0
radiotap.pad 0
radiotap.length 0
radiotap.present.tsft 0
radiotap.present.flags 0
radiotap.present.rate 0
radiotap.present.fhss 0
radiotap.present.dbm_antsignal 0
radiotap.present.dbm_antnoise 0
radiotap.present.lock_quality 0
radiotap.present.tx_attenuation 0
radiotap.present.db_tx_attenuation 0
radiotap.present.dbm_tx_power 0
radiotap.present.antenna 0
radiotap.present.db_antsignal 0
radiotap.present.db_antnoise 0
radiotap.present.rxflags 0
radiotap.present.xchannel 0
wlan_mgt.rsn.version 1718631
wlan_mgt.rsn.gcs.type 1718631
wlan_mgt.rsn.pcs.count 1718631
wlan_mgt.rsn.akms.count 1718633
wlan_mgt.rsn.akms.type 1718651
wlan_mgt.rsn.capabilities.preauth 1718633
wlan_mgt.rsn.capabilities.no_pairwise 1718633
wlan_mgt.rsn.capabilities.ptksa_replay_counter 1718633
wlan_mgt.rsn.capabilities.gtksa_replay_counter 1718633
wlan_mgt.rsn.capabilities.mfpr 1718633
wlan_mgt.rsn.capabilities.mfpc 1718633
wlan_mgt.rsn.capabilities.peerkey 1718633
wlan_mgt.tcprep.trsmt_pow 1795536
wlan_mgt.tcprep.link_mrg 1795536
wlan.wep.iv 944820
wlan.wep.key 909831
wlan.wep.icv 944820
wlan.tkip.extiv 1763655
wlan.ccmp.extiv 1792506
wlan.qos.tid 1133234
wlan.qos.priority 1133234
wlan.qos.eosp 1279874
wlan.qos.ack 1133234
wlan.qos.amsdupresent 1134226
wlan.qos.buf_state_indicated 1795575
wlan.qos.bit4 1648935
wlan.qos.txop_dur_req 1648935
wlan.qos.buf_state_indicated.1 1279874
data.len 903021
class 0
Length: 155, dtype: int64

We replace all ? marks with None:

# replace the ? marks with None
awid.replace({"?": None}, inplace=True)
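
As an alternative, pandas can treat ? as missing at load time through the na_values argument of read_csv. The following is a minimal sketch on a tiny in-memory CSV; the three column names and the values are made up for illustration and are not taken from the AWID files:

```python
import io

import pandas as pd

# Tiny made-up CSV standing in for the real file; "?" marks a missing value.
raw = io.StringIO("1,?,normal\n2,7,flooding\n?,?,normal\n")

# na_values="?" tells pandas to parse every ? as a null while reading.
toy = pd.read_csv(raw, header=None, names=["a", "b", "class"], na_values="?")

missing_per_column = toy.isnull().sum()
print(missing_per_column)
```

With this approach the explicit replace step is not needed, and a column that contains only digits and ? is parsed as a numerical column straight away.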

The sum shows a large amount of missing data:

# Many missing pieces of data!
awid.isnull().sum()

Here is what the output looks like:

frame.interface_id 0
frame.dlt 1795575
frame.offset_shift 0
frame.time_epoch 0
frame.time_delta 0
frame.time_delta_displayed 0
frame.time_relative 0
frame.len 0
frame.cap_len 0
frame.marked 0
frame.ignored 0
radiotap.version 0
radiotap.pad 0
radiotap.length 0
radiotap.present.tsft 0
radiotap.present.flags 0
radiotap.present.rate 0
radiotap.present.fhss 0
radiotap.present.dbm_antsignal 0
radiotap.present.dbm_antnoise 0
radiotap.present.lock_quality 0
radiotap.present.tx_attenuation 0
radiotap.present.db_tx_attenuation 0
radiotap.present.dbm_tx_power 0
radiotap.present.antenna 0
radiotap.present.db_antsignal 0
radiotap.present.db_antnoise 0
radiotap.present.rxflags 0
radiotap.present.xchannel 0
wlan_mgt.rsn.version 1718631
wlan_mgt.rsn.gcs.type 1718631
wlan_mgt.rsn.pcs.count 1718631
wlan_mgt.rsn.akms.count 1718633
wlan_mgt.rsn.akms.type 1718651
wlan_mgt.rsn.capabilities.preauth 1718633
wlan_mgt.rsn.capabilities.no_pairwise 1718633
wlan_mgt.rsn.capabilities.ptksa_replay_counter 1718633
wlan_mgt.rsn.capabilities.gtksa_replay_counter 1718633
wlan_mgt.rsn.capabilities.mfpr 1718633
wlan_mgt.rsn.capabilities.mfpc 1718633
wlan_mgt.rsn.capabilities.peerkey 1718633
wlan_mgt.tcprep.trsmt_pow 1795536
wlan_mgt.tcprep.link_mrg 1795536
wlan.wep.iv 944820
wlan.wep.key 909831
wlan.wep.icv 944820
wlan.tkip.extiv 1763655
wlan.ccmp.extiv 1792506
wlan.qos.tid 1133234
wlan.qos.priority 1133234
wlan.qos.eosp 1279874
wlan.qos.ack 1133234
wlan.qos.amsdupresent 1134226
wlan.qos.buf_state_indicated 1795575
wlan.qos.bit4 1648935
wlan.qos.txop_dur_req 1648935
wlan.qos.buf_state_indicated.1 1279874
data.len 903021

Here, we identify the columns that have at least 50% of their data missing:

columns_with_mostly_null_data = awid.columns[awid.isnull().mean() >= 0.5]

# 72 columns are going to be affected!
columns_with_mostly_null_data.shape


We drop those columns:

awid.drop(columns_with_mostly_null_data, axis=1, inplace=True)
awid.shape

The remaining shape is as follows:


(1795575, 83)
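
To see how the threshold behaves, here is a small sketch on a made-up frame (the column names and values are hypothetical). Because the comparison is >=, a column that is exactly half empty is selected for removal as well:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "mostly_missing": [np.nan, np.nan, np.nan, 1.0],  # 75% null
    "half_missing": [np.nan, np.nan, 2.0, 3.0],       # exactly 50% null
    "complete": [1, 2, 3, 4],                         # 0% null
})

# Fraction of nulls per column: the mean of a boolean mask.
null_fraction = toy.isnull().mean()

# Same expression shape as in the text, applied to the toy frame.
to_drop = toy.columns[null_fraction >= 0.5]
kept = toy.drop(to_drop, axis=1)

print(list(to_drop))
print(list(kept.columns))
```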

Now, drop the rows that have missing values:

awid.dropna(inplace=True)  # drop rows with null data
awid.shape

We lost 456,169 rows:


(1339406, 83)

However, this doesn't affect our class distribution too much:

# 0.878763 is our null accuracy. Our model must be better than this number to be a contender
awid['class'].value_counts(normalize=True)


normal 0.878763
injection 0.048812
impersonation 0.036227
flooding 0.036198
Name: class, dtype: float64
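
Null accuracy is the accuracy of a model that always predicts the most frequent class. A minimal sketch of the idea, using a hypothetical helper name and made-up labels rather than the AWID data:

```python
from collections import Counter


def null_accuracy(labels):
    """Share of samples that belong to the most frequent class."""
    counts = Counter(labels)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(labels)


# Made-up labels: 7 of the 10 samples are "normal".
toy_labels = ["normal"] * 7 + ["injection", "flooding", "impersonation"]
baseline = null_accuracy(toy_labels)
print(baseline)
```

Any classifier that scores below this baseline is doing worse than blindly guessing the majority class.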

We select only the numerical columns for our ML algorithms. Only 45 columns are numeric at this point, but there should be more:

awid.select_dtypes(['number']).shape


(1339406, 45)

We convert every column that can be parsed as a number into a numerical dtype:

for col in awid.columns:
    awid[col] = pd.to_numeric(awid[col], errors='ignore')

# that makes more sense
awid.select_dtypes(['number']).shape

The output can be seen here:


(1339406, 74)

We derive basic descriptive statistics:

awid.describe()

Executing the preceding code produces a table of 8 rows × 74 columns.

We separate the numerical features from the response:

X, y = awid.select_dtypes(['number']), awid['class']

We fit a basic Gaussian Naive Bayes model to the data:

from sklearn.naive_bayes import GaussianNB

nb = GaussianNB()
nb.fit(X, y)

The fitted estimator is echoed as follows:

GaussianNB(priors=None, var_smoothing=1e-09)

We read in the test data and do the same transformations to it, to match the training data:

awid_test = pd.read_csv("../data/AWID-CLS-R-Tst.csv", header=None, names=features)

# drop the problematic columns
awid_test.drop(columns_with_mostly_null_data, axis=1, inplace=True)

# replace ? with None
awid_test.replace({"?": None}, inplace=True)

# drop the rows with null data
awid_test.dropna(inplace=True)

# convert columns to numerical values
for col in awid_test.columns:
    awid_test[col] = pd.to_numeric(awid_test[col], errors='ignore')

awid_test.shape

The output is as follows:


(389185, 83)
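
Because the test set must go through exactly the same steps as the training set, it can help to wrap those steps in one function. The following is only a sketch: clean_frame is a hypothetical helper, the miniature frames are made up, and it catches conversion errors explicitly instead of relying on errors='ignore':

```python
import numpy as np
import pandas as pd


def clean_frame(df, columns_to_drop):
    """Hypothetical helper: apply identical cleaning steps to any split."""
    cleaned = df.drop(columns=list(columns_to_drop))
    cleaned = cleaned.replace("?", np.nan)   # treat ? as missing
    cleaned = cleaned.dropna().copy()        # drop incomplete rows
    for col in cleaned.columns:
        try:
            cleaned[col] = pd.to_numeric(cleaned[col])
        except (ValueError, TypeError):
            pass                             # keep non-numeric columns as they are
    return cleaned


# Made-up miniature "train" and "test" frames.
toy_train = pd.DataFrame({"x": ["1", "?", "3"], "junk": ["?", "?", "5"],
                          "class": ["normal", "flooding", "normal"]})
toy_test = pd.DataFrame({"x": ["4", "5"], "junk": ["?", "6"],
                         "class": ["injection", "normal"]})

# Decide which columns to drop on the training split only, then reuse the list.
train_nulls = toy_train.replace("?", np.nan).isnull().mean()
columns_to_drop = train_nulls[train_nulls >= 0.5].index

clean_train = clean_frame(toy_train, columns_to_drop)
clean_test = clean_frame(toy_test, columns_to_drop)
print(clean_train.dtypes)
print(clean_test.shape)
```

Reusing the column list computed on the training data keeps the two frames aligned, which the models fitted later depend on.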

We compute the basic metric, accuracy:

from sklearn.metrics import accuracy_score

We define a simple function that uses our testing data to measure the accuracy of a model fitted on the training data:

X_test = awid_test.select_dtypes(['number'])
y_test = awid_test['class']

def get_test_accuracy_of(model):
    y_preds = model.predict(X_test)
    return accuracy_score(y_test, y_preds)

# naive bayes does very poorly on its own!
get_test_accuracy_of(nb)

The output can be seen here:



We fit a logistic regression, but it performs even worse:

from sklearn.linear_model import LogisticRegression

lr = LogisticRegression()
lr.fit(X, y)

# Logistic regression does even worse
get_test_accuracy_of(lr)

We can ignore this warning:

/Users/sinanozdemir/Desktop/cyber/env/lib/python2.7/site-packages/sklearn/linear_model/ FutureWarning: Default solver will be changed to 'lbfgs' in 0.22. Specify a solver to silence this warning.
/Users/sinanozdemir/Desktop/cyber/env/lib/python2.7/site-packages/sklearn/linear_model/ FutureWarning: Default multi_class will be changed to 'auto' in 0.22. Specify the multi_class option to silence this warning.
"this warning.", FutureWarning)

The following shows the output:



We test with DecisionTreeClassifier as shown here:

from sklearn.tree import DecisionTreeClassifier

tree = DecisionTreeClassifier()
tree.fit(X, y)

# Tree does very well!
get_test_accuracy_of(tree)

The output can be seen as follows:



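We inspect the Gini importances of the decision tree's features as follows:

# pair each feature with its importance; the 'feature' column name is an assumption
pd.DataFrame({'feature': X.columns,
              'importance': tree.feature_importances_}).sort_values('importance', ascending=False).head(10)

The preceding code outputs a table of the ten most important features.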
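
The importances reported by a scikit-learn decision tree are built from reductions in the split criterion, which is Gini impurity by default. As a rough illustration, the sketch below computes the impurity of a node and the decrease achieved by a single split. The function names and the toy labels are assumptions, and the real computation also weights each node by its share of the samples and normalizes the totals across features:

```python
from collections import Counter


def gini_impurity(labels):
    """1 minus the sum of squared class proportions in a node."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted_children


# Made-up node: 6 "normal" and 2 "injection" samples, split perfectly.
parent = ["normal"] * 6 + ["injection"] * 2
left = ["normal"] * 6
right = ["injection"] * 2

print(gini_impurity(parent))
print(impurity_decrease(parent, left, right))
```

A feature that produces many large decreases across the tree ends up near the top of the importance table.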

We try RandomForestClassifier as shown here:

from sklearn.ensemble import RandomForestClassifier

forest = RandomForestClassifier()
forest.fit(X, y)

# Random Forest does slightly worse
get_test_accuracy_of(forest)

We can ignore this warning:

/Users/sinanozdemir/Desktop/cyber/env/lib/python2.7/site-packages/sklearn/ensemble/ FutureWarning: The default value of n_estimators will change from 10 in version 0.20 to 100 in 0.22.
"10 in version 0.20 to 100 in 0.22.", FutureWarning)

The following is the output:



We create a pipeline that will scale the numerical data and then feed the resulting data into a decision tree:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV

preprocessing = Pipeline([
    ("scale", StandardScaler()),
])

pipeline = Pipeline([
    ("preprocessing", preprocessing),
    ("classifier", DecisionTreeClassifier())
])

# try varying levels of depth
params = {
    "classifier__max_depth": [None, 3, 5, 10],
}

# instantiate a gridsearch module
grid = GridSearchCV(pipeline, params)
# fit the module
grid.fit(X, y)

# test the best model
get_test_accuracy_of(grid.best_estimator_)

We can ignore this warning:

/Users/sinanozdemir/Desktop/cyber/env/lib/python2.7/site-packages/sklearn/model_selection/ FutureWarning: You should specify a value for 'cv' instead of relying on the default value. The default value will change from 3 to 5 in version 0.22.
  warnings.warn(CV_WARNING, FutureWarning)
/Users/sinanozdemir/Desktop/cyber/env/lib/python2.7/site-packages/sklearn/preprocessing/ DataConversionWarning: Data with input dtype int64, float64 were all converted to float64 by StandardScaler.
  return self.partial_fit(X, y)
/Users/sinanozdemir/Desktop/cyber/env/lib/python2.7/site-packages/sklearn/ DataConversionWarning: Data with input dtype int64, float64 were all converted to float64 by StandardScaler.
  return self.fit(X, y, **fit_params).transform(X)
/Users/sinanozdemir/Desktop/cyber/env/lib/python2.7/site-packages/sklearn/ DataConversionWarning: Data with input dtype int64, float64 were all converted to float64 by StandardScaler.
  Xt = transform.transform(Xt)

The output is as follows:


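The scale step standardizes every numerical column: it subtracts the column mean and divides by the population standard deviation. A minimal sketch of that arithmetic on made-up values (the helper name is hypothetical):

```python
from statistics import mean, pstdev


def standardize(column):
    """Return z-scores using the population standard deviation."""
    mu = mean(column)
    sigma = pstdev(column)
    return [(value - mu) / sigma for value in column]


# Made-up column with mean 5 and population standard deviation 2.
toy_column = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z_scores = standardize(toy_column)
print(z_scores)
```

Tree-based models split on thresholds, so they are generally insensitive to this kind of rescaling; the step matters more for models such as logistic regression and neural networks.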

We try the same thing with a random forest:

preprocessing = Pipeline([
    ("scale", StandardScaler()),
])

pipeline = Pipeline([
    ("preprocessing", preprocessing),
    ("classifier", RandomForestClassifier())
])

# try varying levels of depth
params = {
    "classifier__max_depth": [None, 3, 5, 10],
}

grid = GridSearchCV(pipeline, params)
grid.fit(X, y)
# best accuracy so far!
get_test_accuracy_of(grid.best_estimator_)

The following shows the output:


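Both grid searches address the classifier's hyperparameter through the pipeline, using the step name, two underscores, and then the parameter name. The sketch below shows the same convention end to end on synthetic data; the data, the variable names, and the explicit cv and scoring values are assumptions made for the example, not settings taken from this section:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; the AWID frames are not used here.
X_toy, y_toy = make_classification(n_samples=300, n_features=8,
                                   n_informative=4, random_state=0)

toy_pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("classifier", DecisionTreeClassifier(random_state=0)),
])

# "<step name>__<parameter>" routes each candidate value to the named step.
toy_params = {"classifier__max_depth": [None, 3, 5, 10]}

toy_grid = GridSearchCV(toy_pipeline, toy_params, cv=3, scoring="accuracy")
toy_grid.fit(X_toy, y_toy)

print(toy_grid.best_params_)
print(round(toy_grid.best_score_, 3))
```

Calling get_params() on a pipeline lists every name that can be searched this way.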

We encode the class labels as integers with LabelEncoder:

from sklearn.preprocessing import LabelEncoder
encoder = LabelEncoder()
encoded_y = encoder.fit_transform(y)
encoded_y

The output is as follows:




array([3, 3, 3, ..., 3, 3, 3])

We then one-hot encode the integer labels with LabelBinarizer:

from sklearn.preprocessing import LabelBinarizer
binarizer = LabelBinarizer()
binarized_y = binarizer.fit_transform(encoded_y)
binarized_y.shape

We will get the following output:

(1339406, 4)

Now, look at the first five rows:

binarized_y[:5]

And the output will be as follows:

array([[0, 0, 0, 1],
       [0, 0, 0, 1],
       [0, 0, 0, 1],
       [0, 0, 0, 1],
       [0, 0, 0, 1]])

Run the y.head() command:

y.head()

The output is as follows:

0    normal
1    normal
2    normal
3    normal
4    normal
Name: class, dtype: object

Now run the following code:

print(encoder.classes_)
print(binarizer.classes_)

The output can be seen as follows:

['flooding' 'impersonation' 'injection' 'normal']
[0 1 2 3]
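
To make the two encodings concrete, the sketch below runs both steps on a handful of made-up labels and then maps the one-hot rows back to class names, which is the same route a prediction takes when it is turned back into a label. The toy_ names are hypothetical:

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer, LabelEncoder

toy_y = ["normal", "flooding", "normal", "injection", "impersonation"]

# Step 1: class names to integers (classes are sorted alphabetically).
toy_encoder = LabelEncoder()
toy_encoded = toy_encoder.fit_transform(toy_y)

# Step 2: integers to one-hot rows, one column per class.
toy_binarizer = LabelBinarizer()
toy_onehot = toy_binarizer.fit_transform(toy_encoded)

# Reverse direction: index of the hot column, then back to the class name.
recovered = toy_encoder.inverse_transform(np.argmax(toy_onehot, axis=1))

print(toy_encoded)
print(toy_onehot)
print(recovered)
```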

Import the following packages:

from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier

We define a baseline model for the neural network with a single hidden layer of 10 neurons. A narrow hidden layer forces the network to compress redundant inputs and concentrate on the most informative features:

def create_baseline_model(n, input_dim):
    # create model
    model = Sequential()
    model.add(Dense(n, input_dim=input_dim, kernel_initializer='normal', activation='relu'))
    model.add(Dense(4, kernel_initializer='normal', activation='sigmoid'))
    # Compile model. We use the logarithmic loss function and the Adam gradient optimizer.
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

KerasClassifier(build_fn=create_baseline_model, epochs=100, batch_size=5, verbose=0, n=20)

We can see the following output:

<keras.wrappers.scikit_learn.KerasClassifier at 0x149c1c210>
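
The output layer above applies a sigmoid to each of the four units independently. For mutually exclusive classes, a softmax output is a common alternative, because its values sum to one and can be read as class probabilities. The sketch below contrasts the two on made-up pre-activation values; it is an illustration only and does not reproduce the model used in this section:

```python
import math


def sigmoid(values):
    """Element-wise logistic function; each output is independent."""
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


def softmax(values):
    """Exponentiate and normalize so that the outputs sum to one."""
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


# Made-up pre-activation values for the four classes.
logits = [2.0, -1.0, 0.5, 3.0]
sig = sigmoid(logits)
soft = softmax(logits)

predicted_sigmoid = max(range(len(logits)), key=lambda i: sig[i])
predicted_softmax = max(range(len(logits)), key=lambda i: soft[i])

print(sum(sig), sum(soft))
print(predicted_sigmoid, predicted_softmax)
```

Both activations are monotonic, so the predicted class is the same here; the difference lies in how the outputs can be interpreted and in how the loss behaves during training.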

Run the following code:

# use the KerasClassifier
from sklearn.model_selection import cross_val_score

preprocessing = Pipeline([
    ("scale", StandardScaler()),
])

pipeline = Pipeline([
    ("preprocessing", preprocessing),
    ("classifier", KerasClassifier(build_fn=create_baseline_model, epochs=2, batch_size=128,
                                   verbose=1, n=10, input_dim=74))
])

cross_val_score(pipeline, X, binarized_y)

The training log for the three cross-validation folds can be seen as follows:

Epoch 1/2
892937/892937 [==============================] - 21s 24us/step - loss: 0.1027 - acc: 0.9683
Epoch 2/2
892937/892937 [==============================] - 18s 20us/step - loss: 0.0314 - acc: 0.9910
446469/446469 [==============================] - 4s 10us/step
Epoch 1/2
892937/892937 [==============================] - 24s 27us/step - loss: 0.1089 - acc: 0.9682
Epoch 2/2
892937/892937 [==============================] - 19s 22us/step - loss: 0.0305 - acc: 0.9919
446469/446469 [==============================] - 4s 9us/step
Epoch 1/2
892938/892938 [==============================] - 18s 20us/step - loss: 0.0619 - acc: 0.9815
Epoch 2/2
892938/892938 [==============================] - 17s 20us/step - loss: 0.0153 - acc: 0.9916
446468/446468 [==============================] - 4s 9us/step

The cross-validation scores for the three folds are as follows:

array([0.97450887, 0.99176875, 0.74421683])

We then fit the pipeline on the full training set:

# notice the LARGE variance in scores of a neural network. This is due to the high-variance nature of how networks fit
# using stochastic gradient descent
pipeline.fit(X, binarized_y)

Epoch 1/2
1339406/1339406 [==============================] - 29s 22us/step - loss: 0.0781 - acc: 0.9740
Epoch 2/2
1339406/1339406 [==============================] - 25s 19us/step - loss: 0.0298 - acc: 0.9856

The fitted pipeline is echoed as output:

Pipeline(memory=None,
steps=[('preprocessing', Pipeline(memory=None,
steps=[('scale', StandardScaler(copy=True, with_mean=True, with_std=True))])), ('classifier', <keras.wrappers.scikit_learn.KerasClassifier object at 0x149c1c350>)])

Now execute the following code:

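# remake the accuracy helper to compare predictions with the integer-encoded test labels
encoded_y_test = encoder.transform(y_test)

def get_network_test_accuracy_of(model):
    y_preds = model.predict(X_test)
    return accuracy_score(encoded_y_test, y_preds)

# not the best accuracy
get_network_test_accuracy_of(pipeline)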


389185/389185 [==============================] - 3s 7us/step

The following is the output of the preceding code:


By fitting again, we get a different test accuracy. This also highlights the variance of the network:

pipeline.fit(X, binarized_y)
get_network_test_accuracy_of(pipeline)

Epoch 1/2
1339406/1339406 [==============================] - 29s 21us/step - loss: 0.0844 - acc: 0.9735
Epoch 2/2
1339406/1339406 [==============================] - 32s 24us/step - loss: 0.0323 - acc: 0.9853
389185/389185 [==============================] - 4s 11us/step

We will get the following output:


We add some more epochs to learn more:

preprocessing = Pipeline([
    ("scale", StandardScaler()),
])

pipeline = Pipeline([
    ("preprocessing", preprocessing),
    ("classifier", KerasClassifier(build_fn=create_baseline_model, epochs=10, batch_size=128,
                                   verbose=1, n=10, input_dim=74))
])

cross_val_score(pipeline, X, binarized_y)

We get output as follows:

Epoch 1/10
892937/892937 [==============================] - 20s 22us/step - loss: 0.0945 - acc: 0.9744
Epoch 2/10
892937/892937 [==============================] - 17s 19us/step - loss: 0.0349 - acc: 0.9906
Epoch 3/10
892937/892937 [==============================] - 16s 18us/step - loss: 0.0293 - acc: 0.9920
Epoch 4/10
892937/892937 [==============================] - 17s 20us/step - loss: 0.0261 - acc: 0.9932
Epoch 5/10
892937/892937 [==============================] - 18s 20us/step - loss: 0.0231 - acc: 0.9938
Epoch 6/10
892937/892937 [==============================] - 15s 17us/step - loss: 0.0216 - acc: 0.9941
Epoch 7/10
892937/892937 [==============================] - 21s 23us/step - loss: 0.0206 - acc: 0.9944
Epoch 8/10
892937/892937 [==============================] - 17s 20us/step - loss: 0.0199 - acc: 0.9947
Epoch 9/10
892937/892937 [==============================] - 17s 19us/step - loss: 0.0194 - acc: 0.9948
Epoch 10/10
892937/892937 [==============================] - 17s 19us/step - loss: 0.0189 - acc: 0.9950
446469/446469 [==============================] - 4s 10us/step
Epoch 1/10
892937/892937 [==============================] - 19s 21us/step - loss: 0.1160 - acc: 0.9618

array([0.97399595, 0.9939951 , 0.74381591])
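
A quick way to quantify how far the fold scores spread is to summarize them. The sketch below uses the two sets of fold accuracies printed so far; the dictionary keys and variable names are made up for the example:

```python
from statistics import mean, pstdev

# Fold accuracies copied from the two cross_val_score outputs shown so far.
fold_scores = {
    "epochs=2": [0.97450887, 0.99176875, 0.74421683],
    "epochs=10": [0.97399595, 0.9939951, 0.74381591],
}

summary = {}
for name, scores in fold_scores.items():
    summary[name] = {
        "mean": mean(scores),
        "std": pstdev(scores),
        "range": max(scores) - min(scores),
        "worst_fold": scores.index(min(scores)),  # zero-based fold position
    }
    print(name, summary[name])
```

The lowest score falls on the third fold in both runs, which may indicate that the way the folds are composed contributes to the spread, in addition to the randomness of training.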

By fitting again, we get a different test accuracy. This also highlights the variance of the network:

pipeline.fit(X, binarized_y)
get_network_test_accuracy_of(pipeline)

Epoch 1/10
1339406/1339406 [==============================] - 30s 22us/step - loss: 0.0812 - acc: 0.9754
Epoch 2/10
1339406/1339406 [==============================] - 27s 20us/step - loss: 0.0280 - acc: 0.9915
Epoch 3/10
1339406/1339406 [==============================] - 28s 21us/step - loss: 0.0226 - acc: 0.9921
Epoch 4/10
1339406/1339406 [==============================] - 27s 20us/step - loss: 0.0193 - acc: 0.9940
Epoch 5/10
1339406/1339406 [==============================] - 28s 21us/step - loss: 0.0169 - acc: 0.9951
Epoch 6/10
1339406/1339406 [==============================] - 34s 25us/step - loss: 0.0155 - acc: 0.9955
Epoch 7/10
1339406/1339406 [==============================] - 38s 28us/step - loss: 0.0148 - acc: 0.9957
Epoch 8/10
1339406/1339406 [==============================] - 34s 25us/step - loss: 0.0143 - acc: 0.9958
Epoch 9/10
1339406/1339406 [==============================] - 29s 21us/step - loss: 0.0139 - acc: 0.9960
Epoch 10/10
1339406/1339406 [==============================] - 28s 21us/step - loss: 0.0134 - acc: 0.9961
389185/389185 [==============================] - 3s 8us/step

The output of the preceding code is as follows:


This took much longer and still didn't increase the accuracy. We change our function to have multiple hidden layers in our network:

def network_builder(hidden_dimensions, input_dim):
    # create model
    model = Sequential()
    model.add(Dense(hidden_dimensions[0], input_dim=input_dim, kernel_initializer='normal', activation='relu'))

    # add multiple hidden layers
    for dimension in hidden_dimensions[1:]:
        model.add(Dense(dimension, kernel_initializer='normal', activation='relu'))
    model.add(Dense(4, kernel_initializer='normal', activation='sigmoid'))

    # Compile model. We use the logarithmic loss function and the Adam gradient optimizer.
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
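
To get a feel for how much larger a deeper network is, we can count the trainable parameters of a stack of fully connected layers: each layer has one weight per input-output pair plus one bias per neuron. The helper below is a hypothetical sketch, applied to the layer sizes used in this section with 74 inputs and 4 outputs:

```python
def dense_parameter_count(input_dim, layer_sizes):
    """Weights plus biases for a stack of fully connected layers."""
    total = 0
    previous = input_dim
    for size in layer_sizes:
        total += previous * size + size  # weight matrix plus one bias per neuron
        previous = size
    return total


# 74 input features; the last entry of each list is the 4-unit output layer.
baseline_params = dense_parameter_count(74, [10, 4])
three_hidden_params = dense_parameter_count(74, [60, 30, 10, 4])
four_hidden_params = dense_parameter_count(74, [30, 30, 30, 10, 4])

print(baseline_params, three_hidden_params, four_hidden_params)
```

More layers do not automatically mean more parameters: the four-hidden-layer arrangement is narrower at the input and ends up smaller than the three-hidden-layer one.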

We add some more hidden layers to learn more:

preprocessing = Pipeline([
    ("scale", StandardScaler()),
])

pipeline = Pipeline([
    ("preprocessing", preprocessing),
    ("classifier", KerasClassifier(build_fn=network_builder, epochs=10, batch_size=128,
                                   verbose=1, hidden_dimensions=(60,30,10), input_dim=74))
])

cross_val_score(pipeline, X, binarized_y)

We get the output as follows:

Epoch 1/10
892937/892937 [==============================] - 24s 26us/step - loss: 0.0457 - acc: 0.9860
Epoch 2/10
892937/892937 [==============================] - 21s 24us/step - loss: 0.0113 - acc: 0.9967
Epoch 3/10
892937/892937 [==============================] - 21s 23us/step - loss: 0.0079 - acc: 0.9977
Epoch 4/10
892937/892937 [==============================] - 26s 29us/step - loss: 0.0066 - acc: 0.9982
Epoch 5/10
892937/892937 [==============================] - 24s 27us/step - loss: 0.0061 - acc: 0.9983
Epoch 6/10
892937/892937 [==============================] - 25s 28us/step - loss: 0.0057 - acc: 0.9984
Epoch 7/10
892937/892937 [==============================] - 24s 27us/step - loss: 0.0051 - acc: 0.9985
Epoch 8/10
892937/892937 [==============================] - 24s 27us/step - loss: 0.0050 - acc: 0.9986
Epoch 9/10
892937/892937 [==============================] - 25s 28us/step - loss: 0.0046 - acc: 0.9986
Epoch 10/10
892937/892937 [==============================] - 23s 26us/step - loss: 0.0044 - acc: 0.9987
446469/446469 [==============================] - 6s 12us/step
Epoch 1/10
892937/892937 [==============================] - 27s 30us/step - loss: 0.0538 - acc: 0.9826
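
The sample counts in these logs follow from the three-fold split of the 1,339,406 training rows. Assuming contiguous, unshuffled K-fold splitting in which the first folds absorb any remainder, the train and test sizes can be reproduced with a little arithmetic (the helper name is hypothetical):

```python
def kfold_sizes(n_samples, n_splits):
    """Return (train_size, test_size) for each fold of a plain K-fold split."""
    base, remainder = divmod(n_samples, n_splits)
    # The first `remainder` folds receive one extra test sample each.
    test_sizes = [base + 1 if i < remainder else base for i in range(n_splits)]
    return [(n_samples - test, test) for test in test_sizes]


splits = kfold_sizes(1339406, 3)
for train_size, test_size in splits:
    print(train_size, test_size)
```

These numbers match the 892937 and 446469 sample counts in the training and prediction lines of the logs, with the last fold differing by one sample.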

We fit the pipeline on the full training set and test it:

pipeline.fit(X, binarized_y)
get_network_test_accuracy_of(pipeline)

We get the epoch output as follows:

Epoch 1/10
1339406/1339406 [==============================] - 31s 23us/step - loss: 0.0422 - acc: 0.9865
Epoch 2/10
1339406/1339406 [==============================] - 28s 21us/step - loss: 0.0095 - acc: 0.9973
Epoch 3/10
1339406/1339406 [==============================] - 29s 22us/step - loss: 0.0068 - acc: 0.9981
Epoch 4/10
1339406/1339406 [==============================] - 28s 21us/step - loss: 0.0056 - acc: 0.9984
Epoch 5/10
1339406/1339406 [==============================] - 29s 21us/step - loss: 0.0051 - acc: 0.9986
Epoch 6/10
1339406/1339406 [==============================] - 28s 21us/step - loss: 0.0047 - acc: 0.9987
Epoch 7/10
1339406/1339406 [==============================] - 30s 22us/step - loss: 0.0041 - acc: 0.9988
Epoch 8/10
1339406/1339406 [==============================] - 29s 22us/step - loss: 0.0039 - acc: 0.9989
Epoch 9/10
1339406/1339406 [==============================] - 29s 22us/step - loss: 0.0039 - acc: 0.9989
Epoch 10/10
1339406/1339406 [==============================] - 28s 21us/step - loss: 0.0036 - acc: 0.9990
389185/389185 [==============================] - 3s 9us/step


We got a small bump by adding hidden layers. We now try an arrangement with one more hidden layer:

preprocessing = Pipeline([
    ("scale", StandardScaler()),
])

pipeline = Pipeline([
    ("preprocessing", preprocessing),
    ("classifier", KerasClassifier(build_fn=network_builder, epochs=10, batch_size=128,
                                   verbose=1, hidden_dimensions=(30,30,30,10), input_dim=74))
])

cross_val_score(pipeline, X, binarized_y)

The epoch output is as shown here:

Epoch 1/10
892937/892937 [==============================] - 25s 28us/step - loss: 0.0671 - acc: 0.9709
Epoch 2/10
892937/892937 [==============================] - 21s 23us/step - loss: 0.0139 - acc: 0.9963
Epoch 3/10
892937/892937 [==============================] - 20s 22us/step - loss: 0.0100 - acc: 0.9973
Epoch 4/10
892937/892937 [==============================] - 25s 28us/step - loss: 0.0087 - acc: 0.9977
Epoch 5/10
892937/892937 [==============================] - 21s 24us/step - loss: 0.0078 - acc: 0.9979
Epoch 6/10
892937/892937 [==============================] - 21s 24us/step - loss: 0.0072 - acc: 0.9981
Epoch 7/10
892937/892937 [==============================] - 24s 27us/step - loss: 0.0069 - acc: 0.9982
Epoch 8/10
892937/892937 [==============================] - 24s 27us/step - loss: 0.0064 - acc: 0.9984

The output can be seen as follows:

array([0.97447527, 0.99417877, 0.74292446])

Execute the following commands:

pipeline.fit(X, binarized_y)
get_network_test_accuracy_of(pipeline)

Epoch 1/10
1339406/1339406 [==============================] - 48s 36us/step - loss: 0.0666 - acc: 0.9548
Epoch 2/10
1339406/1339406 [==============================] - 108s 81us/step - loss: 0.0346 - acc: 0.9663
Epoch 3/10
1339406/1339406 [==============================] - 78s 59us/step - loss: 0.0261 - acc: 0.9732
Epoch 4/10
1339406/1339406 [==============================] - 102s 76us/step - loss: 0.0075 - acc: 0.9980
Epoch 5/10
1339406/1339406 [==============================] - 71s 53us/step - loss: 0.0066 - acc: 0.9983
Epoch 6/10
1339406/1339406 [==============================] - 111s 83us/step - loss: 0.0059 - acc: 0.9985
Epoch 7/10
1339406/1339406 [==============================] - 98s 73us/step - loss: 0.0055 - acc: 0.9986
Epoch 8/10
1339406/1339406 [==============================] - 93s 70us/step - loss: 0.0052 - acc: 0.9987
Epoch 9/10
1339406/1339406 [==============================] - 88s 66us/step - loss: 0.0051 - acc: 0.9988
Epoch 10/10
1339406/1339406 [==============================] - 87s 65us/step - loss: 0.0049 - acc: 0.9988
389185/389185 [==============================] - 16s 41us/step

By executing the preceding code, we will get the following output:


The best result so far comes from using deep learning. However, deep learning isn't the best choice for all datasets.

