Chapter Outline
Key Learning Points
Figure 5.1 Security AI process
Security AI Process
The artificial intelligence (AI) system being developed uses the following design approach:
See Figure 5.1. The necessary process is shown here.
Discover Phase
Detect Phase
Threat Score
Threat Response
Protect
Monitor
This process helps companies address vulnerabilities and neutralize threats proactively instead of waiting for hackers to exploit high-security vulnerabilities.
The following sections will go through the AI process in detail.
Discover AI
Define Goal
Data Collection
Input:
Output (label):
Table 5.1 Assets in the organization
Device |
IP Address |
Type |
Model |
MacbookPro |
192.168.2.100 |
Apple Mac |
Apple MacBook Pro 2013 |
JohnDesktop |
192.168.2.101 |
Computer |
Windows XP |
First Floor |
192.168.2.102 |
Printer |
HP Officejet 4650 |
Conference Room 1 |
192.168.2.103 |
Router |
Netgear Nighthawk R7000 |
Table 5.2 Network traffic log
Event Time |
Connection |
Source |
Dest |
Data |
Bytes Uploaded (bytes) |
Bytes |
Flow |
Thu Sep 24 09:00 |
123123132 |
192.168.2. |
192.168.2.101: |
23423423 |
2342234234 |
32423434 |
1029 |
Table 5.3 Network metadata extracted features
Event Time |
Connection ID |
No. of Security Keywords |
Min. |
Max. Byte Value |
No. of Distinct Bytes |
No. of Nonprint Chars |
No. of Punctuation Chars |
Thu Sep |
123123132 |
3 |
4 |
9 |
100 |
3 |
5 |
Table 5.4 Network metadata additional extracted features
Event |
Connection ID |
Contains Shell Code |
Contains JavaScript Code |
Contains SQL Command |
Contains Command Injection |
Thu Sep |
123123132 |
Y |
N |
N |
N |
Table 5.5 Network metadata shellcode extracted features
Event Time |
Connection ID |
API Call Sequence |
Thu Sep 24 09:00 |
123123132 |
1 |
Thu Sep 24 09:15 |
123123132 |
2 |
Table 5.6 Security events
Event Time |
Event |
Description |
I/S |
Source IP |
Thu Sep 24 09:00 |
SV1 |
Local logon success |
S |
192.168.2.100 |
Thu Sep 24 09:15 |
SV2 |
Local logon fail |
I |
192.168.2.101 |
Thu Sep 24 09:15 |
SV3 |
Network traffic overload |
I |
192.168.2.101 |
Thu Sep 24 09:15 |
SV4 |
CPU usage overload |
I |
192.168.2.101 |
Thu Sep 24 09:15 |
SV5 |
Disk usage overload |
I |
192.168.2.101 |
ML/AI use case:
This is a binary classification problem for list of all assets.
Design Algorithm
Binary classifiers can be used for this model.
Identify Features
From the above datasets, 11 of the most important and generic features are selected out of all features to train ML and AI models to recognize the vulnerable assets.
Train the Model
Train the selected algorithms using the prepared dataset.
Tuning the hyperparameters to train the model.
Run the training multiple times with different combinations of the provided hyperparameters of batch size, epochs, optimizer, learn rate, momentum, and dropout rate to find the optimum combination of hyperparameters to determine the appropriate results.
It is recommended to try out different combinations of hyperparameters using grid search.
Grid search = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=−1, scoring=’accuracy’)
Compare models with different hyperparameters and choose the best fit. Use the model to train further using the full dataset.
Test the Model
Test the model using the test dataset for each selected algorithm with given methods:
See Figure 5.2. Pick the model with the “Learning rate- Good,” depicted in the graph.
Figure 5.2 Loss versus epoch learning graph
Receiver Operating Characteristic Curve
Receiver operating characteristic curve is another common tool used with binary classifiers. See Figure 5.3. For this use case, let’s choose the model with false positive rate of 0.99, since 90 percent accuracy is expected from Discover AI Model.
Figure 5.3 ROC curve
Evaluate the Model
Evaluate the model using accuracy and the mean square error (MSE) to determine the learning rate. The following scoring methods are
used.
Scoring Methods
Confusion Matrix
Repeat the steps from data collection, data preparation, feature extraction, training, testing, and evaluation of the model until the necessary accuracy of 85 and above is reached.
Table 5.7 shows the classification accuracy output obtained using several ML algorithms.
Table 5.7 Classification algorithm and accuracy
Naïve Bayer |
81.5% |
SVM |
88.5% |
J48 |
91% |
Bayesian network |
94% |
Multinomial logistic regression |
96.5% |
FT (classifier for building “Functional Tree,” classification trees that could have logistic regression functions at the inner nodes) |
99.5% |
SMO (implements John Platt’s sequential minimal optimization algorithm for training a support vector classifier) |
99.8% |
Multilayer perceptron |
100% |
Model Conclusion
Based on the classification accuracy, multilayer perceptron is the best performing model for this binary classification problem.
Publish and Production of the Model
Conclusion
Type of resources needed for this project are as follows:
To simplify all the above steps, check out www.BizStats.AI
The automated steps are as follows:
Detect AI
Define Goal
Step 1: Detect the threats that have happened, are happening, or are going to happen
Step 2: Categorize the threat
Data Collection
Input the following (Table 5.8):
Table 5.8 Malwa attack data
Phishing URL |
Targeted Brand |
Time (UTC) |
http://some…site.url/https.bb.de.jb/index1.html |
Generic/Spear Phishing |
17:43:24 |
http://some…site.url/https.bb.de.jb/index2.html |
Generic/Spear Phishing |
17:47:24 |
http://some…site.url/https.bb.de.jb |
Generic/Spear Phishing |
17:43:24 |
Output:
Design Algorithm
Identify Features
From the input datasets, the following features have been selected for this model:
Step 1: Detect the threat.
See Table 5.9.
Table 5.9 Detect the threat data sample
ID |
Date |
User |
PC (mac address) |
Event Type |
Acti-vity |
From |
To |
Size |
Attach-ments |
Con-tent |
Threat |
12w21 |
Thu Sep 24 09:00 |
john |
sdf-dfdf-df-dfd-fd |
|
Sent |
User |
To, cc, bcc |
10 MB |
Yes |
E-mail |
Y |
w12d2 |
Thu Sep 24 09:00 |
smith |
sdf-dfdf-df-dfd-fd |
File |
Download |
External device |
Dest IP/PC |
100 MB |
No |
File |
Y |
wewd23 |
Thu Sep 24 09:00 |
chris |
sdf-dfdf-df-dfd-fd |
Browser |
Http visit |
URL |
Dest IP/PC |
2 MB |
No |
Page |
N |
2dgfg34 |
Thu Sep 24 09:00 |
jennifer |
sdf-dfdf-df-dfd-fd |
User Security |
Logon |
- |
Dest IP/PC |
- |
No |
- |
N |
32rffedc |
Thu Sep 24 09:00 |
jack |
sdf-dfdf-df-dfd-fd |
Device |
Connected |
External |
Dest IP/PC |
10 GB |
No |
File tree |
Y |
Step 2: Categorize the threat. Only the threats are passed to this step.
See Table 5.10.
Table 5.10 Categorize the threat data sample
ID |
Date |
User |
PC |
Event Type |
Acti |
From |
To |
Size |
Attach |
Content |
Threat |
Threat Category |
12w21 |
Thu Sep 24 09:00 |
john |
sdf-dfdf-df-dfd-fd |
|
Sent |
User |
To, cc, bcc |
10 MB |
Yes |
E-mail |
Y |
Insider |
w12d2 |
Thu Sep 24 09:00 |
smith |
sdf-dfdf-df-dfd-fd |
Network |
Files |
URL |
Dest IP/PC |
100 MB |
No |
File |
Y |
Application threat |
32rffedc |
Thu Sep 24 09:00 |
jack |
sdf-dfdf-df-dfd-fd |
Device |
Connected |
External |
Dest IP/PC |
10 GB |
No |
File |
Y |
User |
Identify List of Algorithms
Train the Model
Step 1: Detect the threat
This step is the same as the “Discover AI” step.
Step 2: Categorize the threat
Use the identified datasets to train the model.
Tune the Model
Tune the hyperparameters to train the model.
Run the training multiple times with different combinations of the hyperparameters of selected algorithms. This typically includes batch size, epochs, optimizer, learn rate, momentum, and dropout rate to find the optimum combination of hyperparameters to determine the appropriate results.
Compare models with different hyperparameters and choose the best fit. Then, use the model to train using the full dataset.
Test the Model
Step 1 is the same as the “Discover AI” step.
Step 2 is the same as the “Discover AI” step, except it uses a multiclass classification.
Evaluate the Model
Step 1 is the same as the “Discover AI” step.
Step 2: Categorize the threat
Evaluate the model using accuracy, MSE, and determine the learning rate.
Use Confusion Matrix
Confusion matrix measures the performance of the classification algorithms. See Figure 5.4.
Figure 5.4 Confusion matrix predicted versus targeted
Confusion matrix graph for multiclass classification:
Use encoded values instead of actual security categories.
Precision–Recall Curves
Precision–recall curves measure the success of the classification model.
See Figure 5.5.
Figure 5.5 Precision and recall versus K
Check mistakes of the classification model on each label and analyze to fine-tune (see Figure 5.6).
Figure 5.6 Accuracy versus epoch graph
Decide the final model with accuracy
The ensemble learning methods are as follows:
Based on our experiments, the finalized ensemble classifier method is “weighted majority rule ensemble classifier.” This provides better accuracy than any other ensemble methods.
Weighted Majority Rule Ensemble Classifier
Three different classification models can be used to classify the samples: logistic regression, a naive Bayes classifier with a Gaussian kernel, and a random forest classifier all combined into an ensemble method.
The cross-validation results show that the performance of the three models is almost equal.
A simple ensemble classifier class can be implemented to combine the three different classifiers. A predict method simply takes the majority rule of the predictions by the classifiers. Here are some examples:
Now, classify the sample as “class 1.”
Furthermore, adding a weights parameter assigns a specific weight to each classifier. To work with the weights, collect the predicted class probabilities for each classifier, multiply it by the classifier weight, and take the average based on these weighted average probabilities. Next, one must assign the class label.
To illustrate this with a simple example, let us assume to have three classifiers and three-class classification problems to assign equal weights to all classifiers (the default): w1 = 1, w2 = 1, w3 = 1.
The weighted average probabilities for a sample would be calculated as follows (see Table 5.11).
Table 5.11 Weighted average calculation
Classifier Name |
Class 1 |
Class 2 |
Class 3 |
Classifier 1 |
W1 * 0.2 |
W1 * 0.5 |
W1 * 0.3 |
Classifier 2 |
W1 * 0.6 |
W1 * 0.3 |
W1 * 0.1 |
Classifier 3 |
W1 * 0.3 |
W1 * 0.4 |
W1 * 0.3 |
Weighted average |
0.37 |
0.4 |
0.3 |
The class 2 has the highest weighted average probability; thus, the sample is classified as class 2.
The ensemble classifier class is used to apply to majority voting using the class labels if no weights are provided and the predicted probability values show otherwise (see Table 5.12).
Table 5.12 Prediction based on majority class label
Classifier Name |
Class 1 |
Class 2 |
Classifier 1 |
1 |
0 |
Classifier 1 |
0 |
1 |
Classifier 1 |
0 |
1 |
Prediction |
- |
1 |
Prediction based on predicted probabilities:
This is for equal weights: weights = [1, 1, 1]. See Table 5.13.
Table 5.13 Weighted average with prediction
Classifier |
Class 1 |
Class 2 |
Classifier 1 |
0.99 |
0.01 |
Classifier 2 |
0.49 |
0.51 |
Classifier 3 |
0.49 |
0.51 |
Weighted average |
0.66 |
0.18 |
Prediction |
1 |
- |
The results are different depending on whether a majority vote based on the class labels or the average of the predicted probabilities is applied. In general, it makes more sense to use the predicted probabilities (scenario 2). Here, “very confident” classifier 1 overrules the very unconfident classifiers 2 and 3.
A naive brute-force approach should be used to find the optimal weights for each classifier to increase the prediction accuracy.
The results look nice when applying the ensemble classifier (see Table 5.14). One must keep in mind that this is just a toy example. The majority rule voting approach might not always work so well in practice, especially if the ensemble consists of more “weak” than “strong” classification models. This uses a cross-validation approach to overcome the overfitting challenge, please always keep a spare validation dataset to evaluate the results.
Table 5.14 Example of compare table
Iteration |
W1 |
W2 |
W3 |
Mean |
Standard Deviation |
2 |
1 |
2 |
2 |
0.953333 |
0.033993 |
17 |
3 |
1 |
2 |
0.953333 |
0.033993 |
20 |
3 |
2 |
2 |
0.946667 |
0.045216 |
Model Conclusion
Step 1 is the same as the “Discover AI” step.
Step 2: Categorize the threat.
Based on our experiment, the “weighted majority rule ensemble classifier” performs better than other algorithms for the “Detect AI: Categorize the threat model.”
Publishing and Production of Models
Step 1 is the same as the “Discover AI” step.
Step 2 is the same as the “Discover AI” step.
Conclusion
Step 1 is the same as the “Discover AI” step.
Step 2 is the same as the “Discover AI” step.
Threat Score AI
Define Goal
Score the identified threats for prioritization of issues for further investigation.
Data Collection
Input:
Output:
ML and AI use case:
This is a regression problem.
Design Algorithm
Regression algorithms can be used for this model.
Identify Features
See Table 5.15.
Table 5.15 Threat score data sample
Event ID |
Threat Type |
Threat Impact |
Likelihood |
Number |
Level of Inter-connected-ness |
Does it |
Size of User Base |
Threat Score |
wewd23 |
Hacking |
5M |
20 |
3k |
20 |
Y |
500 |
5 |
12w21 |
Data |
24M |
10 |
10k |
15 |
Y |
5M |
8 |
w12d2 |
Cross-site scripting |
2M |
50 |
4k |
25 |
N |
100 |
3 |
Train the Model
Using Deep Neural Network Architecture
The deep neural network can be trained to predict the threat score (see Figure 5.7).
Figure 5.7 Fully connected neural network layer for deep neural networks
Fully connected neural network layers consist of a list of inputs into a list of outputs. Three basic types of layers exist: input layers, output layers, and hidden layers. The functionality-based layers are convolution layers, maximum and average pooling layers, dropout layer, nonlinearity layer, loss function layers, and much more (see Figure 5.8).
Figure 5.8 Activation function of neural network
This deep neural network consists of
See Table 5.16.
Table 5.16 Deep neural network layers
Layer (type) |
Output Shape |
Param # |
Input layer (Dense) |
(none, 128) |
19,200 |
Hidden layer 1 (Dense) |
(none, 256) |
33,024 |
Hidden layer 1 (Dense) |
(none, 256) |
65,792 |
Hidden layer 1 (Dense) |
(none, 256) |
65,792 |
Output layer (Dense) |
(none, 1) |
257 |
Total params: 184,065; trainable params: 184,065; non-trainable params: 0. |
The validation loss of the best model is 18520.23 (see Figure 5.9).
Figure 5.9 Training log output
Using Other Algorithms
Use the identified datasets to train the model.
Fit a model using linear regression first and then determine whether the linear model provides an adequate fit by checking the residual plots. If a good fit cannot be obtained using linear regression, try a nonlinear model because it can fit a wider variety of curves. It is recommended to use ordinary least squares (OLS) first because OLS is easier to perform and interpret.
Use Akaike’s information criterion (AIC), Bayesian information criterion (BIC) (see Figure 5.10), or Mallows’ CP to decide how many factors should be used. These methods are better than comparing with R2.
Figure 5.10 Principal components versus Bayesian information criterion
Use multiple regression. If the sample size is large enough, you may use autoselect option such as forward backward or best. This option will select independent factors using sampling techniques.
Test the Model
Test the model using the test dataset and expert knowledge.
Analyzing different metrics like statistical significance of parameters, R-square, adjusted r-square, and AIC, model is considered to be more likely to be the true model (BIC) and error term. Another one is the Mallow’s Cp criterion. This checks for possible bias in your model by comparing the model with all possible submodels (or a careful selection of them). See Figure 5.11.
The model is more accurate for the left graph.
You should not choose automatic model selection methods if your dataset has multiple confounding variables because you do not want to put these in a model at the same time. Regression regularization methods work well in case of high dimensionality and multicollinearity among the variables in the dataset.
Figure 5.11 R-squared comparison graph
Evaluate the Model
Evaluate the model using accuracy and MSE to determine the learning rate. Cross-validation is the best way to evaluate models used for prediction. Cross-validation divides your dataset into two groups (train and validate). A simple mean squared difference between the observed and predicted values gives you a measure for the prediction accuracy. See Figure 5.12.
Predicting threat score model using linear model:
Figure 5.12 Cross-validation comparison
See Figure 5.13 for the predicted versus the actual and root mean square error (RMSE) score below.
Figure 5.13 Evaluate ensemble model predicted versus actual
Test RMSE score: 5.125877.
The threat score model is predicted using a bagged model. The bagged model is fitted using the randomForest algorithm. Bagging is a special case of a random forest where mtry (number of variables randomly sampled as candidates at each split) is equal to p, the number of predictors. Try using 13 predictors. Test RMSE score: 3.843966.
The model provides two interesting results. First, the predicted versus the actual plot no longer has a small number of predicted values. Second, our test error has dropped dramatically. Also note that the “mean of squared residuals,” which is output by randomForest, is the out of bag estimate of the error. For regression, the suggestion is to use mtry equal to p/3 = 4 predictors. Test RMSE score: 3.701805.
Here note the three RMSEs: the training RMSE (which is optimistic), the out-of-bag (OOB) RMSE (which is a reasonable estimate of the test error), and the test RMSE. Also note that the variables’ importance was calculated. See Table 5.17.
Table 5.17 Data versus error comparison
No |
Data |
Error |
1 |
Training Data |
1.558111 |
2 |
OOB |
3.576229 |
3 |
Test |
3.701805 |
Lastly, try using a boosted model. This model produces a nice variable importance plot and plots the marginal effects of the predictors. Based on this analysis, decide which variables most influence the threat score. See Table 5.18.
Table 5.18 Test error score of each model
Model |
Test Error |
|
1 |
Single Tree |
5.45808 |
2 |
Linear Model |
5.12587 |
3 |
Bagging |
3.84396 |
4 |
Random Forest |
3.70180 |
5 |
Boosting |
3.43382 |
Model Conclusion
Ensemble boosting model performed better than other algorithms for predicting threat score models.
Publishing and Production of Model
This is the same as the “Discover AI” step.
Conclusion
The same ML and AI model can be used for scoring the skills mentioned below.
Skills AI scoring models:
The skills scoring models can be used with customers, customer support executives, project resources, vendors, suppliers, buyers, and much more.
Threat Response AI
Define Goal
Predict what response needs to be taken to fix the threat.
Data Collection
Input:
Output:
ML and AI Use case:
This is a multiclass classification problem
Design Algorithm
This is the same as the “Detect AI: Categorize the threat” step.
Identify Features
This is the same as the “Detect AI: Categorize the threat” step.
Train the Model
This is the same as the “Detect AI: Categorize the threat” step.
Test the Model
This is the same as the “Detect AI: Categorize the threat” step.
Evaluate the Model
This is the same as the “Detect AI: Categorize the threat” step.
Model Conclusion
This is the same as the “Detect AI: Categorize the threat” step.
Publish/Production of Model
This is the same as the “Detect AI: Categorize the threat” step.
Conclusion
This is the same as the “Detect AI: Categorize the threat” step.
Protect AI
Define Goals
Data Collection
Input:
Output:
Design Algorithm
The following algorithms can be used for this model:
Identify Features
Prepare data from the available input dataset and from the “Threat Response AI” step.
Train the Model
This is the same as the “Detect AI: Categorize the threat” step.
Test the Model
This is the same as the “Detect AI: Categorize the threat” step.
Evaluate the Model
This is the same as the “Detect AI: Categorize the threat” step.
Model Conclusion
This is the same as the “Detect AI: Categorize the threat” step.
Publish/Production of Model
This is the same as the “Detect AI: Categorize the threat” step.
Conclusion
This is the same as the “Detect AI: Categorize the threat” step.
Monitor AI
Define Goals
Monitor in Discover Phase
This is done using dashboards and scorecards.
Monitor in Detect Phase
This is done using dashboards and scorecards.
Monitor in Threat Scoring Phase
This is done using dashboards and scorecards.
Monitor in Threat Response Phase
This is done using dashboards and scorecards.
Monitor in Protect Phase
This is done using dashboards and scorecards.
Firms in AI Cybersecurity
The development of AI techniques has seen AI appear in a lot of different IT products, including the field of cybersecurity. Here are some key innovators in cybersecurity that are using AI to give their products an edge (Cooper 2019).
AI is becoming popular throughout the world and is being included in many software applications, especially in cybersecurity products.
The following organizations are spearheading the initiatives with AI cybersecurity applications:
Darktrace: When initially installed, Darktrace creates a baseline as a protection point. From the baseline, any unusual occurrence on the network is treated as a threat. The detection of an anomaly triggers an automated response, which also relies on AI technology.
Cynet: This application is installed on the network to provide accessible threat protection to organizations that do not have specialist cybersecurity personnel.
FireEye: This company produces cybersecurity tools that use AI to monitor networks and spot anomalies. FireEye manages detection response and incident response.
Check Point: Check Point is invested in the development of three AI-driven platforms that contribute to many key offerings in business. These offerings are Campaign Hunting, Huntress, and Context-Aware Detection.
Symantec: This is a targeted attack analytics tool. The innovative deployment of AI by the company makes it a good blend of capital-increasing security and savings-defending stability.
Sophos: The two main AI-based Sophos products are Intercept X for endpoint protection and the XG Firewall to protect networks. Intercept X uses AI to avoid the need for a threat database distributed from a central location. It uses a deep learning neural network developed by
Invincea.
Fortinet: One of Fortinet’s products is FortiGate, a hardware firewall. This application is a workflow that includes endpoint protection, access protection, application monitoring (such as e-mail and web security), and advanced threat protection.
Cylance: All of Cylance’s products integrate AI technology. Its main packages are:
Vectra: This is a threat detection system that deploys AI methodologies to establish a baseline of activity throughout an enterprise and identify anomalies. See Figure 5.14 to view possible cyberattacks that are known currently.
Figure 5.14 Cyber threat taxonomy
See Figure 5.15.
Figure 5.15 Cybersecurity types that are known
Asset Discovery
Asset discovery will find and provide you visibility into the assets in your AWS, Azure, and on-premises environments. You will be able to discover all the IP-enabled devices on your network, determining what software and services are installed on them, how they are configured, and whether there are any vulnerabilities or active threats being executed against them.
What instances are running in my Cloud environments?
What devices are on my physical and virtual networks?
What vulnerabilities exist on the assets in my Cloud and network?
What are my users doing?
Are there known attackers trying to interact with my Cloud and network assets?
Are there active threats on my Cloud and network assets?
Intrusion Detection
Intrusion detection detects any intrusion threats as they emerge in your critical Cloud and on-premises infrastructure.
Endpoint Detection and Response
Corporate endpoints represent one of the top areas of security for organizations. As malicious actors increasingly design their attacks to evade traditional endpoint prevention and protection tools, organizations are looking to endpoint detection and response for additional visibility, including evidence of attacks that might not trigger prevention rules. While many security teams recognize the need for advanced threat detection for endpoints, most do not have the resources to manage a standalone endpoint detection and response solution.
Threat Detection
Threat Intelligence
Vulnerability Assessment
See Figure 5.16.
See Figure 5.17.
Figure 5.16 Potential transformation in cybersecurity that are applying AI use cases
Figure 5.17 Computer security using ML that explains how security domains maps to ML domains
Goal:
Detect insider threats: actions taken by an employee that are harmful to an organization:
Unsanctioned data transfer.
Sabotage of resources.
Misuse of network that disrupts the organization.
Input:
File event: event identification, file name, file type, action (copied), and size.
E-mail: event identification, date, user, recipients, activity (send), size, attachments, and content.
Http event: event identification, date, user, and URL.