Chapter 27

Counter Cyber Attacks By Semantic Networks

Peng He, University of Maryland, Baltimore, MD, USA

To improve the accuracy of intrusion detection and reduce the false alarm rate in cyber-security analysis, attack correlation has become an indispensable component of most intrusion detection systems. However, traditional intrusion detection techniques often fail to handle complex and uncertain network attack correlation tasks. We propose creating semantic networks that build relationships among network attacks and assist in automatically identifying and predicting related attacks. Our method can also increase the precision of detecting probable attacks. Experimental results show that our semantic network, using the Anderberg similarity measure, performs better in terms of precision and recall than existing correlation approaches in the cyber-security domain.

Keywords

cyber threats; semantic networks; similarity measures; Bayesian probability

Information in this chapter

• Cyber threats

• Semantic networks

• Similarity measures

• Bayesian probability

Introduction

Nowadays, intrusion detection is one of the most challenging and highest-priority tasks in the cyber-security field. As more and more security sensors are deployed in networks to analyze and detect attacks, they generate a huge volume of alerts with different event granularities and semantics [1]. Such a volume of alerts makes network attack correlation complex and uncertain. At the same time, attack correlation has become an essential part of most intrusion detection systems (IDSs), since it can enhance the detection rate and reveal attack strategies more accurately [2]. Thus, better techniques for attack analysis and correlation are vital for improving network security.

We propose to identify and predict relevant attacks by using semantic networks. In the creation of a semantic network, each node represents an attack and the edges connect relevant attacks.

Specifically, our contributions are as follows: (1) We automatically construct a first mode semantic network from the characterizing features of network attacks using similarity measures. (2) We adjust the first mode semantic network with external semantic rules provided by domain experts to generate a more adaptable second mode semantic network. (3) We apply several similarity measures, including Anderberg, Jaccard, Simple Matching, and a traditional correlation coefficient, to create semantic networks. (4) Finally, we evaluate these similarity measures experimentally and find that the Anderberg similarity coefficient performs better in terms of precision and recall than existing correlation approaches in the cyber-security domain.

The rest of the chapter is organized as follows. The next section describes related work. “Methodology” outlines our approach using two modes of semantic networks. “Experiments” presents our experiments. The last section concludes the chapter.

Related work

In this section, we discuss related work on attack correlation, semantic networks, and the Bayesian probability model in turn.

Related work on attack correlation

When attacks are intensive, not only are actual alerts mixed with false alerts, but the number of alerts also becomes unmanageable [3]. The experience of intrusion detection practitioners indicates that “Encountering 10–20,000 alarms per sensor per day is common” [4]. It is therefore challenging to analyze intrusion alerts without an alert correlation process, particularly given the large number of alerts produced by IDSs. Some previous methods are limited in that they are restricted to known attack scenarios or to scenarios that can be generalized from known ones. The authors of [3] propose correlating the alerts generated by IDSs using the prerequisites and consequences of the corresponding attacks; intuitively, the prerequisite of an attack is the necessary condition for the attack to succeed. Their results show that this correlation method not only correlates related alerts and uncovers attack strategies, but also provides a way to differentiate between alerts.

Other research suggests using appropriate attack correlation techniques to handle large collections of alerts. For example, the authors of [1] developed a two-layered PA-based (primitive attack-based) correlation approach. The first layer constructs PAs by integrating related alerts into proper PAs. The second layer, the attack subplan-based correlation layer, correlates attack scenarios from the recognized PAs, using attack subplan templates to guide the correlation process. In [5], the authors indicate that alert correlation techniques effectively improve the quality of alerts reported by intrusion detection systems and are sufficient to support rapid identification of ongoing attacks. Their research focuses on developing an intrusion alert correlation system based on their XSWRL ontology-based alert correlation approach.

Related work on semantic networks

A semantic network or net is a graphic notation for representing knowledge in patterns of interconnected nodes and arcs. Computer implementations of semantic networks were first developed for artificial intelligence and machine translation, but earlier versions have long been used in philosophy, psychology, and linguistics. Sowa gives a descriptive outline of the types and uses of semantic networks in different disciplines in [6,7]. Semantic networks have long been used to represent relationships [8]. Pearl used probabilities in semantic networks and performed extensive work on applying statistics and probability to causal semantic networks [9,10] in order to derive such networks from observed data. What all semantic networks have in common is a declarative graphic representation that can be used either to represent knowledge or to support automated systems for reasoning about knowledge. Some versions are highly informal, while others are formally defined systems of logic.

Semantic networks have been applied in several domains. The work in [11] describes a metadata approach that elicits semantic information from environmental data and implements semantics-based techniques to help users integrate, navigate, and mine multiple environmental data sources. Another paper [12] applies semantics to software engineering, providing methods to discover relevant software artifacts in order to increase software reuse and reduce the cost of software development and maintenance. It proposes a metadata approach with semantic networks that convey existing relationships between software artifacts, revealing additional relevant artifacts that the user might not have been aware of. In this chapter, we discuss how to apply semantic networks to identify related network attacks in the cyber-security domain, and how they increase the precision of detecting probable attacks based on their probability of occurrence.

Related work on the Bayesian probability model

Bayesian networks have been established as a ubiquitous tool for modeling and reasoning under uncertainty [13], and there are several existing applications in which the Bayesian probability model has been applied to intrusion detection systems [14–16].

One recent paper [17] presents work on justifying uncertainty modeling for cyber-security, with initial evidence indicating that it is a useful approach. The authors report their current efforts on identifying the important types of uncertainty and on using Bayesian networks to capture them for enhanced security analysis. They also build an example Bayesian network based on a current security graph model and justify their modeling approach through attack semantics. An experimental study shows that the resulting Bayesian network is not sensitive to parameter perturbation. Their work serves as a good foundation for applying a Bayesian network as our probability model on a cyber-security dataset and using it to identify initial attacks in our semantic network that have a high probability of occurrence. In other words, the Bayesian probability model predicts the starting node in our semantic network, from which we can identify other relevant attacks with a high probability of occurrence in the current network situation.

Methodology

To avoid the intricate task of manually constructing and maintaining the semantic network, we construct semantic networks automatically. The construction has two stages, producing first mode and second mode networks. The first mode network identifies relevant attacks based on similarity measures, and the second mode network takes the first mode network and adjusts it by adding domain expertise, as shown in Figure 27.1.


Figure 27.1 Constructing a semantic network.

Similarity-based semantic network

Let $X = \{x_1, \dots, x_n\}$ be the set of network attacks, where each $x_i \in X$ is associated with a set of characterizing attributes $a_i = \{a_{i1}, \dots, a_{im}\}$. These attributes may mix numeric and categorical values. Through discretization, the numeric values can be transformed into categorical (binary) values, forming a feature vector $f_i = \{f_{i1}, \dots, f_{im}\}$. To create a similarity-based network automatically, we first generate the feature vector associated with each attack. These feature vectors determine how similar the attacks are in terms of the attributes that characterize them, and they are then used to generate the semantic network. Specifically, in this research the weighted frequency feature values in the attack feature vectors are used to prepare the binary feature vectors, and the absolute cutoff data transformation [18] converts these weighted frequency values into binary values of 0 and 1. Because each attack has several attributes combined into one feature vector, comparing attacks across different attributes requires a common basis: the union of all attribute sets forms a universal vector $V = a_1 \cup a_2 \cup \dots \cup a_n$ containing all attack attributes. $V$ is then used to create the similarity-based semantic network.
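The discretization and universal-vector steps can be sketched as follows. This is a minimal sketch under stated assumptions: each attack's attributes arrive as a dictionary of weighted frequency values, the absolute cutoff is a single fixed threshold (0.5 here, a hypothetical value), and the attribute names and values are illustrative rather than actual KDD CUP 99 fields. The helper names binarize and universal_vectors are hypothetical.

```python
def binarize(weighted, cutoff=0.5):
    """Absolute cutoff: values at or above the (hypothetical) cutoff become 1, others 0."""
    return {attr: int(value >= cutoff) for attr, value in weighted.items()}


def universal_vectors(attacks, cutoff=0.5):
    """Project every attack onto the union of all attributes so vectors align position by position."""
    universe = sorted(set().union(*(a.keys() for a in attacks.values())))
    vectors = {}
    for name, weighted in attacks.items():
        bits = binarize(weighted, cutoff)
        # An attribute that an attack never exhibits is encoded as 0.
        vectors[name] = [bits.get(attr, 0) for attr in universe]
    return universe, vectors


# Hypothetical weighted frequencies for two attacks
attacks = {
    "rootkit": {"attr_a": 0.9, "attr_b": 0.1},
    "perl": {"attr_a": 0.7, "attr_c": 0.6},
}
universe, vectors = universal_vectors(attacks)
print(universe)  # ['attr_a', 'attr_b', 'attr_c']
print(vectors)   # {'rootkit': [1, 0, 0], 'perl': [1, 0, 1]}
```

Aligning every attack to the same attribute universe is what lets the binary similarity coefficients compare vectors position by position.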

We then use different similarity coefficients to quantify the similarity among the universal feature vectors of the attacks. Based on the similarity coefficients, we connect similar nodes with edges to begin creating the first mode semantic network.

Given a pair of nodes xp and xq, such that there exists a similarity between the two nodes, the probability wpq of traversing from node xp to xq is:

$$w_{pq} = \frac{S_{pq}}{\sum_{j=1}^{k} S_{pj}}$$

Here $S_{pq}$ is one of the similarity coefficients between the feature vectors of attacks (nodes) $x_p$ and $x_q$, $k$ is the number of edges incident on $x_p$, and the sum of $S_{pj}$ over those $k$ edges is the weighted degree of node $x_p$. Based on these similarity and probability computations, we automatically construct a first mode semantic network, as shown in Figure 27.1(a), which we refer to as a Similarity Based Semantic Network.
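A minimal sketch of the traversal probability, assuming, as the text describes, that w_pq divides S_pq by the weighted degree of p, that similarity scores are symmetric, and that a zero similarity produces no edge. traversal_probabilities is a hypothetical helper name; the sample scores reuse three rows of Table 27.1.

```python
from collections import defaultdict


def traversal_probabilities(similarity):
    """w_pq = S_pq / (sum of S_pj over the k edges incident on p)."""
    # Each similarity is treated as an undirected edge, so it counts toward both endpoints' degrees.
    neighbours = defaultdict(dict)
    for (p, q), s in similarity.items():
        if s > 0:
            neighbours[p][q] = s
            neighbours[q][p] = s
    w = {}
    for p, edges in neighbours.items():
        weighted_degree = sum(edges.values())
        for q, s in edges.items():
            w[(p, q)] = s / weighted_degree
    return w


sims = {(1, 6): 0.9, (6, 2): 0.9, (6, 3): 0.2}  # three rows of Table 27.1
w = traversal_probabilities(sims)
print(round(w[(6, 2)], 3))  # 0.9 / (0.9 + 0.9 + 0.2) = 0.45
print(w[(1, 6)])            # node 1 has a single edge, so 1.0
```

Because each node normalizes by its own weighted degree, w_pq and w_qp generally differ (here w(1, 6) = 1.0 but w(6, 1) = 0.45), which is consistent with Definition 1 treating the first mode network as a directed graph.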

We formally define the first mode semantic network as follows:

Definition 1 [Similarity Based Semantic Network]: Let $X = \{x_1, \dots, x_n\}$ be the set of attacks, where each $x_i \in X$ has a feature vector $f_i = \{f_{i1}, \dots, f_{im}\}$. A first mode similarity based network $N_{sn}(V_{sn}, E_{sn})$ is a directed graph, where $V_{sn}$ is a set of nodes and $E_{sn}$ is a set of edges, such that $V_{sn} \subseteq X$ and $|V_{sn}| \le |X|$, and each edge $\langle v_i, v_j \rangle$ links two relevant attacks and has a probability score $w(v_i, v_j)$, where $0 < w(v_i, v_j) \le 1$.

Definition 2 [Similarity Coefficient]: Similarity coefficients serve as an effective tool for measuring the similarity among objects in a dataset. In [19], the authors surveyed 35 different coefficients composed of four variables: a, b, c, and d. When objects $o_i$ and $o_j$ are evaluated for similarity, each object is associated with features $f_1, \dots, f_n$ having values 0 or 1. The concept is illustrated in Figure 27.2.


Figure 27.2 Objects associated with feature values in similarity coefficient.

In Figure 27.2, a is the number of positive matches, where $o_i$ and $o_j$ both have a value of 1; b is the number of mismatches where $o_i$ has a value of 1 and $o_j$ has a value of 0; c is the number of mismatches where $o_i$ has a value of 0 and $o_j$ has a value of 1; and d is the number of negative matches, where $o_i$ and $o_j$ both have a value of 0.

Based on the empirical evaluation results in [19], we select three similarity coefficients with high accuracy, precision, and recall: Anderberg, Jaccard/Tanimoto, and Simple Matching. We use them to create different similarity based semantic networks in order to compare how these similarity coefficients perform in the cyber-security field. Their formulas are defined as follows (the symbols a, b, c, and d are those described in Figure 27.2):

Anderberg: $$\frac{a}{a + 2(b + c)}$$

Jaccard/Tanimoto: $$\frac{a}{a + b + c}$$

Simple Matching: $$\frac{a + d}{a + b + c + d}$$

We also chose Pearson’s correlation coefficient to create another kind of semantic network, since it has been widely used in the cyber-security domain. Pearson’s correlation coefficient between two objects X and Y is defined as their covariance, cov(X, Y), divided by the product of their standard deviations, σX·σY:

$$r_{XY} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y}$$
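The four coefficients can be computed directly from the counts a, b, c, and d. This sketch makes three assumptions: the Anderberg coefficient has the form a/(a + 2(b + c)), which is the form some surveys of binary similarity measures attribute to Anderberg; Pearson's coefficient is applied to the same binary vectors, where it reduces to the phi coefficient; and an undefined ratio (zero denominator) is reported as 0.0. The function names and sample vectors are hypothetical.

```python
import math


def contingency(x, y):
    """Count (a, b, c, d) for two equal-length 0/1 vectors, as defined for Figure 27.2."""
    pairs = list(zip(x, y))
    a = pairs.count((1, 1))  # positive matches
    b = pairs.count((1, 0))  # x has 1, y has 0
    c = pairs.count((0, 1))  # x has 0, y has 1
    d = pairs.count((0, 0))  # negative matches
    return a, b, c, d


def anderberg(a, b, c, d):
    # ASSUMED form a / (a + 2(b + c)); d is not used.
    denom = a + 2 * (b + c)
    return a / denom if denom else 0.0


def jaccard(a, b, c, d):
    denom = a + b + c  # negative matches (d) are ignored
    return a / denom if denom else 0.0


def simple_matching(a, b, c, d):
    return (a + d) / (a + b + c + d)


def pearson(a, b, c, d):
    # On 0/1 vectors, cov(X, Y) / (sigma_X * sigma_Y) equals the phi coefficient.
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom if denom else 0.0


x = [1, 1, 0, 1, 0, 0]
y = [1, 0, 0, 1, 1, 0]
counts = contingency(x, y)
print(counts)  # (2, 1, 1, 2)
for coefficient in (anderberg, jaccard, simple_matching, pearson):
    print(coefficient.__name__, round(coefficient(*counts), 3))
# anderberg 0.333, jaccard 0.5, simple_matching 0.667, pearson 0.333
```

The example shows why the choice matters: the same pair of vectors scores anywhere from 0.333 to 0.667 depending on whether negative matches (d) count as agreement and how heavily mismatches are penalized.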

We next describe how to generate a relevance score rs based on one of the above similarity coefficients.

Definition 3 [Relevance Score]: If $v_i$ and $v_j$ are two nodes in a semantic network $N(E, V)$ and there are $k$ paths $p_1, \dots, p_k$ between them, where path $p_l$ ($1 \le l \le k$) consists of nodes $v_{l_1}, \dots, v_{l_{|p_l|+1}}$ ($|p_l|$ is the length of path $p_l$), then the relevance score rs between $v_i$ and $v_j$ is

$$rs(v_i, v_j) = \max_{1 \le l \le k} \prod_{m=1}^{|p_l|} w\left(v_{l_m}, v_{l_{m+1}}\right)$$

This formula computes the relevance score between vi and vj as the maximum relevance score of all paths connecting vi and vj. For each such path, the relevance score between the two endpoints is computed as the product of relevance scores for all edges along the path.

Here is a concrete example of how we generate the relevance score rs based on one of the similarity coefficients, Jaccard. Suppose we have six objects whose pairwise Jaccard similarity scores are shown in Table 27.1. The relevance score between object 1 and object 2 in the semantic network is the maximum relevance score over all paths connecting them. Thus, the relevance score of 0.81 between object 1 and object 2 in Table 27.2 is obtained as rs(1, 2) = rs(1, 6) × rs(6, 2) = 0.9 × 0.9 = 0.81.

Table 27.1

Pair-Wise Jaccard Similarity Scores

Node 1 Node 2 Similarity Score
1 6 0.9
2 5 0.2
3 4 0.2
4 5 0.2
6 2 0.9
6 3 0.2

Table 27.2

First Mode Semantic Network

Node 1 Node 2 Relevance Score
1 2 0.81
1 3 0.18
1 5 0.162
1 6 0.9
2 5 0.2
3 4 0.2
4 5 0.2
6 2 0.9
6 3 0.2
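Definition 3 can be computed without enumerating every path. Because every edge score lies in (0, 1], extending a path can never increase its product, so a Dijkstra-style search that always expands the highest-scoring frontier node finds the maximum-product path. This is a sketch of that idea, not the authors' implementation; following the worked example, it uses the raw Jaccard scores from Table 27.1 as edge scores and treats edges as undirected. relevance_scores is a hypothetical name.

```python
import heapq
from collections import defaultdict


def relevance_scores(edges, source):
    """rs(source, v) = max over all paths of the product of edge scores, each in (0, 1]."""
    graph = defaultdict(list)
    for (u, v), s in edges.items():
        graph[u].append((v, s))
        graph[v].append((u, s))  # the worked example treats edges as undirected
    best = {source: 1.0}
    heap = [(-1.0, source)]  # heapq is a min-heap, so scores are negated
    while heap:
        neg_score, u = heapq.heappop(heap)
        score = -neg_score
        if score < best.get(u, 0.0):
            continue  # stale heap entry; a better path to u was already found
        for v, s in graph[u]:
            candidate = score * s
            if candidate > best.get(v, 0.0):
                best[v] = candidate
                heapq.heappush(heap, (-candidate, v))
    del best[source]
    return best


table_27_1 = {(1, 6): 0.9, (2, 5): 0.2, (3, 4): 0.2,
              (4, 5): 0.2, (6, 2): 0.9, (6, 3): 0.2}
rs = relevance_scores(table_27_1, 1)
for v in (2, 3, 5, 6):
    print(v, round(rs[v], 3))  # node-1 rows of Table 27.2
# 2 0.81
# 3 0.18
# 5 0.162
# 6 0.9
```

Running the search once per source node yields every pairwise relevance score, which is cheaper than enumerating all k paths between each pair.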

Rule enforced semantic network

The first mode semantic networks, created automatically using only similarity measures, do not include any of the semantic information that domain experts usually expect. In the cyber-security domain in particular, this additional semantic information can be described as semantic rules. Semantic rules explicitly identify the connectivity relationship between two network attacks; they can be extracted from domain knowledge represented as taxonomies and ontologies. Because some connectivity relationships described by semantic rules may differ from those produced by the similarity measures in the first mode semantic network, we use semantic rules to adjust the previous attack correlation results and generate a rule-enforced semantic network.

Definition 4 [Semantic Rule]: Given two attack nodes $x_p$ and $x_q$, a semantic rule s is defined as $\langle x_p, x_q, s_{pq} \rangle$, where $s_{pq}$ is the semantic score associated with these two attacks, extracted from domain expertise such as a taxonomy. If $x_p$ and $x_q$ fall into the same category of network attacks defined by the taxonomy, we set $s_{pq} = 1$; otherwise, when they do not belong to the same attack category, $s_{pq} = 0$.

Next, $s_{pq}$ is used to adjust the previous relevance score $rs_{pq}$, subject to a threshold $\sigma$ predefined by domain experts. When $|s_{pq} - rs_{pq}| > \sigma$, $rs_{pq}$ needs to be updated by the semantic rules, and the degree of adjustment, denoted ad_Degree, is also customized by domain experts to reflect the most appropriate relevance score among attacks in the current network environment. There are therefore two possible ways to update the previous relevance score:

(a) If $s_{pq} - rs_{pq} > 0$, the updated relevance score is $RS'_{pq} = rs_{pq} + |s_{pq} - rs_{pq}| \times \mathit{ad\_Degree}$.

(b) If $s_{pq} - rs_{pq} < 0$, the updated relevance score is $RS'_{pq} = rs_{pq} - |s_{pq} - rs_{pq}| \times \mathit{ad\_Degree}$.
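The rule-enforcement step can be sketched as follows. This sketch assumes the taxonomy is a simple attack-name-to-category lookup (here a subset of Table 27.4) and uses hypothetical expert settings σ = 0.3 and ad_Degree = 0.5; semantic_score and adjust are illustrative names. Rules (a) and (b) together amount to moving rs_pq toward s_pq by the fraction ad_Degree of the gap between them.

```python
# Subset of the Table 27.4 taxonomy: attack name -> category
TAXONOMY = {
    "smurf": "DoS", "neptune": "DoS",
    "guess_passwd": "R2L", "imap": "R2L",
    "rootkit": "U2R", "perl": "U2R", "loadmodule": "U2R",
    "nmap": "Probe",
}


def semantic_score(x_p, x_q, taxonomy=TAXONOMY):
    """s_pq = 1 if both attacks fall into the same taxonomy category, else 0."""
    return 1.0 if taxonomy[x_p] == taxonomy[x_q] else 0.0


def adjust(rs_pq, s_pq, sigma, ad_degree):
    """Apply rule (a) or (b) only when |s_pq - rs_pq| exceeds the expert threshold sigma."""
    gap = s_pq - rs_pq
    if abs(gap) <= sigma:
        return rs_pq  # close enough to the rule; keep the similarity-based score
    if gap > 0:
        return rs_pq + abs(gap) * ad_degree  # rule (a): raise the score
    return rs_pq - abs(gap) * ad_degree      # rule (b): lower the score


SIGMA, AD_DEGREE = 0.3, 0.5  # hypothetical expert settings
print(adjust(0.5, semantic_score("rootkit", "perl"), SIGMA, AD_DEGREE))  # 0.75
print(adjust(0.5, semantic_score("smurf", "perl"), SIGMA, AD_DEGREE))    # 0.25
print(adjust(0.9, semantic_score("rootkit", "perl"), SIGMA, AD_DEGREE))  # 0.9 (gap 0.1 <= sigma)
```

Because s_pq is either 0 or 1, rule (a) only fires when s_pq = 1 and rule (b) only when s_pq = 0; with ad_Degree between 0 and 1, adjusted scores therefore stay within [0, 1].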

Experiments

Experiment data

We selected the KDD CUP 99 data set [20], which was made available at the Third International Knowledge Discovery and Data Mining Tools Competition. This training dataset was originally prepared and managed by MIT Lincoln Labs [21] with the objective of surveying and evaluating research in intrusion detection. The dataset contains 494,021 network connection events; 75 percent of them are used to build our semantic network and train our Bayesian probability model, and the remaining 25 percent are used to evaluate the performance of our approach. A set of 41 different features is used to decide whether a given sequence of events is an attack or normal behavior. Twenty-three attack types are used in this data set, and 20 percent of the network connection records represent normal patterns.

Table 27.3 shows some sample sequences of network connection events from the dataset used in our research:

Table 27.3

Sample Sequences of Network Connection Events


The seven columns represent features that describe these network connection events; we show only seven randomly selected features out of the 41 in the entire dataset. The labels in the last column indicate whether the sequence of events is normal behavior or a kind of network attack, such as Guess_password or Load Module. These labels are also used to verify the performance of our four different similarity-based semantic networks.

Experiment process

We create four different semantic networks from the same network connection dataset using the two-step approach described in the section titled “Methodology”: (1) We create four similarity based semantic networks based on four similarity measures: Anderberg, Jaccard, Simple Matching, and the correlation coefficient. (2) We use domain expertise from a well-known attack taxonomy as semantic rules. This taxonomy, developed at MIT Lincoln Lab [22], defines attack categories by consequence, as shown in Table 27.4. We use it to adjust each similarity based semantic network from step 1, finally generating four rule-enforced semantic networks.

Table 27.4

A Taxonomy That Defines the Attack Category by Consequence

Attack Category | Attack Names
Denial of Service (DoS) | smurf, neptune, back, teardrop, pod, land
Remote to Local (R2L) | warezclient, guess_passwd, warezmaster, imap, ftp_write, multihop, phf, spy
User to Root (U2R) | buffer_overflow, rootkit, loadmodule, perl
Probe | satan, ipsweep, portsweep, nmap

The Bayesian probability model is also applied to sequences in the network connection event dataset to calculate the probability of occurrence of every attack. In our experiments, we use the Bayesian probability model to identify the attacks with a high probability of occurrence for each sequence. Once we have these initial attack predictions, we locate them as initial nodes in our semantic networks to find other relevant attacks. Figure 27.3 shows how the Bayesian probability model predicts the initial attack in our process.
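The chapter does not specify the structure of its Bayesian probability model, so the sketch below substitutes a categorical naive Bayes classifier, a different and simpler technique, purely to illustrate the flow: score each attack type for a sequence, take the most probable one as the initial node, then return the neighbours whose relevance score exceeds a threshold t. The binary features, labels, and relevance scores are hypothetical toy values, and train, posterior, and related_attacks are illustrative names.

```python
from collections import Counter, defaultdict


def train(records, labels):
    """Count class priors and per-feature value counts (categorical naive Bayes)."""
    prior = Counter(labels)
    cond = defaultdict(Counter)  # (feature index, label) -> counts of observed values
    for record, label in zip(records, labels):
        for i, value in enumerate(record):
            cond[(i, label)][value] += 1
    return prior, cond


def posterior(record, prior, cond, n_values=2):
    """P(label | record) under the naive independence assumption, with Laplace smoothing."""
    total = sum(prior.values())
    scores = {}
    for label, count in prior.items():
        p = count / total
        for i, value in enumerate(record):
            # n_values = 2 assumes binary features
            p *= (cond[(i, label)][value] + 1) / (count + n_values)
        scores[label] = p
    z = sum(scores.values())
    return {label: s / z for label, s in scores.items()}


def related_attacks(initial, rs, t):
    """Attacks whose relevance score to the initial node exceeds threshold t, best first."""
    hits = [(q, s) for (p, q), s in rs.items() if p == initial and s > t]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical binary event features and labels
records = [(1, 0, 1), (1, 0, 1), (0, 1, 0), (0, 1, 1), (1, 1, 0)]
labels = ["rootkit", "rootkit", "smurf", "smurf", "perl"]
prior, cond = train(records, labels)
probs = posterior((1, 0, 1), prior, cond)
initial = max(probs, key=probs.get)
print(initial)  # rootkit

# Hypothetical relevance scores from a rule-enforced semantic network
rs = {("rootkit", "perl"): 0.9, ("rootkit", "loadmodule"): 0.85, ("rootkit", "smurf"): 0.2}
print(related_attacks(initial, rs, t=0.8))  # [('perl', 0.9), ('loadmodule', 0.85)]
```

Separating the two stages this way means the probability model only has to pick a starting node; the semantic network then supplies the ranked list of related attacks.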


Figure 27.3 Initial attack predicted by Bayesian probability model in our approach.

Next we give a concrete example of how our semantic network can identify relevant attacks with a high probability of occurrence in the current network situation. As Table 27.5 shows, for the sequence of events with ID=73727, a Bayesian prediction model first identifies the attack type “Rootkit” as having the highest probability of occurrence in the network. We then locate “Rootkit” as the initial node in our rule-enforced semantic network shown in Figure 27.1(b). Our approach automatically identifies other attack types related to “Rootkit” and connects them with their corresponding relevance scores. The second column, “Actual Label,” is used to verify the performance of the semantic network. In this case, the semantic network predicts the attack “Perl” with the highest relevance score, and this is exactly the actual attack label for sequence ID=73727. For this case, therefore, the semantic network increases the precision of detecting probable attacks with a high probability of occurrence.

Table 27.5

A Concrete Example of Semantic Network


Performance measures

We evaluate our approach with three metrics, precision, recall, and F-measure, defined as follows:

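$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}, \qquad F = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$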
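For a single test sequence, the metrics can be computed by comparing the set of recommended attacks with the set of actual labels. This is a minimal sketch using the standard set-based definitions and the F1 form of the F-measure, which we assume here; precision_recall_f is a hypothetical name and the example lists are illustrative.

```python
def precision_recall_f(recommended, actual):
    """Set-based precision, recall, and F-measure (F1) for one recommendation list."""
    recommended, actual = set(recommended), set(actual)
    hits = len(recommended & actual)  # true positives
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(actual) if actual else 0.0
    f = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f


p, r, f = precision_recall_f(["perl", "loadmodule"], ["perl"])
print(p, r, round(f, 3))  # 0.5 1.0 0.667
```

Here the recommended set plays the role of TP + FP and the actual label set the role of TP + FN.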

Experiment results

We evaluate the four different semantic networks based on four various similarity measures, including Anderberg, Jaccard, Simple Matching, and Correlation Coefficient.

One important parameter in our approach is the user-defined threshold t for the relevance score in the semantic network. Only attacks whose relevance score exceeds t are included in the semantic network's recommendation results. We varied t from 0 to 1; the observed values of precision, recall, and F-measure are presented in Figures 27.4, 27.5, and 27.6:


Figure 27.4 Average precision graphs for different semantic networks.


Figure 27.5 Average recall graphs for different semantic networks.


Figure 27.6 F-measure graphs for different semantic networks.

Figure 27.4 shows the average precision of the four semantic networks based on the four similarity measures. The Anderberg-based semantic network performs best in terms of precision as the relevance-score threshold t varies. The Jaccard-based semantic network ranks second, while the Simple Matching and correlation-based semantic networks perform similarly to each other but below the Anderberg and Jaccard networks.

Figure 27.5 shows the average recall of the four semantic networks. Their average recall values are similar, all falling in the range of 60 to 100 percent as the threshold t varies.

We also found that 0.8 is the optimal relevance-score threshold for all the semantic networks. Figure 27.6 shows the F-measure, a metric combining precision and recall. The results in these graphs clearly indicate that Anderberg outperforms the other similarity measures, including the traditional correlation coefficient, in the cyber-security domain.
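The threshold experiment can be sketched as a sweep over t that averages precision and recall across test sequences. This sketch assumes each test sequence has exactly one actual label, counts precision as 0 when no attack passes the threshold, and uses hypothetical candidate scores; sweep is an illustrative name, not code from this study.

```python
def sweep(cases, thresholds):
    """Average precision and recall over test cases for each relevance-score threshold t.

    Each case is (candidates, actual): candidates maps attack name -> relevance score,
    and actual is the single true label of that sequence.
    """
    results = {}
    for t in thresholds:
        precisions, recalls = [], []
        for candidates, actual in cases:
            recommended = {a for a, s in candidates.items() if s > t}
            hit = actual in recommended
            precisions.append(1 / len(recommended) if hit else 0.0)
            recalls.append(1.0 if hit else 0.0)
        results[t] = (sum(precisions) / len(cases), sum(recalls) / len(cases))
    return results


cases = [  # hypothetical recommendation scores and actual labels
    ({"perl": 0.9, "loadmodule": 0.85, "smurf": 0.2}, "perl"),
    ({"neptune": 0.95, "smurf": 0.6}, "smurf"),
    ({"ipsweep": 0.9, "nmap": 0.7}, "ipsweep"),
]
for t, (p, r) in sweep(cases, [0.5, 0.87]).items():
    print(t, round(p, 3), round(r, 3))
# 0.5 0.5 1.0
# 0.87 0.667 0.667
```

The toy output shows the usual trade-off: raising t prunes weaker candidates, which raises precision but can drop correct attacks and lower recall, so the F-measure is a natural criterion for choosing t.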

Conclusion and future work

To handle today's complex and uncertain network attack correlation tasks, and to increase the precision of detecting probable attacks based on their probability of occurrence in the current network environment, we use semantic networks that convey relationships among network attacks and assist in automatically identifying and predicting related attacks. This chapter extends our earlier poster abstract [23]: we used four different similarity measures (Anderberg, Jaccard, Simple Matching, and a traditional correlation coefficient) to automatically create first mode semantic networks, and we adjusted them with semantic rules provided by domain experts. A Bayesian probability model is used to identify initial attacks in our semantic network with a high probability of occurrence. Finally, our experiments show that a semantic network using the Anderberg similarity measure performs better in terms of precision and recall than the existing correlation approach in the cyber-security domain.

We believe our approach can be used to tackle zero-day attacks, one of the most important problems in cyber-security today. A zero-day attack is a computer threat that exploits application vulnerabilities unknown to the software developers; its exploits are used or shared by attackers before the developer of the target software knows about the vulnerability. Because our approach uses feature vectors derived from sequences of network events, a zero-day attack whose feature vector shares several features with existing attack types should be automatically placed as an attack node in our semantic network, based on the similarity of attack features. The next step is to determine how to identify such attacks in the semantic network, which we will address in future work.

In a recent publication [24] at AMCIS 2012, we proposed using domain knowledge in the form of taxonomies and ontologies to improve attack correlation in cyber-security. In addition, we expect that the attack correlation results of machine-learning techniques can be used to refine the original attack taxonomy. The experimental findings suggest that domain knowledge and machine-learning techniques should be used together for attack classification tasks.

In the future, we plan to investigate additional similarity coefficients for creating semantic networks. We would also like to apply context filters on top of the current semantic network technique to further increase the precision of attack correlation results. Finally, we plan to validate our approach experimentally on additional real network connection datasets.

Acknowledgments

This research is partially supported by Northrop-Grumman Corporation.

References

1. Chien S-H, Chang E-H, Yu C-Y, Ho C-S. Attack subplan-based attack scenario correlation. Proceedings of the Sixth International Conference on Machine Learning and Cybernetics; 2007; Hong Kong.

2. Yan W, Hou E, Ansari N. Extracting attack knowledge using principal-subordinate consequence tagging case grammar and alerts semantic networks. In: Proceedings of the 29th Annual IEEE International Conference on Local Computer Networks (LCN'04); 2004.

3. Ning P, Cui Y, Reeves DS, Xu D. Techniques and tools for analyzing intrusion alerts. ACM Transactions on Information and System Security. 2004;7(2):274–318.

4. Manganaris S, Christensen M, Zerkle D, Hermiz K. A data mining analysis of RTID alarms. Comput Netw. 2000;34:571–577.

5. Li W, Tian S. An ontology-based intrusion alerts correlation system. Expert Systems with Applications. 2010;37:7138–7146.

6. Sowa JF. Semantic networks. [Internet]. Retrieved from: <http://www.jfsowa.com/pubs/semnet.htm>; [accessed 01.13].

7. Sowa JF. Semantic networks. In: Shapiro SC, ed. Encyclopedia of Artificial Intelligence. New York: Wiley; 1992;1493–1511.

8. Masterman M. Semantic message detection for machine translation using an interlingua. NPL; 1961:438–475.

9. Pearl J. Probabilistic reasoning in intelligent systems: Networks of plausible inference. San Francisco: Morgan Kaufmann; 1988.

10. Pearl J. Causality: Models, reasoning, and inference. Cambridge: Cambridge University Press; 2000.

11. Chen Z, Gangopadhyay A, Karabatis G, McGuire M, Welty C. Semantic integration and knowledge discovery for environmental research. Journal of Database Management. 2007;18(1):43–68.

12. Karabatis G, Chen Z, Janeja VP, et al. Using semantic networks and context in search for relevant software engineering artifacts. Journal on Data Semantics. 2009;XIV:74–104.

13. Darwiche A. Bayesian networks. Commun ACM. 2010 Dec;53(12).

14. Garcia Bringas P. Intensive use of Bayesian belief networks for the unified, flexible and adaptable analysis of misuses and anomalies in network intrusion detection and prevention systems. In: 18th International Conference on Database and Expert Systems Applications; 2007.

15. Kruegel C, Mutz D, Robertson W, Valeur F. Bayesian event classification for intrusion detection. In: Annual Computer Security Applications Conference (ACSAC); 2003.

16. Valdes A, Skinner K. Adaptive, model-based monitoring for cyber attack detection. In: RAID '00; 2000.

17. Xie P, Li JH, Ou X, Liu P, Levy R. Using Bayesian networks for cyber security analysis. In: IEEE/IFIP International Conference on Dependable Systems and Networks (DSN); 2010.

18. Pensa R, Leschi C, Besson J, Boulicaut J. Assessment of discretization techniques for relevant pattern discovery from gene expression data. BIOKDD 2004: In the 4th Workshop on Data Mining in Bioinformatics; 2004.

19. Lewis DM, Janeja VP. An empirical evaluation of similarity coefficients for binary valued data. International Journal of Data Warehousing and Mining (IJDWM). 2011;7(2).

20. KDD CUP 1999 Intrusion detection dataset. [Internet]. Retrieved from: <http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html>; [accessed 01.13].

21. The DARPA intrusion detection data sets by MIT Lincoln Lab. [Internet]. Retrieved from: <http://www.ll.mit.edu/mission/communications/ist/corpora/ideval/data/index.html>; [accessed 01.13].

22. Lippmann R, Fried D, Graf I, et al. Evaluating intrusion detection systems: The 1998 DARPA off-line intrusion detection evaluation. Proceedings of the DARPA Information Survivability Conference and Exposition 1998;12–26.

23. He P, Karabatis G. Using semantic networks to counter cyber threats. ISI 2012: Poster abstract in Proceedings of IEEE International Conference on Intelligence and Security Informatics; 2012 Jun; Washington, DC, USA. p. 184. doi: 10.1109/ISI.2012.6284294.

24. He P, Zhou L, Karabatis G. Using domain knowledge to facilitate cyber security analysis. AMCIS 2012 Proceedings; Paper 19. [Internet]. 2012 Jul 29. Retrieved from: <http://aisel.aisnet.org/amcis2012/proceedings/ISSecurity/19>.
