10.4. Alert Correlation, Hypothesis, Prediction, and Aggregation

This section discusses the vulnerability-centric approach to the correlation, hypothesis, prediction, and aggregation of intrusion alerts. First, Section 10.4.1 discusses alert correlation in offline applications and its limitations. Then, Section 10.4.2 describes the vulnerability-centric alert correlation method, Section 10.4.3 discusses alert hypothesis and prediction, Section 10.4.4 discusses alert aggregation, and finally, Section 10.4.5 presents experimental results.

10.4.1. Alert Correlation in Offline Applications

In a typical offline alert correlation method, when a new alert arrives, the method searches the previously received alerts for those that prepare for it. This process is repeated for each new alert. The procedure involves two nested loops and is thus called the nested loop approach. Figure 10.2 illustrates this approach. The left side of the figure shows a sequence of alerts with ascending timestamps (we shall discuss later the case when two alerts have exactly the same timestamps), a0, a1,..., an. For i = 1, 2,..., n, the nested loop approach searches a0, a1,..., ai−1 for those aj's that satisfy Exp(aj) → Exp(ai). The search for the alerts that prepare for ai can be optimized with an index on a0, a1,..., ai−1. After ai is processed, an entry corresponding to ai is inserted into the index.

Figure 10.2. The nested loop approach, with or without a sliding window.


As the number of received alerts keeps increasing, any finite amount of memory will eventually be insufficient for storing the index. A sliding window approach comes to the rescue: only the alerts close enough to the new alert are considered for correlation. As shown in the right side of Figure 10.2, for the alert ai the search is only performed on ai−k, ai−k+1,..., ai−1, where k is a given window size determined by available memory. However, this sliding window approach leads to another problem. The need for performance requires k to be small enough for the index to fit in memory, whereas a smaller k means fewer alerts will be considered for correlation with the new alert, and hence an incomplete result. This situation can be worsened by experienced attackers who are aware of the correlation efforts and employ a slow attack to defeat them. Specifically, given an arbitrarily large window size k, for any two attacks that trigger two correlated alerts, the attacker can delay the second attack until at least k other alerts have been raised after the first alert is triggered (based on an estimation of k), so the sliding window will be filled and the two alerts will not be correlated. Instead of passively waiting, a smarter attacker can actively launch bogus attacks between the two real attack steps to fill the sliding window in a shorter time. The attacker can even script bogus attack sequences between the real attack steps, such that a deceived correlation engine is kept busy producing falsely correlated alerts while the real intrusion advances undisturbed.
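The slow-attack evasion described above is easy to reproduce. The sketch below is a minimal, hypothetical model of the nested loop approach with a sliding window (the alert encoding and the `prepares_for` predicate are illustrative assumptions, not the chapter's implementation); with window size k, inserting k noise alerts between two correlated attack steps makes the correlation miss the pair.

```python
from collections import deque

def correlate_nested_loop(alerts, prepares_for, window_size):
    """Nested-loop correlation with a sliding window over the last
    `window_size` alerts.  Returns the list of correlated pairs."""
    window = deque(maxlen=window_size)   # index over recent alerts only
    pairs = []
    for new in alerts:
        for old in window:               # inner loop: search the window
            if prepares_for(old, new):
                pairs.append((old, new))
        window.append(new)               # oldest alert silently evicted
    return pairs

# Hypothetical two-step attack: e1 prepares for e2.
prepares = lambda a, b: a == "e1" and b == "e2"

# With k = 3, an attacker inserting three noise alerts between the two
# real steps evicts "e1" from the window before "e2" arrives.
stream = ["e1", "x", "x", "x", "e2"]
print(correlate_nested_loop(stream, prepares, window_size=3))  # []
print(correlate_nested_loop(stream, prepares, window_size=5))  # [('e1', 'e2')]
```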

To remove this limitation of the nested loop approach, we observe that the correlation between alerts does not always need to be explicitly recorded. Note that the correlation between two alerts actually means two things. First, the prepare-for relationship exists between the exploits to which the two alerts are mapped. Second, the alert preparing for the other must occur before it. Knowing these facts, a new alert only needs to be explicitly correlated with the last alert matching each exploit. Its correlation with earlier alerts matching the same exploit can be kept implicit through the temporal order and the matching between alerts and exploits. This is illustrated in Example 10.2.

Example 10.2

In Figure 10.3, suppose the first three alerts, ai, aj, and ak, all match the same exploit Exp(ak) (i.e., their event types match the same vulnerability and they involve the same source and destination hosts). The alert ah matches another exploit Exp(ah), and Exp(ak) prepares for Exp(ah). Hence, ai, aj, and ak should all be correlated with ah. However, if the correlation between ak and ah has been explicitly recorded (shown as a solid line in Figure 10.3), then the correlation between aj and ah can be kept implicit (shown as a dotted line). This observation is important because it can significantly reduce the complexity and memory requirement. Intuitively, for each exploit the correlation algorithm only needs to search backward for the first alert (ak in the above case) matching that exploit. In the nested loop approach, however, the correlation is always explicit, and hence each new alert must unnecessarily search all the received alerts, as discussed before.

Figure 10.3. Implicit and explicit correlation.


10.4.2. Vulnerability-Centric Alert Correlation

We design an in-memory data structure called a queue graph (the name comes from the fact that the data structure is a graph and works like a queue). A queue graph is basically an in-memory materialization of the given attack graph with enhanced features (the purpose of these features will become clear in the following sections). In a queue graph, each exploit is realized as a queue of length one (i.e., a bucket) and each condition as a variable. The realization of edges is based on a bidirectional, layered structure as follows. Starting from each exploit ei, a breadth-first search (BFS) is performed in the attack graph by following the directed edges. For each edge encountered during the search, a forward pointer is created to connect the corresponding queue and variable. Similarly, another search is performed by following the directed edges in their reversed direction, and a backward pointer is created for each encountered edge. Later, we shall use the backward pointers for correlation purposes and the forward pointers for prediction purposes. The pointers are then placed in a separate layer dedicated to the queue corresponding to the exploit ei. The reason for separating pointers into layers is as follows. A BFS always creates a tree (namely, the BFS tree), and hence a later BFS starting from the same queue can follow only the pointers at that layer. This later BFS is then performed within a tree instead of a graph, reducing the complexity from quadratic to linear. We illustrate these concepts in Example 10.3.
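The layered-pointer construction can be sketched as follows. This is a hypothetical reconstruction, not the authors' code: the attack graph is a plain adjacency map, and each exploit's layer keeps only the BFS-tree edges found by one forward and one backward BFS, so a later search from that queue traverses a tree rather than the whole graph. The example graph loosely mimics Figure 10.4, with names chosen purely for illustration.

```python
from collections import deque

def build_queue_graph(attack_graph, exploits):
    """For each exploit e_i, run a BFS over the attack graph (and one over
    its reverse) and keep only the BFS-tree edges as that exploit's layer
    of forward (resp. backward) pointers."""
    reverse = {v: [] for v in attack_graph}
    for u, succs in attack_graph.items():
        for v in succs:
            reverse[v].append(u)

    def bfs_tree(adj, root):
        tree, seen, q = [], {root}, deque([root])
        while q:
            u = q.popleft()
            for v in adj.get(u, []):
                if v not in seen:          # each node reached once -> a tree
                    seen.add(v)
                    tree.append((u, v))
                    q.append(v)
        return tree

    return {e: {"forward": bfs_tree(attack_graph, e),
                "backward": bfs_tree(reverse, e)} for e in exploits}

# An assumed attack graph in the spirit of Figure 10.4: e1 satisfies c3
# and c4; these enable e2 and e3, which both satisfy c2, required by e4.
g = {"e1": ["c3", "c4"], "c3": ["e2"], "c4": ["e3"],
     "e2": ["c2"], "e3": ["c2"], "c2": ["e4"], "e4": []}
layers = build_queue_graph(g, ["e1", "e2", "e3", "e4"])
# In the e1 layer, c2 is reached via only one of e2/e3, so it is a tree.
print(layers["e1"]["forward"])
```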

Example 10.3

In Figure 10.4 are shown, from left to right, a given attack graph; the corresponding queues (shown as buckets) and variables (shown as text); and the (both forward and backward) pointers at different layers. Notice that the layer-one pointers do not include those connecting v2 and Q3, because a BFS in the attack graph starting from e1 reaches c2 only once (either via e2 or e3; we assume e2 in this example). The layer-one pointers thus form a tree rooted at Q1.

Figure 10.4. An example queue graph.


We have discussed how a nested loop approach correlates alerts. As a comparison, we now perform the same correlation using a queue graph (we discuss other correlation requirements later). Intuitively, we let the stream of alerts flow through the queue graph, and at the same time, we collect correlation results by searching the queue graph. Specifically, each incoming alert is first matched with an exploit and placed in the corresponding queue. Then, because the length of each queue is one, a nonempty queue must dequeue the current alert before it can enqueue a new alert. The results of correlation are collected during this process as a directed graph, namely, the result graph. First, each new alert is recorded as a vertex in the result graph. Second, when a new alert forces an old alert to be dequeued, a directed edge between the two alerts is added into the result graph, which records the temporal order between the two alerts and the fact that they both match the same exploit. Third, after each new alert is enqueued, a search starts from the queue and follows two consecutive backward pointers; for each nonempty queue encountered during the search, a directed edge from the alert in that queue to the new alert is added into the result graph. This is illustrated in Example 10.4.

Example 10.4

Consider correlating the four alerts ai, aj, ak, and ah in Figure 10.3 with the queue graph given in Figure 10.4, and suppose Exp(ah) = e1, Exp(ak) = e2, and no other alerts match e1 or e2 besides ai, aj, ak, and ah. First, when ai arrives, it is placed in the empty queue Q2. Then, aj forces ai to be dequeued from Q2, and a directed edge (ai, aj) in the result graph records the facts that ai is before aj and they both match e2. Similarly, ak replaces aj in Q2, and a directed edge (aj, ak) is recorded. Finally, ah arrives and occupies Q1; a search starting from Q1 and following two layer-one backward pointers finds the alert ak in Q2. Hence, a directed edge (ak, ah) records the only explicit correlation.
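A minimal sketch of this correlation process (the encoding is an assumption for illustration: the length-one queues become a map from exploits to the last matching alert, and a `prepared_by` map stands in for walking two backward pointers):

```python
def qg_correlate(alert_stream, exploit_of, prepared_by):
    """Queue-graph correlation sketch: one slot ("queue of length one")
    per exploit.  prepared_by[e] lists exploits that prepare for e."""
    slot = {}            # exploit -> last alert matching it
    result_edges = []    # the result graph (edges only, append-only)
    for alert in alert_stream:
        e = exploit_of[alert]
        if e in slot:                    # dequeue: same-exploit temporal edge
            result_edges.append((slot[e], alert))
        slot[e] = alert
        for e_prev in prepared_by.get(e, []):
            if e_prev in slot:           # explicit correlation edge
                result_edges.append((slot[e_prev], alert))
    return result_edges

# Example 10.4: a_i, a_j, a_k match e2; a_h matches e1; e2 prepares for e1.
exploit_of = {"ai": "e2", "aj": "e2", "ak": "e2", "ah": "e1"}
edges = qg_correlate(["ai", "aj", "ak", "ah"], exploit_of, {"e1": ["e2"]})
print(edges)  # [('ai', 'aj'), ('aj', 'ak'), ('ak', 'ah')]
```

Note that only (ak, ah) is an explicit correlation edge; the two earlier edges merely record temporal order within the same exploit, exactly as in the example.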

The above correlation process is sufficient for demonstrating the advantages of the queue graph approach, although some features of the queue graph, such as the variables and forward pointers, are not yet used and will be needed in the next section. First, the time for processing each new alert with the queue graph approach is linear in (n + m), the number of vertices in the attack graph. In the above process, the correlation step visits at most (n + m) edges because it searches in a tree (that is, the BFS tree rooted at Qi) by following the layered pointers in Pi; the other steps of the procedure take almost constant time. Hence, the performance of the queue graph approach is independent of the number of received alerts, as n and m are relatively stable for a given network. Second, the memory usage of the queue graph approach is roughly O(n(n + m)), due to n layers of at most (n + m) pointers (the correlation only appends to the result graph but never reads from it, so the result graph need not reside in memory), and thus does not depend on the number of received alerts either. Third, the queue graph approach is not vulnerable to slow attacks. During the correlation, an alert ceases to be considered for correlation only when a new alert matching the same exploit arrives. Hence, if one alert prepares for another, then no matter how many unrelated alerts are injected between them, the first alert will still be sitting in the queue graph waiting for the second.

When an alert is dequeued from the queue graph, it will no longer be needed for correlation. This critically depends on the assumption that alerts arrive in the correct order. However, both the order suggested by timestamps and the actual order of arrival can be wrong, since the temporal characteristics of alerts are typically imprecise. We therefore adopt the following conservative approach. First, any two alerts whose timestamps differ by no more than a given threshold tcon are treated as concurrent; the correct order of concurrent alerts is always the one that allows the alerts to be correlated. Second, for nonconcurrent alerts, the correct order is the one suggested by their timestamps, but alerts are allowed to arrive in a different (and incorrect) order. This conservative approach enables us to tolerate varying delays in a network and small differences between the clocks of sensors (as discussed earlier, we assume the clocks of sensors are loosely synchronized). However, the basic queue graph approach does not work properly on alerts arriving out of order. Consider an alert a1 that prepares for another alert a2 but arrives later than a2. As described before, the queue graph correlation procedure will only look for the alerts that prepare for a1, not for those that a1 prepares for (a2 in this case). Moreover, if another concurrent alert a3 matches the same exploit as a2 and arrives after a2 but before a1, then a2 has already been dequeued by the time a1 arrives, and the correlation between a1 and a2 will not be discovered.

To address this issue, we cache alerts inside a time window before the queue graph and reorder them conservatively as follows. Assume the varying delay is bounded by a threshold tmax. We postpone the processing of an alert a1 with timestamp t1 until tmax time (or tcon, whichever is larger, when concurrent alerts are also considered) has passed since we received a1, and we reorder the postponed alerts so they arrive at the correlation engine in the correct order. Any alert a2 received after this point then has a timestamp t2 satisfying t2 > t1: in the worst case, a1 is not delayed but a2 is delayed by tmax time, and the fact that a2 is received more than tmax later than a1 means (t2 + tmax) − t1 > tmax, and hence t2 > t1. Notice that the time window here is used only for reordering alerts; no alert is excluded from correlation because of it. This is different from the sliding window used by the nested loop approach, and this time window does not make the correlation vulnerable to slow attacks. However, the capability of dealing with concurrent alerts and varying delays comes at a cost: the additional delay introduced for reordering alerts decreases the timeliness of alert correlation. Nevertheless, if we were to report results immediately as each alert arrives, the imprecise temporal characteristics of alerts could produce incorrect and confusing results that diminish the value of the correlation effort.
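The conservative reordering can be sketched with a small buffer keyed by timestamp. The tuple encoding `(receive_time, timestamp, alert)` and the use of a heap are assumptions made for illustration; the point is only that an alert is held for tmax time after receipt and released in timestamp order.

```python
import heapq

def reorder(received, t_max):
    """Reordering sketch: each item is (receive_time, timestamp, alert).
    An alert is released only after t_max time has passed since it was
    received, and releases happen in timestamp order."""
    buffer, out = [], []
    for recv, ts, alert in sorted(received):     # process in receive order
        # Release buffered alerts whose hold period expired before `recv`.
        while buffer and buffer[0][1] + t_max <= recv:
            out.append(heapq.heappop(buffer)[2])
        heapq.heappush(buffer, (ts, recv, alert))
    while buffer:                                # drain the remainder
        out.append(heapq.heappop(buffer)[2])
    return out

# a1 (timestamp 10) is delayed and received after a2 (timestamp 11);
# with t_max = 5 the engine still sees them in timestamp order.
received = [(12, 11, "a2"), (13, 10, "a1")]
print(reorder(received, t_max=5))  # ['a1', 'a2']
```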

10.4.3. Alert Hypothesis and Prediction

Attack graphs provide unique opportunities for hypothesizing alerts missed by IDSs and for predicting possible consequences of current attacks. Intuitively, missing alerts will cause inconsistency between the knowledge encoded in attack graphs and the facts represented by received alerts. By reasoning about such inconsistency, missing alerts can be plausibly hypothesized. On the other hand, by extending the facts in a consistent way with respect to the knowledge, possible consequences of an intrusion can be predicted. To elaborate on these ideas, we first illustrate the concepts in Example 10.5.

Example 10.5

The sequence of alerts shown on the left-hand side of Figure 10.5 (that is, a0, a3) is inconsistent with respect to the attack graph, because the condition c3 is not satisfied before the exploit e3 is executed (as indicated by the alert a3). On the other hand, the sequence a0,a1,a3 is consistent, because executing the exploit e1 (as indicated by the alert a1) satisfies the only condition c3 that is required by the execution of e3 (as indicated by a3). The sequence shown on the right-hand side of Figure 10.5 is inconsistent, because the condition c4 is not satisfied before the execution of e3.

Figure 10.5. Consistent and inconsistent alert sequences.


So far, we have correlated alerts only with adjacent ones. Such an approach only works for consistent alert sequences. For inconsistent sequences, such as those in Example 10.5, the search will stop at empty queues that correspond to missing alerts, and the correlation result will be incomplete. A natural question is, “Can we continue to search and hypothesize missing alerts if necessary?” This question motivates us to extend the correlation method to hypothesize missing alerts. Intuitively, we want to explain the occurrence of a new alert by including it in a consistent sequence of alerts (by alert correlation) and missing alerts (by alert hypothesis). More specifically, a search starts from the queue containing the new alert and hypothesizes missing alerts for the empty queues it encounters. It stops at each nonempty queue, because the alert in that queue must already have been processed. The search expands its frontier in a breadth-first manner after each hypothesis is made, since the hypothesis itself may also need an explanation. Such attempts continue until satisfactory explanations for the new alert and all the hypothesized ones have been obtained. The explanations of all received alerts collectively form the result graph, which is now composed of alerts, hypothesized alerts, and conditions that are either satisfied or hypothetically satisfied. This is illustrated in Example 10.6.

Example 10.6

Consider again the three cases, from left to right, in Figure 10.5 when the alert a3 is received. For the first case, two missing alerts matching e1 and e2 need to be hypothesized, and then a3 can be correlated to a0 (through one of the hypothesized alerts). For the second case, no alert needs to be hypothesized because the sequence is already consistent, and a3 needs to be correlated to a1. For the third case, a3 needs to be correlated to a1, and it also needs to be correlated to a0 through a hypothesized alert matching e2.

The correlation process described in the previous section can be modified by replacing the correlation step with a new subprocedure that correlates and hypothesizes alerts as follows. Take a queue graph Qg with n queues Q and m variables V. Each variable in V can now have one of the three values TRUE, FALSE, and HYP, which respectively denote a satisfied condition, an unsatisfied one, and a hypothetically satisfied one, together with a timestamp recording the time of the last update. Each queue in Q can contain alerts or hypothesized alerts. The result graph Gr(V, E) is similar to that described in the last section. However, the vertex set V now includes not only alerts but also hypothesized alerts and conditions. Now suppose a new alert anew with timestamp tnew is received and placed in the queue Qi (1 ≤ i ≤ n). First, we start from Qi and follow the pointers in PRi to set each variable vj (1 ≤ j ≤ m) adjacent to Qi with the value TRUE and the timestamp tnew. This step records the conditions satisfied by anew. Second, we start from Qi and make a partial BFS by following the pointers in Pi. The BFS is partial because it stops upon leaving (given that a BFS is implemented by manipulating a separate queue as usual, we shall refer to the enqueues as reaching and the dequeues as leaving to avoid confusion) a variable with the value TRUE or HYP, or a queue that contains a hypothesized alert. This step correlates anew to previously received or hypothesized alerts.

The result graph Gr is updated during the above process as follows. First, after we enqueue anew into Qi and make changes to each vj adjacent to Qi, we add anew and vj (that is, the value and timestamp of vj) as vertices, and an edge from anew pointing to vj, into the result graph Gr. This step records the fact that the new alert anew satisfies its implied conditions at time tnew. Second, during the partial BFS, we record each hypothesis. Whenever we change the value of a variable vj from FALSE to HYP, we record this update in Gr; similarly, whenever we enqueue a hypothesized alert into an empty queue, we record this hypothesized alert in Gr. Third, whenever we leave a variable v and reach a queue Q, we insert into Gr a directed edge from the alert in Q to v; similarly, we insert edges from a queue's alert to its connected variables when we leave the queue. Example 10.7 illustrates the above procedure.

Example 10.7

Consider the left-hand side of Figure 10.5. The first alert a0 will only cause the condition c2 to be changed from FALSE to TRUE. The result graph will be updated with the alert a0, the satisfied condition c2, and the directed edge connecting them. When a3 is received, a search starts from (the queue corresponding to) e3; it changes c3 from FALSE to HYP; it inserts a hypothesized alert a1 into e1 and a2 into e2, respectively; and it stops at c1 (which is initially set as TRUE) and c2 (which has been set as TRUE when a0 arrived). The result graph will be updated with the alert a3, the hypothesized alerts a1 and a2, the hypothetically satisfied condition c3, and the directed edges between them.
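The hypothesis step of Example 10.7 can be sketched as a partial backward BFS. The `requires` and `produced_by` maps below reconstruct the left-hand side of Figure 10.5 from Examples 10.5 and 10.7, and are assumptions about that figure; the dictionaries are simplified stand-ins for the queue graph's variables and queues.

```python
from collections import deque

def hypothesize(new_exploit, requires, produced_by, cond_state, queue):
    """Partial backward BFS: starting from the exploit matched by a new
    alert, mark each unsatisfied required condition HYP and enqueue a
    hypothesized alert for every exploit producing it whose queue is
    empty; the search stops at conditions that are already TRUE."""
    hyps, frontier = [], deque([new_exploit])
    while frontier:
        e = frontier.popleft()
        for c in requires.get(e, []):
            if cond_state.get(c) == "TRUE":      # satisfied: stop here
                continue
            cond_state[c] = "HYP"                # hypothetically satisfied
            for e_prev in produced_by.get(c, []):
                if queue.get(e_prev) is None:    # empty queue: hypothesize
                    queue[e_prev] = "hyp"
                    hyps.append(e_prev)
                    frontier.append(e_prev)      # a hypothesis needs explaining too
    return hyps

# Left-hand side of Figure 10.5, reconstructed from the examples:
# e3 requires c3; c3 is produced by e1 or e2; e1 requires c1 (initially
# TRUE); e2 requires c2 (satisfied earlier by a0).
requires = {"e3": ["c3"], "e1": ["c1"], "e2": ["c2"]}
produced_by = {"c3": ["e1", "e2"]}
state = {"c1": "TRUE", "c2": "TRUE", "c3": "FALSE"}
queues = {"e0": "a0", "e1": None, "e2": None, "e3": "a3"}
hyps = hypothesize("e3", requires, produced_by, state, queues)
print(hyps, state["c3"])  # ['e1', 'e2'] HYP
```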

Although a BFS generally takes time quadratic in the number of vertices of a graph, the above process has linear time complexity. This is because a queue graph organizes its pointers in separate layers, and each layer is a BFS tree rooted at a queue. Hence, a BFS in each layer is equivalent to a tree traversal, which takes linear time O(n + m). This performance gain seems to be obtained at the price of a higher memory requirement, as there are duplicate pointers among layers. However, the memory requirement is quadratic, that is, O(n(n + m)), which is asymptotically the same as that of the original attack graph.

For alert hypothesis, we explain the occurrence of a new alert by searching backwards in the reversed direction of the edges in attack graphs for correlated (or hypothesized) alerts. Conversely, we can also predict possible consequences of each new alert by searching forwards. A BFS is also preferred in this case, because the predicted conditions will be discovered in the order of their (shortest) distances to the new alert. This distance roughly indicates how imminent a predicted attack is, based on the alerts received so far (although not pursued in this chapter, probability-based prediction techniques [22] can be easily incorporated based on the queue graph data structure to more precisely measure how imminent each attack is).

The process for alert prediction is similar to that of correlation and hypothesis, except as follows. After the correlation and hypothesis complete for a new alert, the prediction process starts. It begins at the conditions satisfied by the new alert and makes a partial BFS in the queue graph by following the pointers in PRi (supposing the new alert is enqueued in Qi). The search stops at previously received (or hypothesized) alerts and their (hypothetically) satisfied conditions to avoid repeating the previous prediction. The result of the prediction process is a sequence of nonempty sets Con1, Con2,..., where Coni contains the conditions that can possibly be satisfied in i steps from now. Unlike correlation and hypothesis, the prediction process does not reason about the disjunctive and conjunctive relationships between exploits. Instead, a condition c will appear in the set Coni as long as there exists a path of length 2i (the path consists of both security conditions and exploits) from c to some previously satisfied condition. Hence, the number i provides a lower bound on the number of exploits that must be executed before c can be satisfied.
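A sketch of the forward prediction follows (the graph encoding via `required_by` and `produces` maps is an assumption made for illustration; the chapter's version follows the layered forward pointers instead):

```python
def predict(start_conditions, required_by, produces, satisfied):
    """Forward-prediction sketch: from the conditions just satisfied, a
    BFS collects Con_1, Con_2, ..., where Con_i holds the conditions
    reachable via a path of i exploits.  The search skips conditions
    already satisfied, so earlier predictions are not repeated."""
    cons, seen = [], set(start_conditions)
    frontier = list(start_conditions)
    while frontier:
        step = set()
        for c in frontier:                    # condition -> exploit -> condition
            for e in required_by.get(c, []):
                for c2 in produces.get(e, []):
                    if c2 not in seen and c2 not in satisfied:
                        step.add(c2)
        if not step:
            break
        cons.append(sorted(step))             # Con_i: reachable in i steps
        seen |= step
        frontier = step
    return cons

# Hypothetical chain: c1 enables e1, which satisfies c2; c2 enables e2,
# which satisfies c3.  An alert has just satisfied c1.
required_by = {"c1": ["e1"], "c2": ["e2"]}
produces = {"e1": ["c2"], "e2": ["c3"]}
print(predict(["c1"], required_by, produces, satisfied={"c1"}))
# [['c2'], ['c3']]
```

Here c3 lands in Con2, so at least two more exploits must be executed before c3 can be satisfied, matching the lower-bound interpretation above.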

10.4.4. Alert Aggregation

This section studies how to aggregate alerts in the result graph so that no redundant information is introduced by transitive edges. In previous sections, avoiding unnecessary searches has allowed the time for correlation to be independent of the number of received alerts. As a side effect, this also reduces the size of result graphs by producing fewer transitive edges. However, the queue graph approach does not completely remove transitive edges from result graphs, as illustrated in Example 10.8. In practice, brute-force attempts of the same attack with different parameters can lead to a large number of alerts in a short time (the treasure-hunt data set provided by the University of California, Santa Barbara (UCSB) is a good example of such brute-force attempts). In Example 10.8, if the bi's happen to be such an attack, then a large number of transitive edges will make the result graph much less readable. Therefore, it is desirable to remove such transitive edges. The transitive edges also cause redundant information in alerts. Following the above example, b1, b2, and b3 are indistinguishable in terms of alert correlation; that is, any other alert prepares for (or is prepared for by) either all or none of them. The three alerts can thus be aggregated as a single vertex in the result graph, with the edges connecting them deleted. Similarly, a2 and a3 are also indistinguishable. On the other hand, a1, a2, and a3 are not indistinguishable, because c prepares for a2 and a3 but not a1. The right side of Figure 10.6 shows a more compact version of the result graph, with transitive edges deleted and indistinguishable alerts aggregated.

Figure 10.6. An example of compressing result graphs.


Example 10.8

The left side of Figure 10.6 shows the result graph of correlating a series of alerts using the queue graph approach. Transitive edges such as (a1,b1) and (a2, b1) are not present, since the queue graph approach immediately stops after it reaches a3. However, the edges (a3,b2) and (a3,b3) are both transitive edges. When b2 and b3 arrive, the queue graph approach repeats the same search as it does for b1 and thus the two transitive edges are inserted into the result graph. Similarly, the edge (c,a3) is also transitive.

Existing alert correlation approaches usually take extra steps to make the result graph more compact, such as aggregating alerts before correlating them [20]. The additional step increases the performance overhead of alert correlation. We show that our queue graph approach can be modified to directly produce a compact result graph, and that the modified approach may actually be more efficient. We first modify the queue graph approach to avoid inserting transitive edges into the result graph. For this purpose, we let each backward pointer in a queue graph have one of two states: on and off. Initially, all the backward pointers are on. The backward pointers are then switched between the two states as follows. Whenever a directed edge (ai, aj) is inserted into the result graph, we turn off the backward pointers between the corresponding queues Qi and Qj. Whenever an alert is enqueued in a queue Qi, all the backward pointers arriving at Qi are turned on. Finally, when we search for older alerts that prepare for a new alert, we follow a backward pointer only if it is currently on. This process is illustrated in Example 10.9.

Example 10.9

In the left side of Figure 10.6, suppose the alerts ai, bi, and c correspond to the queues Qa, Qb, and Qc, respectively. When the alert b1 arrives, it searches through the backward pointers from Qb to Qa and inserts an edge (a3, b1) into the result graph. Then, according to the above discussion, the backward pointers from Qb to Qa are turned off. Consequently, the alerts b2 and b3 will not follow those pointers, and the transitive edges (a3, b2) and (a3, b3) are avoided. This remains true until the alert a4 arrives, which turns on all the backward pointers arriving at the queue Qa. Later, when b4 arrives, it follows the backward pointers from Qb to Qa and inserts the edge (a4, b4).

Alerts are aggregated during the above process as follows. Suppose an alert ai arrives and the corresponding queue Qi already contains another alert a′i. Then ai is aggregated with a′i if the following two conditions hold. First, all the backward pointers arriving at Qi are on. Second, all the backward pointers leaving Qi are off. The first condition ensures that a′i does not prepare for any other alert that arrived between a′i and ai, because otherwise ai and a′i would not be indistinguishable. The second condition ensures that a′i and ai are prepared for by the same collection of alerts, so they are indistinguishable with respect to those alerts. This process is illustrated in Example 10.10. The new procedure not only produces a more compact result graph but is also more efficient than the original one in most cases, because unnecessary searches corresponding to transitive edges are avoided. In Figure 10.6, the alerts a3, b2, and b3 will not trigger a search in the modified approach, because the backward pointers have been turned off by earlier alerts. The performance gain can be significant in the case of brute-force attempts, where a large number of searches can be avoided.

Example 10.10

Following the above example, a3 is aggregated with a2 because the backward pointers from Qb to Qa are on and those from Qa to Qc have been turned off by the alert a2. Similarly, b2 and b3 are aggregated with b1, because the backward pointers from Qb to Qa have been turned off by b1. On the other hand, the alert b4 will not be aggregated, because the backward pointers from Qb to Qa must have been turned on by a4 by the time b4 arrives.
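The on/off pointer rule and the aggregation test can be combined in one sketch. This is a simplified, hypothetical reconstruction of the scenario in Examples 10.9 and 10.10 (the exploit names `ea`, `eb`, `ec` are assumptions): each pointer is keyed by a queue pair, and an aggregated alert is folded into the group of the alert already in the queue, so the recorded edge (a2, b1) stands for the figure's (a3, b1) via a2's group.

```python
def qg_compact(alert_stream, exploit_of, prepared_by):
    """Queue-graph correlation with on/off backward pointers and
    aggregation.  A pointer is keyed (preparing_exploit, exploit); it is
    turned off when an edge is recorded along it, and turned on again
    when the preparing exploit's queue receives a fresh alert."""
    slot, edges, groups, on = {}, [], {}, {}
    ptr_on = lambda p, e: on.get((p, e), True)   # pointers start "on"
    for alert in alert_stream:
        e = exploit_of[alert]
        ins = prepared_by.get(e, [])                            # leaving Qe
        outs = [q for q, ps in prepared_by.items() if e in ps]  # arriving at Qe
        if e in slot and all(ptr_on(e, q) for q in outs) \
                and all(not ptr_on(p, e) for p in ins):
            groups.setdefault(slot[e], []).append(alert)  # aggregate: no search
            continue
        slot[e] = alert
        for q in outs:                        # fresh alert: re-enable pointers
            on[(e, q)] = True                 # arriving at Qe
        for p in ins:
            if p in slot and ptr_on(p, e):
                edges.append((slot[p], alert))
                on[(p, e)] = False            # suppress transitive edges
    return edges, groups

# Scenario of Figure 10.6: c prepares for the a's; the a's prepare for
# the b's (exploit names assumed).
exploit_of = {"a1": "ea", "c": "ec", "a2": "ea", "a3": "ea",
              "b1": "eb", "b2": "eb", "b3": "eb", "a4": "ea", "b4": "eb"}
prepared_by = {"ea": ["ec"], "eb": ["ea"]}
stream = ["a1", "c", "a2", "a3", "b1", "b2", "b3", "a4", "b4"]
edges, groups = qg_compact(stream, exploit_of, prepared_by)
print(edges)   # [('c', 'a2'), ('a2', 'b1'), ('a4', 'b4')]
print(groups)  # {'a2': ['a3'], 'b1': ['b2', 'b3']}
```

As in Example 10.10, a3 is aggregated with a2, b2 and b3 with b1, and b4 is not aggregated because a4 has turned the pointers from Qb to Qa back on.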

10.4.5. Empirical Results

The correlation engine is implemented in C++ and tested on a Pentium III 860 MHz server with 1 GB of RAM running Red Hat Linux. We use Snort-2.3.0 [1] to generate isolated alerts, which are piped directly into the correlation engine for analysis. We use Tcpreplay 2.3.2 [37] to replay network traffic from a separate machine to the server running the correlation engine. Two data sets are used in the experiments: the DARPA 2000 intrusion detection scenario LLDOS 1.0 by MIT Lincoln Labs [38], and the treasure-hunt data set by UCSB [39]. The attack scenario in the DARPA 2000 data set has been extensively explored before, such as in Ning and Xu [40]. Our experiments with the data set show similar results, validating the correctness of our correlation algorithm. The treasure-hunt data set generates a large number of alerts (about two million alerts occupying about 1.4 GB of disk space, most of them brute-force attempts of the same attacks), which may render a nested loop–based correlation method infeasible (we found that even running a simple database query over the data would paralyze the system). In contrast, our correlation engine processes the alerts with negligible delay (Snort turns out to be the bottleneck).

To evaluate the performance of the correlation engine, we measure resource usage (central processing unit (CPU) and memory) and the processing time of each alert. The correlation engine measures its own processing time and compares it to the delay between receiving two consecutive alerts from Snort. All the results have 95% confidence intervals within about 5% of the reported values. Figure 10.7 shows the CPU usage (on the left-hand side) and memory usage (on the right-hand side) over time for the DARPA data set. The correlation engine clearly demands fewer resources than Snort (on average, its CPU usage and memory usage are both under 10% of Snort's).

Figure 10.7. The CPU and memory usage.


The left chart in Figure 10.8 shows the processing time per alert (averaged over every 22 alerts). Clearly, the correlation engine works faster than Snort in processing the entire data set. The result also shows that the performance does not decrease over time; indeed, the processing time per alert remains fairly steady. We then examine the scalability of the correlation engine in terms of the number of exploits and conditions, using the treasure-hunt data set. The original attack graph has only about 100 exploits. We increase the size of the attack graph by randomly inserting dummy exploits and conditions. The inserted exploits increase the complexity of correlation because the correlation engine must search through them. The right chart in Figure 10.8 shows that the average processing time scales with the size of the attack graph as expected.

Figure 10.8. The processing time and its relationship with the size of the attack graph.


We replay network traffic at a relatively high speed (e.g., the DARPA data set is replayed in about 26 seconds, while the actual duration of the data set is several hours). Real-world attacks are usually less intensive, and consequently our correlation engine will exhibit even better performance. However, we are aware that real-world traffic may bring up new challenges that are absent in synthesized data sets. For example, we currently set the time window used to reorder alerts (i.e., tmax, as discussed before) to one second to deal with identical alert timestamps. In a real network, the window size must be chosen based on the actual placement of IDS sensors and the typical network delays.
