16.3. Fault and Security Management on High-Speed Networks

16.3.1. Background

In this section, we discuss network-level faults (i.e., anomalies) and intrusion detection, particularly for high-speed networks. It is very important to have integrated fault and intrusion detection because many network faults are often misidentified as intrusions. Such false alerts often make network administrators turn off the IDS systems. Thus, it is of crucial importance to rapidly and accurately identify both faults and intrusions for network-based IDS systems. With the rapid growth of network bandwidth and fast emergence of new attacks/viruses/worms, existing network IDSs are insufficient due to the lack of the following features.

First, separating anomalies from intrusions for false-positive reduction. To detect unknown attacks and polymorphic worms, statistics-based intrusion detection has been widely adopted instead of signature-based intrusion detection. However, many network element faults (e.g., router misconfigurations and polluted DNS entries) can lead to traffic anomalies that may be detected as attacks.

Second, scalability to high-speed networks. Today’s fast propagating viruses/worms (e.g., SQL Slammer worm) can infect most vulnerable machines within ten minutes [12]. Thus, it is crucial to identify such outbreaks in their early phases, which is achievable only in high-speed routers. However, existing schemes are not scalable to the link speeds and number of flows for high-speed networks.

In general, it is difficult for software-based data recording approaches in IDSs to keep up with the link speed in a high-speed router. Thus, the data recording of high-speed IDSs has to be hardware implementable, and it is strongly desirable to achieve the following three capabilities:

  1. Small memory usage.

  2. Sparse memory accesses per packet [13].

  3. Scalability to large key sizes.

Third, attack resiliency. To bypass an IDS, attackers can execute DoS attacks, or fool the IDS to raise many false positives to conceal the real attack. Thus, the attack resiliency of an IDS is very important. However, existing IDSs often keep per-flow states for detection, which is vulnerable.

Fourth, attack root cause analysis for mitigation. Accurate attack mitigation requires IDSs to pinpoint the attack type and flows. This advocates detecting intrusions at the flow level instead of the overall traffic level. Furthermore, we want to differentiate different types of attacks to choose different mitigation schemes accordingly.

Fifth, aggregated detection over multiple vantage points. Most existing network IDSs assume detection to be on a single router or gateway. However, as multihoming, load balancing–based routing, and policy routing become prevalent, even for a connection between a certain source and destination, the packets may traverse different paths [14]. Thus, observation from a single vantage point is often incomplete and affects detection accuracy. Meanwhile, it is very hard to copy all traffic from one router to other routers/IDSs due to the huge data volume.

To meet the above requirements, we propose a new paradigm called DoS resilient High-speed Flow-level INtrusion Detection (HiFIND), leveraging recent work on data streaming computation and, in particular, sketches [15]. Sketches are a kind of compact data streaming data structure that record traffic for given keys and are capable of reporting heavy traffic keys. Essentially, we want to detect as many attacks as possible. As the first step toward this ambitious goal, we aim to detect various port scans (which covers most large-scale worm propagation) and TCP SYN flooding.

While each of these attacks seems relatively easy to be detected, it is indeed very hard to detect a mixture of attacks online at the flow level. To the best of our knowledge, HiFIND is the first DoS resilient high-speed flow-level IDS for port scans and TCP SYN flooding for high-speed networks.

To this end, we leverage and improve sketches to record flow-level traffic as the basis for statistical intrusion detection. First proposed in Schweller et al. [15, 16], sketches have not been applied to building IDSs for the following challenges:

  • Sketches can only record certain aggregated metrics for some given keys. Since it is not feasible to try all possible combinations of the metrics, what would be the minimal set of metrics for monitoring?

  • Existing sketches are all one dimensional. However, various forms of attacks are often hard to identify with such single-dimensional information.

In this section, we address these two challenges and build the HiFIND prototype system to meet the aforementioned five requirements. We make the following contributions:

  • We analyze the attributes in TCP/IP headers and select an optimal small set of metrics for flow-level sketch-based traffic monitoring and intrusion detection. Based on that, we build the HiFIND online high-speed flow-level IDS prototype that is DoS resilient.

  • To analyze the attack root cause for mitigation, we design efficient two-dimensional (2D) sketches to distinguish different types of attacks.

  • We aggregate the compact sketches from multiple vantage points (e.g., routers) to detect intrusion in the face of asymmetric routing and multipath routing caused by per-packet load balancing of routers.

  • For false-positive reduction, we propose several heuristics to separate SYN floodings from network/server congestions and misconfigurations.

As shown in Figure 16.7, HiFIND detection systems can be implemented as black boxes attached to high-speed routers (edge or backbone routers) of ISPs without affecting the normal operation of the routers.

Figure 16.7. Attaching the HiFIND systems to high-speed routers: (a) original configuration, (b) distributed configuration for which each port is monitored separately, and (c) aggregate configuration for which a splitter is used to aggregate the traffic from all the ports.


For evaluation, we tested the router traffic traces collected at Northwestern University (NU) and Lawrence Berkeley National Labs (LBL). We validated TCP SYN flooding and detected port scans and found the HiFIND system is highly accurate. The 2D sketches successfully separate the SYN flooding from port scans and the heuristics effectively reduce false positives related to SYN flooding. Our approach is also very fast in terms of data recording and detection.

16.3.2. Related Work

Intrusion Detection Systems

While some vendors claim to have multigigabit statistical IDSs [17], these claims usually refer to average traffic conditions and use packet sampling [18]. Recent work has proposed detecting large-scale attacks, like DoS attacks, port scans, and so on, based on the statistical traffic patterns. They can roughly be classified into two categories: detecting based on the overall traffic [19, 20] and flow-level detection [21].

With the first approach, even when we can detect the attack, we still do not have any flow or port knowledge for mitigation. Moreover, attacks can be easily buried in the background network traffic. Thus, such detection schemes tend to be inaccurate; for example, Change-Point Monitoring (CPM) [20] will detect port scans as SYN floodings as verified in Section 16.3.4. For the second approach, such schemes usually need to maintain a per-flow table (e.g., a per-source IP table for Threshold Random Walk (TRW) [21]) for detection, which is not scalable and thus provides a vulnerability to DoS attacks with randomly spoofed IP addresses, especially on high-speed networks. TRW was recently improved by limiting its memory consumption with approximate caches (TRW-AC) [22]. However, spoofed DoS attacks will still cause collisions in TRW-AC, and leave the real port scans undetected. As Weaver et al. [22] mentioned, when the connection cache size of 1 million entries reaches about 20% full, each new scan attempt has a 20% chance of not being recorded because it aliases with an already-established connection. Actually, during spoofed DoS attacks, such collisions can become even worse.

The existing schemes can detect specific types of attacks, but will perform poorly when facing a mixture of attacks as in the real world. People may attempt to combine TRW-AC and CPM to detect both scans and SYN flooding attacks. However, each of these two approaches can work properly only when the other one works well.

Table 16.2 shows the high-level functionality comparison of our approach to the other methods. Backscatter detects the spoofed SYN flooding attacks by testing the uniform distribution of destination Internet Providers (IPs) to which the same source (potential victim) sends SYN/ACK [19]. We use this for validating the SYN flooding detected by HiFIND. Venkataraman et al. [23] propose efficient algorithms to detect superspreaders, sources that connect to a large number of distinct destinations. But they may have high false positives with P2P traffic where a single host may connect to many peers for download. Partial Completion Filters (PCF) was recently proposed for scalable network detection [24]. They do not differentiate among various attacks.

Table 16.2. Functionality comparison.
ApproachesSpoofed DoSNonspoofed DoSHScanVScan
HiFINDYesYesYesYes
TRW(AC)NoNoYes(Yes)
CPMYes, but with high FP with port scansNoNo
BackscatterYesNoNoNo
SuperspreaderNoNoYesNo

Sketches for Network Monitoring

There is a significant amount of prior work on efficient and online heavy-hitter detection [13, 25]. However, these approaches are limited in their applicability to online intrusion detection in that (1) they lack the ability to differentiate different types of attacks, (2) they cannot work with time series analysis-based detection algorithms, and (3) they cannot be applied to asymmetric routing environments.

To this end, we designed the original k-ary sketchs [26], and further enhanced them to be reversible sketches [15, 16], which allow us to have separate stages for update, combine, and inference, so that we can easily solve the problems mentioned before. In Table 16.3, we summarize the functions supported by sketches.

Table 16.3. Function of sketches (S–sketch, v–value, y–key, Y –set of keys, t–threshold).
FunctionsDescriptionsk-ary SketchReversible Sketch
UPDATE(S, y, v)Update the corresponding values of the given key into the sketch in the monitoring module
v = ESTIMATE (S, y)Reconstruct the signal series for statistical detection for a given key in the anomaly detection module
S = COMBINECompute the linear combination of multiple sketches
(c1, S1,...,ck, Sk)(ci is coefficient) to aggregate signals in the anomaly detection module  
Y = INFERENCEReturn the keys whose values are larger than the threshold in the 
(S, t)anomaly detection module  

16.3.3. Architecture of the HiFIND System

System Architecture

Figure 16.8 shows the architecture of the HiFIND system. First, we record the network traffic with sketches in each router. Based on linearity of the sketches, we summarize the sketches over multiple routers into an aggregate sketch and apply time series analysis methods for aggregate sketches to obtain the forecast sketches for change detection. The forecast time series analysis method (e.g., EWMA, or exponentially weighted moving average) can help remove noise. By subtracting the forecast sketch from the current one, we obtain the forecast error sketches. Intuitively, a large forecast error implies there is an anomaly, thus, the forecast error is the key metric for detection in our system. Moreover, we aggregate the 2D sketches in the same way and adopt them to further distinguish different types of attacks. We also apply other false-positive reduction techniques as discussed later. Finally, we use the key characteristics of the culprit flows revealed by the reversible sketches to mitigate the attacks. Note that the streaming data recording process needs to be done continuously in real time, while the detection process can be run in the background executing only once every interval (e.g., every second or minute) with more memory (DRAM).

Figure 16.8. HiFIND system architecture.


To deal with asymmetric routing (Figure 16.9), for most existing IDS systems, all the packet traces or all connection states have to be transported from one router to the other. Obviously’ this is very expensive. Moreover, if the link is congested when an attack happens, transmission of these data can be very slow. Furthermore, some routers may use per-packet load balancing so that packets of the same flow may traverse different paths.

Figure 16.9. Sample network topology with asymmetric routing and multipath routing.


In contrast, for HiFIND, we summarize the traffic information with compact sketches at each edge router, and deliver them quickly to some central site. Then, with the linearity of the sketches, we can aggregate them and the resulting sketch has all the information as if all the traffic went through the same router.

Threat Model

Ultimately, we want to detect as many different types of attacks as possible. As a first step, we focus on detecting the two most popular intrusions: TCP SYN flooding (DoS) attack and port Scans/worm propagation, which include horizontal scan (Hscan), vertical scan (Vscan), and block scan [27]. It is also crucial to distinguish them because network administrators need to apply different mitigation schemes for different attacks.

Sketch-Based Detection Algorithm

We denote the key of a sketch as K, the feature value recorded as V, and the reversible sketch as RS(K,V). We also denote the number of SYN packets as #SYN, and the number of SYN/ACK packets as #SYN/ACK.

Here, we only consider the attacks in TCP protocol, that is, the TCP SYN flooding attacks and TCP port scans. Normally, attackers can choose source ports arbitrarily, so Sport is not a good metric for attack detection. For the other three fields, we can consider all the possible combinations of them, but the key (SIP, DIP, Dport) can only help detect nonspoofed SYN flooding, so we do not use it in the detection process. Table 16.4 shows the other combinations and their uniqueness. Here, we define the uniqueness of a key as the capability of differentiating among different types of attacks. For example, the count of unsuccessful connections aggregated by {SIP} can be used to detect nonspoofed SYN flooding attacks (we count it as 0.5), horizontal scans, and vertical scans, so its value of uniqueness is 2.5. The best key would ideally correspond to only one type of attack. Normally a key can be related to several types of attacks, so we need to use more than one dimension to differentiate these attacks as shown in Li and Chen [28]. In this section, we use the tree combinations of two fields as keys for the reversible sketches. Our detection has the following three steps.

Table 16.4. The uniqueness of different types of keys.
KeysSYN FloodingHscanVscanUniqueness
{SIP,Dport}NonspoofedYesNo1.5
{DIP,Dport}YesNoNo1
{SIP,DIP}NonspoofedNoyes1.5
{SIP}NonspoofedYesYes2.5
{DIP}YesNoYes2
{Dport}YesYesNo2

Step 1

We use RS({DIP, Dport}, #SYN-#SYN/ACK) to detect SYN flooding attacks, because it usually targets a certain service as characterized by the Dport on a small set of machine(s). The value of #SYN-#SYN/ACK means that for each incoming SYN packet, we will update the sketch by incrementing one, while for each outgoing SYN/ACK packet, the sketch will be updated by decrementing one. In fact, similar structures can be applied to detect any partial completion attacks [24]. The reversible sketch can further provide the victim IP and port number for mitigation as in the threat model just described. We denote this set of DIPs as FLOODING_DIP_SET.

Step 2

We use RS({SIP, DIP}, #SYN-#SYN/ACK) to detect any intruder trying to attack a particular IP address. The detected attacks can be nonspoofed SYN flooding attacks or vertical scans. For each {SIP, DIP} entry, if DIP ϵ FLOODING_DIP_SET, we put the SIP into the FLOODING_SIP_SET for the next step; otherwise, the {SIP, DIP} is the attacker’s IP and victim IP of a vertical scan.

Step 3

We use RS({SIP, Dport}, #SYN-#SYN/ACK) to detect any source IP which causes a large number of uncompleted connections to a particular destination port. For each {SIP, Dport} entry, if SIP ϵ FLOODING_SIP_SET, it is a nonspoofed SYN flooding, otherwise, it is a horizontal scan.

Here, we apply the EWMA algorithm as the forecast models to do change detection. If we denote M0(t) as the current #SYN-#SYN/ACK at the time interval t, and Mf(t) as the forecasted #SYN-SYN/ACK at the time interval t, we have:

Equation 16.6


The difference between the forecasted value and the actual value, et = M0(t) Mf(t), is then used for detection.

Separating Faults/Anomalies from SYN Flooding

There can be a number of factors other than SYN flooding that may cause a particular destination IP and port with a large number of unacknowledged SYNs. For instance, flash crowds, network/server congestions/failures, and even polluted or outdated DNS entries may cause a large number of SYNs without SYN/ACK at the edge routers. These may cause high false positives in our detection scheme. For the flash crowds, it is difficult, if not impossible, to differentiate them from the SYN flooding attacks without payload information as discussed in Jung et al. [21]. Thus, we aim at reducing the false positives caused by the other two behaviors listed above.

First, we add filters to reduce false positives caused by bursty network/server congestions/failures based on the ratio of #SYN comparing with #SYN/ACK and the fact that attacks may last some time. Second, we add filters to reduce the false positives caused by misconfigurations or related problems based on the fact that DoS attacks attack some active IP addresses and services.

16.3.4. Evaluation

Evaluation Methodology

In this section, we evaluate HiFIND with two data sets. One is the router traffic traces collected at LBL, which consists of about 900 M netflow records. The other is the traffic traces of NU, which has several class B networks, edge routers. The router exports netflow data continuously, which is recorded with sketches of HiFIND on the fly. The one-day experiment in May 2005 consisted of 239 M netflow records, which comes to 1.8 T total traffic.

Unless denoted otherwise, the default time interval for constructing the time series is one minute. The data recording part of the HiFIND system consists of (1) three reversible sketches (RSs), one for {SIP, Dport}, one for {DIP, Dport}, and the other for {SIP, DIP}; (2) one original sketch (OS) for {DIP, Dport}; and (3) two 2D sketches for {SIP, Dport} × {DIP} and {SIP, DIP} × {Dport}. For all the RS and 2D sketches, we update #SYN - #SYN/ACK as the value, and only for the OS, we use #SYN as the value.

The following parameters are chosen based on systematic study as in Schweller et al. [15] and Krishnamurthy et al. [26]. We adopt six stages for each RS and OS and five stages for each 2D sketch in our system. We use 212 buckets for each stage in 48-bit RS, 216 buckets for each stage in the 64-bit RS, and 214 buckets for all their verification sketches. 214 buckets are applied for each stage in OS. We also use 212 × 64 buckets for each stage of the 2D sketches. Therefore, the total memory is 13.2 MB.

Both NU and LBL have a large amount of traffic, so we set the detection threshold to be one unresponded SYN packet per second.

Accuracy of Sketches in Recording Traffic for Detection

Table 16.5 shows the three phases of our detection results. We first detect attacks using reversible sketches with algorithms described previously. The results are shown as “raw results” (phase 1) in Table 16.5. Two-dimensional sketches reduce the false positives for port scans introduced by SYN flooding attacks (phase 2 of Table 16.5). The heuristics previously discussed reduce false positives of SYN flooding attacks (phase 3).

Table 16.5. Detection results under three phases
  Phase 1:FP Reduction
 AttackRawPhase 2:Phase 3:
TracesTypeResultsPort ScanFlooding
 SYN flooding15715732
NUHscan988936936
 Vscan731919
 SYN flooding35350
LBLHscan736699699
 Vscan4011

To evaluate the errors introduced by sketches, we compare the results obtained from the same detection algorithm but with two different types of traffic recording: sketches and accurate flow table to hold per-flow information (we call it the nonsketch method). We find that we detect exactly the same attacks for the two configurations with very different amounts of memory (see memory consumption discussion later. There is no false positive in our results. This shows sketches are highly accurate in recording the traffic for detection.

Comparison of HiFIND and Other Existing Network IDSs
Detection over a single router

We compare the HiFIND with other state-of-the-art work as introduced in Section 16.3.2: The TRW [21] for port scan detection and the CPM [20] for SYN flooding detection.

For TRW experiments, we choose similar parameters as those in Jung et al. [21]. We apply the TRW on both data sets with the same threshold. Repeated alerts are removed from the results of both methods. Table 16.6 shows the comparison results of our methods with TRW for Hscan detection. We observe that the scans detected by these two methods have very good overlap, except for a few special cases. There are a small number of Hscans detected by HiFIND but not TRW, because some attacks have both successful and unsuccessful connection attempts, but TRW cannot detect those suspicious ones in this category. There are also a very small number of Hscans detected by TRW but not HiFIND because they are the combination of multiple small scans, which are too stealthy to be captured by our threshold. It is our future work to further investigate this.

Table 16.6. Horizontal scans detection comparison of HiFIND and TRW aggregated by source IP.
DataTRWHiFINDOverlap Number
NU497512488
LBL695699692

Next, we compare our method with CPM for SYN flooding attack detection. The results are shown in Table 16.7. In the LBL traces, there is no SYN flooding, but a very large number of scans. CPM cannot differentiate them. On the other hand, CPM and HiFIND have very similar results for the NU data because most time intervals contain SYN flooding. Meanwhile, there is a small number of intervals in which SYN flooding is buried in the rest of the normal traffic, so CPM cannot detect them.

Table 16.7. TCP SYN flooding detection comparison of HiFIND and CPM.
DataCPMHiFINDOverlap Number
NU1,4221,4271,422
LBL1,42600

Aggregated detection over multiple routers

In this section, we consider the network topology of Figure 16.9 and evaluate the performance of HiFIND and TRW under such scenarios. To simulate asymmetric routing and multipath routing caused by per-packet load balancing on routers, we split the packet level trace from an NU edge router into three routers randomly, for both inbound and outbound packets. For each packet, we randomly select an edge router to deliver (i.e., for any single connection) the incoming SYN packet and the outgoing SYN/ACK packet, which have two-thirds probability to go through different routers.

For HiFIND, we obtain the same results as those when the traffic goes through the same router. In comparison, we apply TRW to the data on each router for detection and then sum up the result. We found their approach had high false positives or negatives in this case.

Validation of Successfully Detected Intrusions

In this section, we manually examine a certain number of attacks for validation.

SYN flooding

We validate our SYN flooding detection results with backscatter [19]. Among the 32 SYN floodings detected, there are 21 matched with backscatter results. For the other 11 attacks, three are due to threshold boundary effect.

Horizontal scans

We manually validate horizontal scans, in particular, the top five and bottom five attacks in terms of their change difference. Due to limited space, Table 16.8 shows the top five Hscans and Table 16.9 shows the bottom five Hscans from the NU experiment. Detailed evaluation can be found in our technical report [29].

Table 16.8. Five major senarios of the top five Hscans in NU experiment.
Anonymized SIPDport#DIPCause
204.10.110.381,43356,275SQLSnake scan
109.132.101.1992245,014Scan SSH
95.30.62.2023,30625,964MySQL Bot scans
162.39.147.516,10124,741Unknown scan
15.192.50.1534,89923,687Rahack worm

Table 16.9. Five major scenarios of the bottom ten Hscans in NU experiment.
Anonymized SIPDport#DIPCause
98.198.251.16813564Nachi or MSBlast worm
3.66.52.22744564Sasser and Korgo worm
2.0.28.9013964NetBIOS scan
98.198.0.10113564Nachi or MSBlast worm
165.5.42.105,55462Sasser worm

Vertical scans

We also manually validate vertical scans. In the LBL trace, we found one vertical scan. It scanned some well-known service ports, such as HTTPS(81) and HTTP-Proxy(8000,8001,8081). In the NU experiment, we found in total 19 vertical scans. We manually checked all of them and found the vertical scans are mostly interested in the well-known service ports and Trojan/Backdoor ports.

Evaluation Results for Online Performance Constraints
Small memory consumption

In our experiments, we only use a total memory of 13.2 MB for traffic recording. Note that such settings work well for a large range of link speeds.

On the other hand, if hash tables are used to record every flow, much larger memory is required, as shown in Table 16.10. We consider the worst-case traffic of all 40-byte packet streams with 100% utilization of the link capacity. There is a spoofed SYN flooding attack with a different source IP for each packet. For the method without sketch, it needs at least three hash tables corresponding to the three reversible sketches in our detection methods.

Table 16.10. Memory comparison (bytes).
 2.5 Gbps10 Gbps
Methods1 min5 min1 min5 min
HiFIND w/ sketch13.2M13.2M
HiFIND w/ complete info10.3G51.6G41.25G206G
TRW5.63G28G22.5G112.5G

Small memory access per packet

There are 15 memory accesses per packet for 48-bit reversible sketches and 16 per packet for 64-bit reversible sketches [21]. For each 2D sketch, we only need 5 memory accesses per packet, one for each 2D hash matrix. Thus, when recording these sketches in parallel or in pipeline, the HiFIND system has a very small number of memory accesses per packet and is capable of online monitoring.

High-speed traffic monitoring

In the HiFIND system, the speed of 2D sketches is much faster than that of the reversible sketches. Thus, the speed is dominated by the latter. With our prototype single FPGA board implementation, we are able to sustain 16.2 Gbps throughput for recording all 40-byte packet streams (the worst case) with a reversible sketch.

We can also use multiprocessors to simultaneously record multiple sketches in software. We record 239 M items with one reversible sketch in 20.6 seconds (i.e., 11 M insertions/sec). For the worst-case scenario with all 40-byte packets, this translates to around 3.7 Gbps. These results are obtained from code that is not fully optimized and from a machine that is not dedicated to this process.

For the on-site NU experiments, the HiFIND system used 0.34 seconds on average to perform detection for each one-minute interval, and the standard deviation is 0.64 seconds. The maximum detection time (for which the interval contains the largest number of attacks) is 12.91 seconds, which is still far less than one minute. In order to show the scalability of HiFIND, we do some further stress experiments. We compress the NU data by the factor of 60 and detect the top 100 anomalies in each interval. The HiFIND system used 35.61 seconds on average in detection for each interval. The maximum detection time is 46.90 seconds.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset