8.6. Case Study: DNS Service

To further illustrate the approach, we model and analyze the trustworthiness (i.e., the security and dependability behavior) of a DNS service. The Domain Name System (DNS) provides a critical service to the Internet: the mapping between names and addresses. The original DNS specifications require that each domain name be served by at least two servers, usually referred to as the “primary” and “secondary” servers. The primary DNS server holds the “master copy” of the data for a zone. When changes are made to zone data on the primary server, they must be distributed to the secondary server. The secondary server also periodically checks for changes. Contaminated data on the primary can therefore be transferred to the secondary server. For redundancy, the recommended practice is to configure the primary and secondary DNS servers on separate machines, on separate Internet connections, and in separate geographic locations. One can therefore assume that the two servers are attacked, fail, and are repaired independently of each other.

The most important attributes of this service are availability and integrity: the service should be there when the clients need it, and it must provide correct replies to DNS requests. We distinguish between two types of accidental failures: hardware availability failures, which require a manual repair, and software availability failures, which only require a system reconfiguration and/or reboot. Unfortunately, buffer overflow vulnerabilities are common in multiple implementations of DNS resolver libraries. During their operational lifetime, both servers will be subject to manual maintenance, upgrades, and reconfigurations, and humans frequently make mistakes. It is therefore realistic to assume that a server will alternate between a good state (G), where it is secure against these types of attacks, and a vulnerable state (V), where buffer overflow attacks are possible. When a server is in state V, an attacker who can send malicious DNS requests might exploit such a vulnerability to gain access to the server. This transfers the server into a third state (L), from which it is possible to insert false entries in the server cache, causing a software integrity failure (IS), or to shut the server down, causing a software availability failure (AS). Both servers are also subject to hardware availability failures (AH). States G, V, and L are considered to be good states. Even though a server is erroneous in states V and L, it still manages to deliver the intended service, that is, to provide clients with correct replies to DNS requests.

8.6.1. Stochastic Model

The duplicated DNS service can be described by 36 states: S = {S1, ..., S36} = {(si, sj) | si, sj ∈ {G, V, L, IS, AS, AH}}, where (si, sj) means that the primary server is in state si and the secondary server is in state sj. As long as the primary server is in state G, V, or L, it will deliver the intended service regardless of the internal state of the secondary. There are four states where the DNS service is unavailable due to software and hardware availability failures: (AS,AS), (AS,AH), (AH,AS), and (AH,AH). In states (IS,G), (IS,V), (IS,L), (IS,IS), (IS,AS), (IS,AH), (AS,IS), and (AH,IS), the service is available, but it provides erroneous results. All twelve of these states are therefore included in the set of failed states SF in the upcoming system analysis. Due to space limitations, we display the state transition model for the primary server only (Figure 8.12). The complete state set for the DNS service is the cross product of this model and a similar model for the secondary server. The transitions labeled with μS and μH rates represent accidental software and hardware failures, the φ rates represent the system administrator’s possible actions, and the λ rates represent the intensities of the possible attack actions. We use the rate values in Sallhammar et al. [17] when determining measures for the DNS service: λV,L = 1/3, λL,IS = λL,AS = λIS,AS = 3, φG,V = 1/480, φV,G = 1/120, φL,G = φIS,G = 1, φAS,G = 3, φAH,G = 1/24, μH = 1/3,600, and μS = 1/120, all per hour. The rates are assumed to be the same for the primary and the secondary servers. We also use φSYN = 2 as the synchronization rate of (possibly contaminated) zone data between the primary and secondary servers.
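The state space and the split into good and failed states can be reproduced in a few lines. The following is a minimal sketch; the names `SERVER_STATES`, `UP`, and `is_failed` are illustrative and do not come from the original model.

```python
from itertools import product

# Per-server states as defined in Section 8.6.
SERVER_STATES = ("G", "V", "L", "IS", "AS", "AH")
UP = {"G", "V", "L"}  # the server still returns correct replies


def is_failed(primary, secondary):
    """Return True if the system state (primary, secondary) belongs to SF."""
    if primary in UP:
        # A correctly answering primary masks the state of the secondary.
        return False
    # Primary is IS (wrong replies), or it is down and the secondary
    # cannot deliver correct replies either (IS, AS, or AH).
    return primary == "IS" or secondary not in UP


ALL_STATES = list(product(SERVER_STATES, repeat=2))
FAILED = [s for s in ALL_STATES if is_failed(*s)]
GOOD = [s for s in ALL_STATES if not is_failed(*s)]

print(len(ALL_STATES), len(GOOD), len(FAILED))
```

The rule in `is_failed` is simply a restatement of the two lists of failed states given in the text: four states without service and eight states with erroneous service.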

8.6.2. Stochastic Game

Since we assumed that an attacker will target only one of the servers at a time, there will be a single stochastic game representing the attack scenario for each of the servers. The game elements are the shaded states in Figure 8.12 (i.e., Γ = {ΓV, ΓL, ΓIS}). The attack action set in the stochastic game is A = {a1, a2, a3, Φ} = {“illegal log-in,” “cache poisoning,” “server shutdown,” “do nothing”} and the defense action set is the corresponding D = {d1, d2, d3, Φ}. Using the rate values in Section 8.6.1 to determine the transition probabilities between the game states (see Section 8.4), the game elements become:

Equation 8.25

similarly for both servers.

Figure 8.12. Internal state transitions for the primary DNS server. The vulnerable system states (i.e., the game elements) are gray. Note that the full expansion to 36 states including the behavior of the secondary server has been suppressed.
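As a rough illustration of how rates can be turned into transition probabilities, the sketch below applies the standard competing-exponentials rule for a continuous-time Markov chain: the probability that a particular transition fires first equals its rate divided by the sum of all rates leaving the state. Whether Section 8.4 uses exactly this rule, and which transitions leave state V in Figure 8.12, are assumptions made here; the helper `jump_probabilities` is hypothetical.

```python
def jump_probabilities(out_rates):
    """Map each target state to the probability that its transition fires first."""
    total = sum(out_rates.values())
    return {target: rate / total for target, rate in out_rates.items()}


# Assumed transitions out of state V while an attacker attempts an illegal log-in
# (rates per hour, taken from Section 8.6.1):
rates_from_V = {
    "L": 1 / 3,      # lambda_V,L: successful illegal log-in
    "G": 1 / 120,    # phi_V,G: administrator removes the vulnerability
    "AS": 1 / 120,   # mu_S: accidental software failure
    "AH": 1 / 3600,  # mu_H: accidental hardware failure
}

p_V = jump_probabilities(rates_from_V)
for target, p in p_V.items():
    print(f"V -> {target}: {p:.4f}")
```

With these rates the attack transition dominates the alternatives by a wide margin, which is why the attacker's choice of action in state V has such a large influence on where the server goes next.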

8.6.3. Four Scenarios

To determine security and dependability measures for the DNS service, the values for the outcome of the attacks in the various vulnerable states need to be set to obtain the attack probabilities. To illustrate the effect of the reward and cost values on the predicted system measures, we obtain the system measures for four different scenarios.

  • The worst-case scenario. First we look at the “worst-case scenario” when all attackers always try all possible attacks (i.e., πi(ak)=1,∀i,k in Q). In this case, we do not use the game model to determine the expected attacker behavior.

  • Risk-averse attackers. Now assume that the attackers will take into account the possible consequences of their actions. The following reward and cost values are used: ra1Φ = ra2Φ = ra3Φ = 1, ca1d1 = –4, ca2d2 = –3, ca3d3 = –2, cΦΦ = –5. Solving the stochastic game in accordance with Algorithm 8.1 provides the optimal attack strategy vectors, one for each of the game elements ΓV, ΓL, and ΓIS.

  • Implementing countermeasures. Assume that we want to evaluate the benefit of setting up a new logging and tracing mechanism for the DNS service, with the purpose of reducing the probability of illegal log-in attempts (action a1). As in the previous scenario, we consider risk-averse attackers. All detected illegal log-in attempts will be recorded and prosecuted, which is modeled as an increased cost value ca1d1 = –7 in game element ΓV. Solving the game once again provides a new expected attack strategy for state V.

  • Accidental failures only. Finally, assume that we do not consider attacks at all, but rather model accidental failure only (i.e., πi(ak)=0, ∀i,k in Q).
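The last scenario is simple enough to work through numerically. The sketch below solves the steady-state equations for a single server with all attack probabilities set to zero, so that only states G, V, AS, and AH are reachable, and then combines two independent servers. The transition structure is an assumption, since Figure 8.12 is not reproduced here: software failures are taken to occur from G and V, and hardware failures from every state except AH. The function `steady_state` and the dictionary `RATES` are illustrative names.

```python
def steady_state(states, rates):
    """Solve pi Q = 0 with sum(pi) = 1 for a small continuous-time Markov chain."""
    n = len(states)
    idx = {s: i for i, s in enumerate(states)}
    # Generator matrix Q: off-diagonal entries are rates, rows sum to zero.
    Q = [[0.0] * n for _ in range(n)]
    for (src, dst), r in rates.items():
        Q[idx[src]][idx[dst]] += r
        Q[idx[src]][idx[src]] -= r
    # Linear system Q^T x = 0, with the last equation replaced by sum(x) = 1.
    A = [[Q[j][i] for j in range(n)] for i in range(n)]
    A[-1] = [1.0] * n
    b = [0.0] * (n - 1) + [1.0]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / A[i][i]
    return dict(zip(states, x))


STATES = ["G", "V", "AS", "AH"]
MU_S, MU_H = 1 / 120, 1 / 3600  # accidental failure rates per hour
RATES = {
    ("G", "V"): 1 / 480,   # phi_G,V
    ("V", "G"): 1 / 120,   # phi_V,G
    ("G", "AS"): MU_S,     # assumed: software failures from G and V
    ("V", "AS"): MU_S,
    ("G", "AH"): MU_H,     # assumed: hardware failures from G, V, and AS
    ("V", "AH"): MU_H,
    ("AS", "AH"): MU_H,
    ("AS", "G"): 3.0,      # phi_AS,G
    ("AH", "G"): 1 / 24,   # phi_AH,G
}

pi = steady_state(STATES, RATES)
# Servers fail and are repaired independently, so joint probabilities are products.
p_both_good = pi["G"] ** 2
unavailability = (pi["AS"] + pi["AH"]) ** 2
print(f"(G,G): {p_both_good:.6f}  unavailability: {unavailability:.3e}")
```

Under these assumptions the product of the single-server probabilities comes out close to the accidental-failures-only column of Table 8.2 (roughly 0.778 for (G,G)), and the unavailability is close to the corresponding entry in Table 8.1. The product form works here only because no integrity failures occur, so the synchronization between the servers never couples their states.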

8.6.4. Comparing the Scenarios

The dependability and security measures introduced in Section 8.2.4 are listed in Table 8.1 for the four scenarios. In the worst case, when all potential attacks are always performed, the quantitative trustworthiness is significantly worse than in the other cases (i.e., risk aversion in the attacker community improves the system performance). The additional countermeasures introduced in the third scenario yield only a marginal improvement, even though the probability of an illegal log-in attempt in state V is decreased by 23%. In conclusion, the DNS service will not benefit much from the new logging and tracing mechanism.

Table 8.1. Main system trustworthiness measures for the scenarios.
Scenario                    Unavailability   MTTF [h]   MTFF [h]
Worst-case scenario         3.4 × 10^-4      991        996
Risk-averse attackers       9.56 × 10^-5     5,986      6,001
With countermeasures        9.53 × 10^-5     6,018      6,034
Accidental failures only    8.79 × 10^-5     7,088      7,104

As can be seen from the case of accidental failures only, the system’s availability and expected time to failure do not improve noticeably when attacks are excluded as possible fault sources. Hence, for the parametrization used in this case study, the random failures dominate the trustworthiness of the DNS service.

It is also of interest to have a closer look at the detailed state probabilities. These are presented in Table 8.2, where probabilities less than 10^-8 are shown as 0. An interesting observation from the table is that the worst case dominates the asymptotic state probability distribution for system states where at least one of the servers has experienced a software integrity failure (i.e., the (IS,–) and (–,IS) states). This is to be expected, since the worst case considers the most aggressive type of attackers. In contrast, the fourth case, with accidental failures only, tends to dominate the states where either the primary or the secondary server is vulnerable to buffer overflow attacks (i.e., the (V,–) and (–,V) states). In fact, a substantial fraction of the time is spent in these states, yielding a rather low probability for the “everything is OK” state, (G,G). Since this case models accidental failures only, there are no attacks that will bring the system out of the vulnerable states. This also explains the counterintuitive result that the worst case has the largest probability of being in the “everything is OK” state, (G,G).

Table 8.2. Asymptotic state probabilities X = {Xi} for the four scenarios, subdivided with respect to the good SG and failed SF states.
Primary, secondary  Worst case      Risk averse     With countermeasures  Accidental only
Good states SG
G,G                 0.967809        0.959767        0.952382              0.778212
G,V                 5.75 × 10^-3    9.69 × 10^-3    1.34 × 10^-2          9.57 × 10^-2
G,L                 2.74 × 10^-4    6.36 × 10^-4    6.09 × 10^-4          0
G,IS                2.26 × 10^-4    0               0                     0
G,AS                3.23 × 10^-3    3.09 × 10^-3    3.06 × 10^-3          2.43 × 10^-3
G,AH                6.52 × 10^-3    6.49 × 10^-3    6.46 × 10^-3          5.84 × 10^-3
V,G                 5.76 × 10^-3    9.69 × 10^-3    1.34 × 10^-2          9.57 × 10^-2
V,V                 3.42 × 10^-5    9.79 × 10^-5    1.88 × 10^-4          1.18 × 10^-2
V,L                 1.63 × 10^-6    6.43 × 10^-6    8.56 × 10^-6          0
V,IS                1.23 × 10^-6    0               0                     0
V,AS                1.9 × 10^-5     3.12 × 10^-5    4.31 × 10^-5          2.98 × 10^-4
V,AH                3.88 × 10^-5    6.55 × 10^-5    9.08 × 10^-5          7.18 × 10^-4
L,G                 2.74 × 10^-4    6.36 × 10^-4    6.09 × 10^-4          0
L,V                 1.63 × 10^-6    6.43 × 10^-6    8.56 × 10^-6          0
L,L                 7.74 × 10^-8    4.22 × 10^-7    3.9 × 10^-7           0
L,IS                5.82 × 10^-8    0               0                     0
L,AS                9.02 × 10^-7    2.05 × 10^-6    1.96 × 10^-6          0
L,AH                1.84 × 10^-6    4.3 × 10^-6     4.14 × 10^-6          0
AS,G                3.15 × 10^-3    3.09 × 10^-3    3.06 × 10^-3          2.43 × 10^-3
AS,V                1.85 × 10^-5    3.12 × 10^-5    4.31 × 10^-5          2.98 × 10^-4
AS,L                8.82 × 10^-7    2.05 × 10^-6    1.96 × 10^-6          0
AH,G                6.52 × 10^-3    6.49 × 10^-3    6.46 × 10^-3          5.84 × 10^-3
AH,V                3.87 × 10^-5    6.55 × 10^-5    9.08 × 10^-5          7.18 × 10^-4
AH,L                1.84 × 10^-6    4.3 × 10^-6     4.14 × 10^-6          0
Failed states SF
IS,G                1.51 × 10^-4    0               0                     0
IS,V                8.17 × 10^-7    0               0                     0
IS,L                3.88 × 10^-8    0               0                     0
IS,IS               3.8 × 10^-5     0               0                     0
IS,AS               1.69 × 10^-5    0               0                     0
IS,AH               1.38 × 10^-6    0               0                     0
AS,IS               1.7 × 10^-5     0               0                     0
AS,AS               2.68 × 10^-5    9.96 × 10^-6    9.86 × 10^-6          7.57 × 10^-6
AS,AH               2.14 × 10^-5    2.09 × 10^-5    2.08 × 10^-5          1.82 × 10^-5
AH,IS               1.38 × 10^-6    0               0                     0
AH,AS               2.14 × 10^-5    2.09 × 10^-5    2.08 × 10^-5          1.82 × 10^-5
AH,AH               4.39 × 10^-5    4.39 × 10^-5    4.39 × 10^-5          4.39 × 10^-5

Regarding the probabilities of being in the failed states, SF, shown in the lower part of Table 8.2, it is seen, as pointed out above, that except for the worst case these are dominated by the random failures. Hence, although introducing countermeasures in the risk-averse scenario notably increases the probabilities of staying in the vulnerable states (G,V) and (V,G), this has only a small effect on the overall performance of the system, as seen from Table 8.1. When studying the figures, it should be kept in mind that the numerical values of rewards and costs, as well as the failure rates, are chosen for illustration purposes only, and the findings may not conform with failure times for a real-life DNS server system.
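The link between the two tables can be checked directly. The sketch below assumes that the unavailability in Table 8.1 is the total asymptotic probability of the failed set SF, which is the usual definition; Section 8.2.4 is not reproduced here, so this is an assumption. The values are copied from the failed-state rows of Table 8.2, and the variable names are illustrative.

```python
SCENARIOS = ("worst case", "risk averse", "with countermeasures", "accidental only")

# Failed-state rows of Table 8.2, one value per scenario in the order above.
FAILED_ROWS = {
    ("IS", "G"):  (1.51e-4, 0.0, 0.0, 0.0),
    ("IS", "V"):  (8.17e-7, 0.0, 0.0, 0.0),
    ("IS", "L"):  (3.88e-8, 0.0, 0.0, 0.0),
    ("IS", "IS"): (3.8e-5, 0.0, 0.0, 0.0),
    ("IS", "AS"): (1.69e-5, 0.0, 0.0, 0.0),
    ("IS", "AH"): (1.38e-6, 0.0, 0.0, 0.0),
    ("AS", "IS"): (1.7e-5, 0.0, 0.0, 0.0),
    ("AS", "AS"): (2.68e-5, 9.96e-6, 9.86e-6, 7.57e-6),
    ("AS", "AH"): (2.14e-5, 2.09e-5, 2.08e-5, 1.82e-5),
    ("AH", "IS"): (1.38e-6, 0.0, 0.0, 0.0),
    ("AH", "AS"): (2.14e-5, 2.09e-5, 2.08e-5, 1.82e-5),
    ("AH", "AH"): (4.39e-5, 4.39e-5, 4.39e-5, 4.39e-5),
}

# Unavailability column of Table 8.1, in the same scenario order.
TABLE_8_1_UNAVAILABILITY = (3.4e-4, 9.56e-5, 9.53e-5, 8.79e-5)

# Sum the failed-state probabilities column by column.
summed = [sum(row[k] for row in FAILED_ROWS.values()) for k in range(len(SCENARIOS))]

for name, total, reported in zip(SCENARIOS, summed, TABLE_8_1_UNAVAILABILITY):
    print(f"{name:22s} sum over SF = {total:.3e}   Table 8.1 = {reported:.2e}")
```

The column sums agree with Table 8.1 up to the rounding of the tabulated values. The breakdown also makes the dominance of random failures visible: outside the worst case, only the four states with both servers in AS or AH contribute to the failed set.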
