4.1. Introduction

Due to the widespread use of communication networks, society is increasingly dependent on the exchange of information for basic societal functions. Several widely publicized network outages have illustrated that disruption of communications services can be very costly to businesses and critical services; examples include the loss of emergency services, air traffic control systems, and financial services. In fact, existing communication networks, such as the Internet, circuit-switched telephone networks, and cellular networks, are considered part of the critical national infrastructure (CNI) of various nations [13]. The United States President’s Commission on Critical Infrastructure Protection (PCCIP) noted that “our security, economy, way of life, and perhaps even survival, are now dependent on the interrelated trio of electrical energy, communications, and computers” [4]. Furthermore, traditional critical infrastructures such as health, banking, transportation, defence, and public administration all depend heavily on data communication networks through their reliance on supervisory control and data acquisition (SCADA) systems. As a result, the network infrastructure is itself considered one of the key critical infrastructures [1, 2].

A network failure is typically defined as a situation in which the network fails to deliver the committed quality of service (QoS). A network failure can be a degradation of service or a service disruption ranging in length from seconds to weeks. It can occur for a variety of reasons. Typical causes are cable cuts, hardware malfunctions, software errors, natural disasters (for example, earthquakes, floods, and hurricanes), human errors (typically incorrect maintenance), and malicious physical and electronic attacks [5, 6]. The growing vulnerability of the public-switched network has been addressed by the U.S. National Research Council, which noted, “As we become more dependent on networks, the consequences of network failure become greater and the need to reduce network vulnerabilities increases commensurately.” Thus, it is imperative to minimize service degradation, disruption, and destruction, and communication networks need to be designed to respond adequately to failures. This has led to an increasing interest in the design of survivable networks.

A number of definitions of network survivability have appeared in the literature [7–20], including “the capability of a network where a certain percentage of the traffic can still be carried immediately after a failure” [8]; “the ability to provide service continuity upon network failure” [9]; or “the set of capabilities that allows a network to restore affected traffic in the event of a failure” [10]. In effect, the survivability of a network is its ability to support the committed QoS continuously in the presence of various failure scenarios. Techniques to improve a network’s survivability can be classified into three categories:

  1. Prevention.

  2. Network design.

  3. Traffic management.

Prevention or avoidance techniques focus primarily on improving component and system reliability and security in order to reduce the occurrence of faults. This contributes to the survivability of the network by increasing the number of surviving elements, thereby improving the ability of the techniques in categories two and three to handle the failures that do occur. This matters particularly because, at present, the category two and three techniques mainly handle single network failures, while survivability under multiple failures is usually not guaranteed. Perhaps the most obvious prevention techniques are physical security measures, such as housing equipment in highly secure and sound structures, providing backup power systems, implementing “call-before-you-dig” practices, and so on. Prevention also includes electronic security techniques to protect network resources such as routers and user data from unauthorized access, as well as improving software reliability.

Survivable network design techniques try to mitigate the effects of system-level failures, such as link or node failures, by placing sufficient diversity and capacity in the network topology. A typical example is multihoming, in which each node is made at least two-connected so that a single link failure cannot isolate it from the remainder of the network. The basic survivable network design technique is to add redundancy to the network, with the critical issues being where and how much redundancy to add.
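
The single-link-failure condition described above can be checked mechanically on a small topology. The following is a minimal sketch, not a design method from this chapter: it removes each link in turn and tests whether the network stays connected. The function names (`is_connected`, `critical_links`) and the four-node example topologies are hypothetical and chosen only for illustration.

```python
from collections import deque


def is_connected(nodes, links):
    """Return True if every node can reach every other node over the links."""
    nodes = list(nodes)
    if not nodes:
        return True
    adjacency = {node: set() for node in nodes}
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    # Breadth-first search from an arbitrary starting node.
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return len(seen) == len(nodes)


def critical_links(nodes, links):
    """Return the links whose individual failure disconnects the network."""
    critical = []
    for index, link in enumerate(links):
        remaining = links[:index] + links[index + 1:]
        if not is_connected(nodes, remaining):
            critical.append(link)
    return critical


# Hypothetical four-node topologies: a ring and a chain.
nodes = ["A", "B", "C", "D"]
ring = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
chain = [("A", "B"), ("B", "C"), ("C", "D")]

print(critical_links(nodes, ring))   # no single link failure splits the ring
print(critical_links(nodes, chain))  # every link of the chain is critical
```

The brute-force loop is adequate for small examples; for large topologies a linear-time bridge-finding algorithm would be the usual choice. The check assumes the intact network is connected to begin with, and it says nothing about capacity, which is the other half of the redundancy question.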

Traffic management procedures seek to quickly detect a failure or degraded QoS and redirect the network load such that the committed QoS is maintained, thus making the network inherently fault-tolerant and self-healing [7]. The traffic management procedures essentially seek to take advantage of preplanned redundancy and any available capacity in the network after a failure. A typical example is the use of preconfigured backup label-switched paths in multiprotocol label switching (MPLS) networks.
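
The idea of a preconfigured backup path can be illustrated with a small sketch. The code below computes a primary path by hop count and then a backup path that shares no link with it, so that traffic could be switched to the backup if any primary link fails. It is a simplified illustration under stated assumptions, not an MPLS implementation: the function names and the five-node topology are hypothetical, all links are assumed bidirectional with equal cost, and capacity is ignored.

```python
from collections import deque


def shortest_path(links, source, target, excluded=frozenset()):
    """Breadth-first shortest path by hop count, skipping excluded links.

    Returns the path as a list of nodes, or None if the target is unreachable.
    """
    adjacency = {}
    for a, b in links:
        if frozenset((a, b)) in excluded:
            continue
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    parent = {source: None}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if current == target:
            # Walk the parent pointers back to the source.
            path = []
            while current is not None:
                path.append(current)
                current = parent[current]
            return path[::-1]
        for neighbor in adjacency.get(current, []):
            if neighbor not in parent:
                parent[neighbor] = current
                queue.append(neighbor)
    return None


def primary_and_backup(links, source, target):
    """Return a primary path and a link-disjoint backup path (or None)."""
    primary = shortest_path(links, source, target)
    if primary is None:
        return None, None
    used = {frozenset(pair) for pair in zip(primary, primary[1:])}
    backup = shortest_path(links, source, target, excluded=used)
    return primary, backup


# Hypothetical topology with a short route and a longer disjoint route.
links = [("A", "B"), ("B", "D"), ("A", "C"), ("C", "E"), ("E", "D")]
primary, backup = primary_and_backup(links, "A", "D")
print("primary:", primary)
print("backup:", backup)
```

This two-step approach is deliberately simple. On some topologies the shortest primary path blocks every disjoint alternative even though a disjoint pair of paths exists; algorithms that compute the pair jointly, such as Suurballe's algorithm, avoid that problem.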

Thus, the combined goal of the three categories of survivability techniques is to make a network failure imperceptible to network users by providing the desired QoS. However, cost and complexity are always an issue. The key challenge is to provide the minimum required QoS at minimum cost and in the simplest fashion. This chapter provides background information and surveys the existing literature on network survivability. The focus is on survivable network design and traffic management procedures. Section 4.2 briefly discusses prevention techniques. In Section 4.3, basic survivable network design and traffic restoration concepts are surveyed. Section 4.4 presents typical survivable network design models and mechanisms. Section 4.5 discusses survivable network design techniques and includes an example integer linear programming (ILP) formulation. Section 4.6 discusses the survivability challenges presented by multilayer networks. Lastly, Section 4.7 concludes the chapter and presents areas for future network survivability research.
