4.4. Transport Network Recovery Techniques

We now discuss some of the more common basic methods for performing traffic recovery after a failure.

4.4.1. Automatic Protection Switching

One particular survivability method that can be thought of strictly as a protection mechanism is 1 + 1 automatic protection switching (1 + 1 APS) [12]. In 1 + 1 APS, a primary end-to-end working channel is duplicated via head-end bridging on a dedicated physically diverse backup channel and both channels are monitored by the receiver. Upon failure of equipment along the path of the primary channel (or simple degradation of its signal), the receiver performs a tail-end transfer to select the backup channel. This is the only widely used survivability mechanism where there is no sharing of spare capacity among multiple working channels, failure scenarios, and so on. Therefore, it generally requires significantly greater amounts of spare-capacity resources relative to other mechanisms where spare-capacity sharing occurs. As a result of the lack of sharing, 1 + 1 APS requires at least 100 percent redundancy, and usually considerably more. While this may seem to be excessively inefficient, 1 + 1 APS has found widespread use with simpler point-to-point terminals and single-hop channels, as well as in cases where simplicity and restoration speed are crucial and/or when capacity efficiency is not particularly important.

Similar methods that allow sharing of backup channels are 1:1 APS and 1:N APS. The former is nearly identical to 1 + 1 APS, with the key difference being that in 1:1 APS, the backup channel does not carry a live copy of the signal. Rather, in 1:1 APS, the backup channel either remains unused (though regularly tested and exercised) or in some cases is allowed to carry a low-priority preemptible and unprotected signal. In either case, the entire backup path is fully preconnected in advance. When an outage (or degradation) of the working signal is detected by the receiver, it notifies the transmitter, which establishes a head-end bridge of the working signal onto the backup, and the receiver then performs a tail-end transfer. This means that 1:1 APS will be somewhat slower than 1 + 1 APS, but it still provides reasonably rapid protection. Depending on the circumstance, the ability to use the backup path to carry low-priority (but still revenue-generating) traffic can more than balance the slower recovery.

1:N APS is a very similar mechanism that is used to provide protection of typically single-hop working channels, where rather than working and backup channels corresponding one-for-one, a single backup channel is used to protect multiple working channels (N of them). As in 1:1 APS, the receiver detects a failure or signal degradation on one of the working channels, verifies availability of the backup channel, signals the transmitter to establish a head-end bridge of the failed working channel onto the backup channel, and then performs its own tail-end transfer to access the backup channel in place of the working channel in question. 1:N APS is illustrated in Figure 4.5. The multihop transport network equivalent of 1:N APS is called generalized trunk diversity in the public switched telephone network (PSTN) [8, 14] and demand-wise shared protection (DSP) in optical networks [34, 35]. In an even more general case of automatic protection switching, M:N APS, N disjoint working channels with a common pair of end nodes can be protected by M backup channels. Ordinarily, N ≥ M, but some high-availability networks might have more than one backup channel per working channel to protect against multiple simultaneous failures [36].

Figure 4.5. Illustration of 1:N automatic protection switching. Used with permission from Doucette [29].
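
The 1:N switching sequence described above can be sketched as a small state model. This is a minimal illustration only: the class and method names are hypothetical, the signaling between receiver and transmitter is collapsed into a single method call, and the revertive behavior on repair (the backup is released as soon as the working channel recovers) is an assumption rather than something the text specifies.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OneForNAps:
    """Toy model of a 1:N APS group: N working channels share one backup."""

    n_working: int
    backup_user: Optional[int] = None          # working channel now on the backup
    failed: set = field(default_factory=set)   # working channels currently down

    def report_failure(self, channel: int) -> bool:
        """Receiver detects a failure; return True if the channel is protected."""
        if not 0 <= channel < self.n_working:
            raise ValueError("unknown working channel")
        self.failed.add(channel)
        if self.backup_user is not None:
            # Backup already in use: only its current user is protected.
            return self.backup_user == channel
        # Backup verified free: the transmitter bridges the failed channel
        # onto it (head end) and the receiver selects it (tail end).
        self.backup_user = channel
        return True

    def report_repair(self, channel: int) -> None:
        """Assumed revertive operation: a repaired channel frees the backup."""
        self.failed.discard(channel)
        if self.backup_user == channel:
            self.backup_user = None


aps = OneForNAps(n_working=4)
first = aps.report_failure(2)    # backup is free, so channel 2 is switched
second = aps.report_failure(0)   # backup is busy, so channel 0 stays down
```

The second failure going unprotected is exactly the exposure that M:N APS with more than one backup channel is meant to reduce.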


4.4.2. Ring-Based Survivability

Another very simple network survivability mechanism is a survivable ring [7, 13, 37, 38]. While there are several different varieties, in general, a survivable ring is a cyclic structure (i.e., closed loop) of preconfigured transmission systems typically using add-drop multiplexing (ADM) nodal devices that allow signals to be added or dropped from the main line signals within the ring. The transmission capacity between an adjacent pair of ADMs is divided into two halves, one of which is used to carry payload-bearing traffic signals and the other is reserved for use in rerouting a failed signal in the event of failure of some span within the ring. Because of the cyclic nature of the ring, it provides two disjoint routes between any pair of ADMs within it, and so any traffic utilizing the ring will use one route for its primary working channel and a disjoint route as a redundant backup. Since protection is performed locally within the ring and the entire structure is preconfigured (and in the case of spare capacity, ready and waiting on standby), the only postfailure action generally required is for the ADMs on either side of the failure to switch a failed working channel onto its associated backup channel. A ring network consists of a number of individual rings that are interconnected at predefined locations, and a typical end-to-end working signal would transit multiple rings along its route.

There are two primary types of survivable rings. The first type is the unidirectional path-switched ring (UPSR), or the WDM-equivalent optical path protection ring (OPPR) [37]. A UPSR can be thought of as a collection of logical 1 + 1 APS systems adhering to a closed-loop configuration, as shown in Figure 4.6. A UPSR is composed of a system of working channels that transmit working traffic in just a single direction around the ring and a system of protection channels that transmit a backup copy in the opposite direction. Just as in 1 + 1 APS, receivers simply select the backup signal on a protection channel via tail-end transfer when a failure arises. Survival of the backup signal is assured because the working and backup signals do not occupy channels on a common span anywhere along their routes. In terms of spare-capacity requirements, UPSRs are no more efficient than standalone 1 + 1 APS systems, and in general are even worse, since the latter could use any pair of disjoint routes through the network while UPSRs require signals to respect individual ring structures and their interconnections. If we assume that all signals are bidirectional, then the working capacity on any span within a UPSR is the sum of the bandwidths of all signals routed through the ring; the same is true of the spare capacity on any span within the ring. In spite of this, UPSR networks are very common because their closed-form nature, simple implementation and operation, very high speed (on the order of 50 ms [38]), and use of inexpensive nodal devices make them particularly attractive to network operators.

Figure 4.6. Basic operation of a UPSR: (a) prefailure and (b) postfailure. Used with permission from Doucette [29].


The second primary type of survivable ring is the bidirectional line-switched ring (BLSR), or the WDM-equivalent optical shared protection ring (OSPR) [37]. BLSRs are significantly more capacity-efficient than UPSRs, though somewhat more complex in implementation and operation. BLSRs’ improved capacity efficiency comes from the loop-back mechanism they utilize, which allows spare capacity to be shared among a number of working signals within the ring. Recall in the UPSR that a bidirectional signal will be routed on working channels on both sides of the same ring (Figure 4.6b) and a corresponding pair of protection channels will be required as well to route the duplicate backup signals. However, in a BLSR, both directions of a bidirectional signal are routed on the same side of the ring and the backup signals are not permanently bridged to the protection channels. When a failure occurs along the working path, the two ADMs adjacent to the failure perform a loop-back operation to switch the failed signals onto the protection channels transmitting in opposite directions, as illustrated in Figure 4.7. Since the working signals are only routed on one side of the BLSR, all of the corresponding working channels (i.e., occupying the same wavelengths if we’re dealing with a WDM network) on the other side of the ring are still available for use by other working signals; the same is true of the protection channels. A BLSR, therefore, is capable of carrying a greater amount of traffic than a UPSR with the same capacity. However, a BLSR can still never achieve better than 100 percent redundancy, because protection capacity around the entire ring must at least meet the largest cross-section of working capacity anywhere within the ring. And since signaling and protection switching must be coordinated between the two ADMs adjacent to the failure, BLSR operation is somewhat slower than that in UPSRs.

Figure 4.7. Basic operation of a BLSR: (a) prefailure, and (b) postfailure. Used with permission from Doucette [29].
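
The capacity difference between the two ring types can be made concrete with a short calculation. The sketch below is illustrative only: the function names, the ring model (nodes numbered 0 to n-1, with span i joining node i and node i+1), the demand values, and the choice to route each BLSR demand on the shorter side of the ring are all assumptions introduced here, not part of the text.

```python
def upsr_working_per_span(demands):
    """In a UPSR every bidirectional demand occupies every span of the ring,
    so the working capacity on any span is the sum of all demand bandwidths."""
    return sum(bw for _, _, bw in demands)


def blsr_span_loads(n_nodes, demands):
    """Working load per span in a BLSR, with both directions of each demand
    routed on one side of the ring (assumed here: the shorter side)."""
    loads = [0] * n_nodes  # loads[i] is the span between node i and (i + 1) % n
    for src, dst, bw in demands:
        a, b = sorted((src, dst))
        clockwise = list(range(a, b))
        counter = list(range(b, n_nodes)) + list(range(0, a))
        route = clockwise if len(clockwise) <= len(counter) else counter
        for span in route:
            loads[span] += bw
    return loads


# Hypothetical demands on a 6-node ring: (source, destination, bandwidth).
demands = [(0, 1, 2), (1, 3, 1), (3, 5, 2), (4, 0, 1)]

upsr_capacity = upsr_working_per_span(demands)   # 6 units on every span
blsr_loads = blsr_span_loads(6, demands)         # [2, 1, 1, 2, 3, 1]
blsr_protection = max(blsr_loads)                # 3 units around the ring
```

For these demands the UPSR needs 6 working and 6 protection units on every span, while the BLSR needs protection capacity equal only to its largest working cross-section, 3 units.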


4.4.3. Mesh Survivability: Span Restoration

The simplest form of mesh survivability is perhaps span restoration, also referred to as link restoration [39]. Span restoration is a localized mechanism where a collectively coordinated set of replacement paths is formed between the end nodes of a failed span, as shown in Figure 4.8, with one replacement path required for each working channel on the failed span. It is often the case that the restoration paths follow relatively few and short localized routes. However, there is generally no restriction on the length or number of distinct routes they can follow, and in extreme cases, each working channel could be rerouted along a different route that could extend to the far reaches of the network. While these restoration paths may have been preplanned prior to failure, spare-capacity seizure and cross-connection occur strictly postfailure. Spare capacity is shared and available for any number of individual failure scenarios, and in some networks can also be used to carry low-priority, preemptible, and unprotected traffic. Since restoration paths are formed between the end nodes of the failure itself, there is typically no consideration for the ultimate origins and destinations of any constituent working channels on the failed span. As such, it is not uncommon for a repaired signal’s complete end-to-end route to loop back on itself on one or both sides of the failure.

Figure 4.8. An example of span restoration. Used with permission from Doucette [29].
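
One simple way to picture the formation of replacement paths is a greedy search over the spare-capacity graph. The sketch below is not the mechanism of any particular network; it assumes a hypothetical `spare` map from spans to spare-channel counts and repeatedly takes the shortest remaining route (by hop count) between the end nodes of the failed span, seizing one spare channel per hop. A greedy search of this kind is not guaranteed to find the maximum number of replacement paths.

```python
from collections import deque


def _shortest_path(residual, src, dst):
    """Breadth-first search over spans that still have spare channels."""
    adj = {}
    for span, cap in residual.items():
        if cap > 0:
            a, b = tuple(span)
            adj.setdefault(a, []).append(b)
            adj.setdefault(b, []).append(a)
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(adj.get(node, [])):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


def restore_span(spare, failed, n_working):
    """Return up to n_working replacement paths between the failed span's ends."""
    u, v = failed
    # Spare channels on the failed span itself are unusable.
    residual = {s: c for s, c in spare.items() if s != frozenset(failed)}
    paths = []
    while len(paths) < n_working:
        path = _shortest_path(residual, u, v)
        if path is None:
            break  # spare capacity exhausted before full restoration
        for a, b in zip(path, path[1:]):
            residual[frozenset((a, b))] -= 1  # seize one spare channel per hop
        paths.append(path)
    return paths


# Hypothetical four-node example: span A-B fails with three working channels.
spare = {
    frozenset(("A", "B")): 4,
    frozenset(("A", "C")): 2,
    frozenset(("C", "B")): 2,
    frozenset(("A", "D")): 1,
    frozenset(("D", "B")): 1,
}
paths = restore_span(spare, ("A", "B"), n_working=3)
# Two channels are rerouted via C and one via D.
```

Note that the search starts and ends at the end nodes of the failed span and knows nothing about where each working channel originates or terminates, which is why restored end-to-end routes can loop back on themselves.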


4.4.4. Mesh Survivability: Shared Backup Path Protection

Shared backup path protection (SBPP), also called shared path protection (SPP) or failure-independent path protection (FIPP), is a more recent transport network survivability mechanism that also finds uses in IP/MPLS networks [40,41]. Like in 1 + 1 APS, each working path in an SBPP network has a single preplanned disjoint end-to-end backup path. However, the difference is that spare capacity can be shared among multiple working signals’ backup paths if those working signals are themselves routed over mutually disjoint working routes. As long as that condition is met, no two backup paths will simultaneously require use of the same spare capacity in the event of a single failure. This greatly reduces spare-capacity redundancy as compared to 1 + 1 APS, and in fact, SBPP is one of the most capacity-efficient survivability mechanisms known (in most networks only shared path restoration, described in Section 4.4.5, can be more capacity-efficient).

If we further require that working paths are node disjoint from their backup paths rather than simply span disjoint, and likewise that a set of working paths are mutually node disjoint if their respective backup paths share any spare capacity, then SBPP will also provide protection against node failures. This is in contrast to span restoration that, by its very nature, is incapable of providing node failure protection. We illustrate SBPP in Figure 4.9, where two node pairs each route a working path (solid lines) through the network. Since the two working paths are disjoint, there is no single span failure that will fail both working paths simultaneously, and so their associated backup paths (dotted lines) can safely share capacity on a common span (in this case span X-Y) without risk of spare-capacity contention. In other words, when one of the backup routes is required to restore service on its working path, the other will not also be required.

Figure 4.9. An illustration of shared backup path protection. Used with permission from Doucette [29].
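
The sharing rule can be expressed as a small calculation: the spare capacity needed on a span is the largest number of backup paths that any single span failure would activate on it. The sketch below assumes unit-size demands and span failures only; the function names and the span labels other than X-Y are hypothetical and do not correspond to the figure.

```python
def sbpp_spare(paths):
    """Spare channels per span under sharing.

    paths: list of (working_spans, backup_spans) pairs, each a set of span
    names, one unit of demand per pair. Only single span failures are
    considered.
    """
    failures = set().union(*(w for w, _ in paths))
    backup_spans = set().union(*(b for _, b in paths))
    spare = {}
    for s in backup_spans:
        # Worst case over all failures: how many backups need span s at once?
        spare[s] = max(
            (sum(1 for w, b in paths if f in w and s in b) for f in failures),
            default=0,
        )
    return spare


def dedicated_spare(paths):
    """Spare channels per span with no sharing, as in 1 + 1 APS."""
    spare = {}
    for _, backup in paths:
        for s in backup:
            spare[s] = spare.get(s, 0) + 1
    return spare


# Two disjoint working paths whose backup paths both cross span X-Y.
paths = [
    ({"A-B", "B-C"}, {"A-X", "X-Y", "Y-C"}),
    ({"D-E", "E-F"}, {"D-X", "X-Y", "Y-F"}),
]
shared = sbpp_spare(paths)           # one spare channel suffices on X-Y
unshared = dedicated_spare(paths)    # two spare channels reserved on X-Y
```

If the two working paths instead had a span in common, a failure of that span would activate both backups at once and X-Y would need two spare channels, so no saving would result.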


While SBPP does have its advantages as discussed, it also has some disadvantages. The primary issue is that each node in the network requires an up-to-date network state database detailing all spare channel-sharing arrangements, topology, and capacity usage information. Every time there is a change in network state (e.g., a new connection has arrived, an existing connection is released), all node databases must be updated. A further disadvantage is that upon failure of just a single span, there is potential for a large number of individual working paths to simultaneously fail as a result. This could require very many nodes to perform cross-connections at the same time, which means the associated signaling for spare-capacity seizure, activation of concatenated backup paths, etc. could become an issue.

4.4.5. Mesh Survivability: Shared Path Restoration

A related mechanism called shared path restoration (SPR) can also be considered to be the end-to-end equivalent of span restoration. SPR is also commonly called failure-dependent path protection (FDPP), or more simply, just path restoration [41]. The primary difference between path restoration and SBPP is that in path restoration, there is no single predetermined restoration route for each working path. Instead, all failed working paths are simultaneously rerouted end-to-end using restoration routes that are collectively optimized in the presence of that specific failure. So any individual working path could make use of a different restoration route for every possible failure that affects it. Furthermore, path restoration makes use of a mechanism called stub release through which the surviving ends or stubs of a failed working path are released and made available as spare capacity. These two features guarantee that path restoration is at least as efficient as SBPP (in fact, an SBPP solution can be exactly duplicated with path restoration by simply forgoing use of stub-released capacity and using a single disjoint backup path for each working path), but in general it is even more capacity efficient than SBPP.

We illustrate path restoration in Figure 4.10, where three node pairs each route their demands through working paths, shown in Figure 4.10a. When a failure occurs, as shown in Figure 4.10b, end-to-end restoration routes are formed for each working path affected by the failure, and because of stub release, restoration routes are allowed to reuse spare capacity on the surviving stub portions of the affected working paths. If a different failure occurs, as shown in Figure 4.10c, a similar end-to-end reroute is performed for the affected working paths, but we can note that working route A-B can now use a different restoration route than the one it used in response to the failure in Figure 4.10b. Obviously, path restoration is operationally more complex than span restoration or SBPP as it requires identification of the specific failure in the working path before restoration can begin. Hence path restoration typically has a slower restoration speed than span restoration or SBPP, but it is more capacity-efficient than either, and like SBPP, path restoration is capable of protecting against node failures.

Figure 4.10. An illustration of path restoration with stub release. Used with permission from Doucette [29].
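
Stub release itself is easy to state in code. The sketch below assumes unit-size working paths described as lists of span names; the function name and the span labels are hypothetical. It simply counts, for a given span failure, how many channels on each surviving span are freed by the working paths that the failure has cut.

```python
from collections import Counter


def stub_release(working_paths, failed_span):
    """Channels freed on surviving spans by working paths that cross the failure."""
    released = Counter()
    for path in working_paths:
        if failed_span in path:
            for span in path:
                if span != failed_span:
                    released[span] += 1  # this stub channel becomes spare
    return released


# Hypothetical working paths; span C-D fails.
working_paths = [
    ["A-C", "C-D", "D-B"],
    ["E-C", "C-D"],
    ["E-F"],            # unaffected by the failure, so nothing is released
]
released = stub_release(working_paths, "C-D")
```

The released channels are added to the preplanned spare capacity before restoration routes are chosen for that specific failure, which is one reason the routes are failure dependent.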


4.4.6. Mesh Survivability: p-Cycles

One final survivability mechanism we will discuss is p-cycles [36, 42, 43]. p-Cycles are usually considered to be a form of mesh network restoration, though using the definition above, they are more properly classed as a shared preplanned (and preconfigured) protection mechanism. Like rings, p-cycles are cyclic structures of preconfigured capacity that can be used to provide a fast switchover of failed working channels onto standby spare channels. When a failure occurs on a protected span, the p-cycle mechanism inserts the failed working channel onto the corresponding protection channel allowing the signal to bypass the failure around the other surviving portion of the p-cycle. This is illustrated in Figure 4.11. When the p-cycle shown in Figure 4.11a suffers a failure of an on-cycle span in Figure 4.11b, the working channels on the span can be restored the long way around the p-cycle.

Figure 4.11. An illustration of p-cycle restoration: (a) a prefailure p-cycle, (b) protection response for on-cycle failure, and (c) protection response for straddling span failure. Used with permission from Doucette [29].


However, p-cycles have several key differences that set them apart from rings. The principal difference is that in addition to protecting working channels on spans crossed by the p-cycle (so-called on-cycle spans), a p-cycle is also capable of protecting straddling spans, which are spans whose end nodes are on the cycle but that are not themselves a part of the p-cycle. More importantly, a single unit-size p-cycle can actually protect two working channels on each straddling span; if it is a straddling span that has failed, then the entire p-cycle remains intact, essentially providing two disjoint paths around the p-cycle between the end nodes of the failed span. As shown in Figure 4.11c, if a failure arises on a straddling span, the mechanism can use either direction of the p-cycle to restore a working channel on the failed straddling span. If the failed straddling span has two working channels on it, then both sides of the p-cycle can be used, with each restoring one of the failed working channels. The process is the same no matter which straddling span fails.

Figure 4.11 assumes only a single unit-size p-cycle, but in reality the entire network will be protected by a number of overlapping and/or stacked p-cycles so that there are enough protection paths available to restore all of the failed working channels on any span. For instance, a failed span with 15 working channels might be an on-cycle span for five unit-size copies of one p-cycle and four unit-size copies of another, as well as a straddling span for three unit-size copies of another distinct p-cycle, which would provide protection for six of that span’s working channels. Like span restoration, there is no consideration made for the original end-to-end working path of the failed working channels. Rather, they are simply restored between the end nodes of the failure itself. So it is possible, for instance, that a restored working path may loop back on itself on either side of the failure, though this source of inefficiency is no worse than what would be observed in span restoration itself.
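
The counting in the 15-channel example can be reproduced with a short sketch. It assumes each p-cycle is given as an ordered list of nodes and that there is at most one span between any pair of nodes, so that a span is on-cycle exactly when its end nodes are neighbors in that list. The function names and node labels are hypothetical.

```python
PATHS_PER_COPY = {"on-cycle": 1, "straddling": 2, "unprotected": 0}


def span_role(span, cycle):
    """Classify a span relative to a p-cycle given as an ordered node list."""
    u, v = span
    if u not in cycle or v not in cycle:
        return "unprotected"
    n = len(cycle)
    i, j = cycle.index(u), cycle.index(v)
    adjacent = (i - j) % n in (1, n - 1)
    return "on-cycle" if adjacent else "straddling"


def protection_paths(span, stacked_cycles):
    """Total protection paths available to a span from stacked p-cycles.

    stacked_cycles: list of (cycle, copies) pairs, where copies is the number
    of unit-size copies of that cycle.
    """
    return sum(
        copies * PATHS_PER_COPY[span_role(span, cycle)]
        for cycle, copies in stacked_cycles
    )


# Span A-B is on-cycle for two p-cycles and straddles a third.
stacked = [
    (["A", "B", "C", "D"], 5),   # on-cycle: 5 copies x 1 path
    (["A", "B", "E", "F"], 4),   # on-cycle: 4 copies x 1 path
    (["A", "C", "B", "D"], 3),   # straddling: 3 copies x 2 paths
]
total = protection_paths(("A", "B"), stacked)   # 5 + 4 + 6 = 15
```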

Another important difference relative to rings is that p-cycles are protection structures only, meaning that they are composed purely of spare capacity (recall that in rings, half of the capacity of the ring is for working channels and the other half is for spare channels). The implication is that working paths need not be constrained by the p-cycle systems that will ultimately protect them, as is the case with rings. This allows working paths to be routed via shortest paths or any other route desired through the network. Also, because straddling spans are protected by p-cycles that explicitly do not cross over them, it is not unheard of for some spans in a p-cycle network to carry working channels only. Design and implementation are further simplified as a consequence, since the spare-only nature of p-cycles means that there is no need for any cycle interconnection the way we have in rings, and protection relationships can be considered on an individual span basis.

p-Cycles also provide very fast protection. As with rings, it is only the end nodes of the failed span that need to act in order to provoke a restoration response. In advance of failure, p-cycles are formed in the network by simply forming cross-connections between spare channels on adjacent spans, ensuring that the cross-connections form a closed structure. Then when a failure arises (say on an on-cycle span), the end nodes of the failed span detect the failure and cross-connect each end of a working channel into the spare channels of the p-cycle. There is no need to search for available connections, notify any other nodes, and so on, as the entire response of assignment of working channels to specific p-cycles is preplanned, requiring only a local cross-connection into the protection structure (the response is virtually identical to that of a BLSR). Since all of the other cross-connections are preconfigured, any signal inserted into the p-cycle by one end node will automatically transit the p-cycle until it emerges at the failed span’s other end node, which has cross-connected the working channel’s other end into the same p-cycle. For this reason, and because of p-cycles’ meshlike capacity efficiency, p-cycles are often referred to as combining the speed of rings with the efficiency of mesh [7].

The significant capacity-efficiency advantage p-cycles enjoy over rings is easily demonstrated in Figure 4.12. If we consider a single unit-size copy of the 8-hop ring indicated by the solid dark spans shown in Figure 4.12a, that ring can only protect the 8 spans it crosses. However, a single unit-size copy of a p-cycle that crosses the same spans can protect working channels on 13 spans, as shown in Figure 4.12b. In addition, since five of the spans are straddling spans, the unit-size p-cycle can actually protect two working channels on those spans, meaning the 8-hop p-cycle can protect 18 individual working channels, whereas the ring could only protect eight working channels. If we define a system’s redundancy as the sum of the spare capacity divided by the sum of the working capacity, then the ring in Figure 4.12a will have a minimum of 100 percent redundancy (if all working channels are signal-bearing) while the equivalent p-cycle will have a redundancy of only 8/18 = 44.4 percent. p-Cycle network redundancies as low as 35 percent have been reported in a real network [44], and if Hamiltonians [45] are used, p-cycle network redundancy can theoretically achieve the lower bound on span-restorable redundancy [46]. The drawbacks of p-cycles are that they are not as capacity efficient as path restoration and the recovery path after a failure may be too long as compared to the working path, thereby violating QoS requirements.

Figure 4.12. Spans protected by (a) a unit-capacity ring and (b) an equivalent p-cycle. Used with permission from Doucette [29].
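
The comparison can be written as a short worked equation. Here n and s are symbols introduced only for this illustration: n is the number of on-cycle spans and s the number of straddling spans of a single unit-size p-cycle, and every protectable working channel is assumed to be in use.

```latex
\[
R_{\text{ring}} = \frac{8}{8} = 100\%,
\qquad
R_{p\text{-cycle}} = \frac{n}{n + 2s}
                   = \frac{8}{8 + 2 \cdot 5}
                   = \frac{8}{18} \approx 44.4\%
\]
```

The numerator counts one spare channel on each on-cycle span. The denominator counts one protected working channel per on-cycle span and two per straddling span, so each additional straddling span lowers the redundancy.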

