Much of the information that follows is based on the concepts presented in the book High Availability Network Fundamentals, by Chris Oggerino (Cisco Press). At the time of this writing, the book is unfortunately out of print. If you can get your hands on a copy, it is worth your while.
This appendix provides richer detail to help you evaluate the components of system availability, as an extension of what was presented in Chapter 6. You can calculate the availability of a single component with the following equation:
$$\text{Availability} = \frac{\text{MTBF}}{\text{MTBF} + \text{MTTR}}$$
So, the availability of a component whose Mean Time Between Failures (MTBF) is 175,000 hours and Mean Time To Repair (MTTR) is 30 minutes (0.5 hours) would be:
$$\text{Availability} = \frac{175{,}000}{175{,}000 + 0.5} \approx 0.99999714$$
In other words, according to the manufacturer’s testing results, the component is expected to have only about 1.5 minutes of downtime per year: its unavailability, roughly 0.00000286, multiplied by the 525,600 minutes in a year.
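The single-component arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names and the 365-day year are assumptions made here, not something the text prescribes.

```python
# Assumption: a 365-day year (525,600 minutes); a leap year shifts the result slightly.
MINUTES_PER_YEAR = 365 * 24 * 60


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time a component is expected to be up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_minutes_per_year(avail: float) -> float:
    """Expected downtime per year implied by an availability figure."""
    return (1.0 - avail) * MINUTES_PER_YEAR


# MTBF of 175,000 hours and MTTR of 30 minutes (0.5 hours), as in the text.
a = availability(175_000, 0.5)
downtime = downtime_minutes_per_year(a)
print(f"availability: {a:.8f}")
print(f"downtime: {downtime:.2f} minutes/year")
```

Note that MTBF and MTTR must be expressed in the same unit before they are combined, which is why the 30-minute repair time is entered as 0.5 hours. The result comes out at roughly a minute and a half of downtime per year.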
Most systems are composed of more than one component, of course. Multicomponent systems are arranged in a serial or a parallel fashion. In a serial system, each component is a single point of failure, so the system is available only when every one of its components is available. In contrast, a parallel system has redundant components, built such that the failure of a single component will not cause the entire system to fail. You can calculate the availability of serial components by multiplying together the availability numbers for each single component:
$$A_{\text{serial}} = \prod_{i=1}^{n} A_i = A_1 \times A_2 \times \cdots \times A_n$$
where $A_i$ is the availability of component $i$.
Here’s how to calculate the availability of a serial multicomponent system, consisting of a processor, bus, and I/O card:
This represents 99.998% availability, which is also called “four 9s and an 8.” That was a simplified example. Now, let’s look at a redundant system availability calculation (see Figure C-1).
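Before turning to the redundant case, the serial rule can be sketched in Python. The component availabilities below are hypothetical values chosen purely for illustration; they are not the figures behind the 99.998% result, and the function name is an assumption as well.

```python
from math import prod


def serial_availability(component_availabilities):
    """Availability of components in series: the product of the individual figures."""
    return prod(component_availabilities)


# Hypothetical values for illustration only (not taken from the text).
parts = {"processor": 0.99999, "bus": 0.999995, "io_card": 0.99995}

system = serial_availability(parts.values())
weakest = min(parts.values())
print(f"serial system availability: {system:.7f}")
print(f"weakest single component:   {weakest:.7f}")
```

Because every factor is less than 1, a serial system is never more available than its weakest component, and each additional serial component lowers the total further.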
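Figure C-1 shows a diagram of a simple redundant system with two CPUs, two power supplies, and two I/O cards. You calculate availability for such a system in much the same way as serial availability: the system total is still a product of terms. The difference is that each redundant group is first reduced to a single availability figure, which is 1 minus the product of the unavailability (1 minus the availability) of each component in the group. Note this key qualifier: the availability of a single redundant group (e.g., two power supplies) is 1 minus the product of the individual components’ unavailability, not of their availability. The following formula should help clear this up:
$$A_{\text{parallel}} = 1 - \prod_{i=1}^{n} (1 - A_i)$$
For two identical components, each with availability $A$, this reduces to $1 - (1 - A)^2$.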
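As a minimal sketch of the redundant-group rule, the following Python computes the availability of a pair of power supplies. It uses the 0.99994 power supply figure quoted in this appendix. The function name is an assumption, and the calculation assumes the two units fail independently.

```python
from math import prod


def parallel_availability(component_availabilities):
    """Availability of a redundant group: 1 minus the product of unavailabilities.

    Assumes failures are independent and that any one surviving component
    is enough to keep the group up.
    """
    return 1.0 - prod(1.0 - a for a in component_availabilities)


single = 0.99994  # one power supply
pair = parallel_availability([single, single])
print(f"single supply:  {single:.10f}")
print(f"redundant pair: {pair:.10f}")
```

The pair is unavailable only when both supplies are down at once, so its unavailability is the square of a single supply's unavailability.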
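Now that you understand serial versus parallel systems, you can begin to calculate more complex scenarios that combine the two. Assume that you know your I/O card availability is 0.99995, your CPU availability is 0.99996, your power supply availability is 0.99994, and your chassis availability is 0.999998. Treating each redundant pair from Figure C-1 as a parallel group and the single chassis as a serial component, the availability calculation would be as follows:
$$A_{\text{system}} = \left[1 - (1 - 0.99995)^2\right] \times \left[1 - (1 - 0.99996)^2\right] \times \left[1 - (1 - 0.99994)^2\right] \times 0.999998 \approx 0.999998$$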
The preceding calculation shows that, based purely on hardware MTBF numbers, this scenario should have only 1.05 minutes of downtime per year; in other words, it is a “five 9s” system.
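The combined calculation can be checked with a short Python sketch. It assumes the layout described above (a redundant pair each of I/O cards, CPUs, and power supplies, all in series with one chassis), independent failures, and a 365-day year. The helper names are illustrative.

```python
from math import prod

MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: non-leap year


def serial(availabilities):
    """Every element must be up: multiply the availabilities."""
    return prod(availabilities)


def parallel(availabilities):
    """Any one element suffices: 1 minus the product of unavailabilities."""
    return 1.0 - prod(1.0 - a for a in availabilities)


# Component figures from the text; each redundant pair is a parallel group.
io_cards = parallel([0.99995, 0.99995])
cpus = parallel([0.99996, 0.99996])
power = parallel([0.99994, 0.99994])
chassis = 0.999998  # single, non-redundant component

system = serial([io_cards, cpus, power, chassis])
downtime = (1.0 - system) * MINUTES_PER_YEAR
print(f"system availability: {system:.9f}")
print(f"downtime: {downtime:.3f} minutes/year")
```

The result is a little over one minute of downtime per year. Almost all of it comes from the chassis: its unavailability is 0.000002, while each redundant pair contributes only a few billionths.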
You can obtain the MTBF component of the equation from your hardware manufacturer. A network vendor most likely derives it using the Telcordia Parts Count Method, which is described in document TR-332 from http://www.telcordia.com/. Cisco lists MTBF information in its product data sheets, as do Juniper and others.