This chapter describes the architectures of the Fore Systems PowerHub multilayer switches [FORESYSWP96]. The PowerHub multilayer switches (PowerHub 7000 and 6000) perform both Layer 2 and Layer 3 forwarding and support a wide range of physical media types for network connectivity. The PowerHub supports Layer 3-based VLANs, IP routing (RIP and OSPF), and IP multicast among other features.
The PowerHub employs a software-based packet processing engine (with multiple RISC processors) for both route processing and packet forwarding. The main argument for this software-based solution, at the time the architecture was proposed, was flexibility: routing and management protocols and packet forwarding features can be added or updated, and software bugs fixed, through simple software fixes, upgrades, or downloads.
It was viewed at that time that network devices (switches, routers, switch/routers, etc.) that use ASICs to process and forward packets are fixed-function devices that do not have the kind of flexibility required when the system requires enhancements or modifications. As an example, when the designer requires the switch to be enhanced with IEEE 802.1Q VLAN tagging for Ethernet frames, most ASIC-based switches will require parts of the switch to be replaced with new ASICs incorporating this new tagging feature. However, the PowerHub, being software-based, allows the VLAN tagging feature to be added to its software base with no hardware swaps required.
As observed in [FORESYSWP96], the PowerHub routing and forwarding software was designed as a fully optimized software platform to take advantage of the semiconductor device and hardware speed improvements available at that time. A design goal was to allow this innovative software architecture to effectively use capabilities such as shared memory, cache organization, write buffers, and burst-mode CPU transactions.
Although some of the components and features of the PowerHub are obsolete and will not be used in today's architectures and networking environments, they are discussed in this book to allow the reader to appreciate how multilayer switches have evolved over the years.
Adopting the architecture categories broadly used to classify the various designs in Chapter 3, the following architectures are covered in this chapter:
The PowerHub architecture comprises a shared memory (i.e., the packet buffer memory), program memory, flash memory, multiple RISC processors, network interface controllers, a packet switching engine, and an ASIC for optimizing shared memory access. All packets in transit through the switch are first received into the shared memory, processed by the CPUs (the RISC processors), and then forwarded to the appropriate destination port(s) for transmission to the network. The shared memory architecture was designed with the goal of simplifying the packet processing and forwarding algorithm. Figures 6.3–6.7 show the main architectural features of the PowerHub switches.
To improve the packet forwarding performance and scalability of networks that deploy the PowerHubs, the multiple RISC processors are distributed throughout the system and, in addition, the switch uses a combination of centralized and distributed processing. The Fast (100 Mb/s) Ethernet interface modules, Fiber Distributed Data Interface (FDDI) modules, and Asynchronous Transfer Mode (ATM) modules all implement locally their own Layer 2 and Layer 3 forwarding functions, thus increasing both the packet forwarding performance and scalability of the switch.
Packet forwarding decisions in the PowerHub can be performed at Layer 3 (IP) or at Layer 2. The PowerHub can also process and forward packets based on user-defined filters at either Layer 2 or Layer 3. In addition, the PowerHub supports network management features such as statistics gathering, security filtering, and port monitoring (also known as port mirroring).
Port monitoring can be used to capture/copy network traffic passing through a PowerHub port for analysis. This feature allows packets passing through a port on the switch to be copied to another port on the switch that has a connection to a traffic probe device, a Remote Monitoring (RMON) probe, or a security device. Essentially, port monitoring allows packets in the transmit, receive, or both directions on one source port (or more) to be copied or mirrored to a destination port for analysis.
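The port-monitoring behavior described above can be sketched as a small piece of logic that decides, per packet, whether a copy should go to the probe port. This is an illustrative sketch only; the class and attribute names are assumptions, not the PowerHub's actual interfaces.

```python
# Hypothetical sketch of port-mirroring logic: packets seen on one or more
# source ports, in the rx and/or tx direction, are copied to a destination
# port that connects to a probe (e.g., an RMON probe or security device).

class MirrorSession:
    """Copies traffic seen on source ports to one destination (probe) port."""

    def __init__(self, source_ports, dest_port, direction="both"):
        self.source_ports = set(source_ports)  # ports being monitored
        self.dest_port = dest_port             # port connected to the probe
        self.direction = direction             # "rx", "tx", or "both"

    def mirror(self, packet, port, direction):
        """Return the probe port if this packet should be copied, else None."""
        if port in self.source_ports and self.direction in (direction, "both"):
            return self.dest_port
        return None

session = MirrorSession(source_ports=[1, 2], dest_port=9, direction="rx")
print(session.mirror(b"frame", port=1, direction="rx"))  # 9: copied to probe
print(session.mirror(b"frame", port=1, direction="tx"))  # None: tx not mirrored
```

Note that the forwarding of the original packet is unaffected; mirroring only determines whether an extra copy is emitted on the probe port.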
In the PowerHub 7000, the Network Interface Modules (NIM) are connected to the packet engine through a set of buses as shown in Figure 6.3. The PowerHub 7000 architecture also supports the following:
The PowerHub 7000 supports a shared memory and a RISC processor-controlled route processing and packet forwarding component. All packets arriving at the system are stored in a centralized shared memory. The relevant packet fields are inspected and processed by RISC processors to make the required forwarding decisions. After the forwarding information is determined, the packets are sent to the appropriate destination port or ports (for multicast traffic) for transmission to the network.
To achieve scalable performance, the PowerHub distributes the processors and shared memory to other system modules that include the INIMs, which consist of 100 Mb/s Ethernet, FDDI, and ATM modules. Distributing the processing and shared memory to the INIMs allows processing and forwarding of packets locally within the INIMs.
The main components of this multilayer switch are described below (see Figures 6.4–6.6). At the time the PowerHub was designed, Fast (i.e., 100 Mb/s) Ethernet and FDDI were considered high-speed networking technologies with FDDI mostly used as the preferred transport technology for network backbones.
The packet engine module holds the “intelligence” of the PowerHub and consists of a number of processors, a shared memory, and other components that perform high-speed Layer 2 and Layer 3 forwarding as well as network management functions.
All packets arriving at the PowerHub through the NIMs, except those arriving at the INIMs (i.e., the ATM, FDDI, and 6×1 Fast Ethernet modules), are stored in the shared memory. The received packets are inspected by the processors, marked for Layer 2 or 3 processing, and then forwarded as required. Processed Layer 2 or 3 packets are read directly by the NIMs from the shared memory and transmitted to the destination ports specified by their forwarding instructions.
The shared memory is designed from standard cached SRAM chipsets. Custom-designed bus/memory interface ASICs are used to provide the multiple ports required for the processors and packet channels to access the shared memory. While other LAN switches at the time the PowerHub was designed may use a shared bus arbitration approach, the PowerHub employs a pipelined, shared memory access approach. This pipelined approach to the shared memory allows each shared memory port to operate as if it has exclusive access to the shared memory.
The RS-232 standard was renamed EIA RS-232, then EIA-232, and is now referred to as TIA-232. The RS-232 ports allow a network administrator to directly access all the PowerHub system functions through a command-line interface (CLI). The PowerHub also allows the network administrator to perform in-band network management via Telnet and SNMP.
In addition, the packet engine supports alarms and temperature sensors that can be read by the system management software to determine if the temperature of the packet engine is within acceptable limits. These readings from the temperature sensors can trigger alarms that can be at the board or system level.
When a packet accelerator is used in the PowerHub 7000, its packet processing and forwarding capacity increases by approximately 50%, since the packet accelerator adds both processing power and shared memory to the system.
The PowerHub 7000 supports several types of NIMs for network connectivity:
The INIMs and the packet engine are similar in architecture and share similar features. Each INIM has two RISC CPUs, a shared memory for storing packets locally, and a local program and data memory. The local forwarding tables the INIMs use for forwarding decisions are generated by the packet engine, which runs the routing protocols.
This distributed packet forwarding and shared memory architecture implements a centralized route processing (or control) engine in the packet engine coupled with distributed packet forwarding in the INIMs. In this architecture, the INIM has a copy of the main Layer 2 and 3 forwarding tables that are maintained by the packet engine. All updates to the packet engine's Layer 2 and 3 forwarding tables are also transferred to the INIMs local Layer 2 and 3 forwarding tables.
Packets received by an INIM are stored locally in its shared memory and a forwarding decision is performed locally by the INIM's CPU using its local copy of the forwarding tables. This allows the PowerHub to distribute the forwarding intelligence required for packet forwarding to the INIMs, thereby significantly increasing the packet forwarding capacity of the whole system. The local forwarding tables in the INIMs are always kept synchronized to the main forwarding tables in the packet engine.
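The centralized-control, distributed-forwarding split described above can be sketched as follows. This is an illustrative model only, not PowerHub code; the class and method names are assumptions. The packet engine owns the master forwarding table and pushes every update to the INIMs' local copies, so each INIM can decide locally where a packet goes.

```python
# Sketch of centralized route processing with distributed forwarding:
# the packet engine maintains the master table and propagates updates,
# while each INIM forwards using only its synchronized local copy.

class PacketEngine:
    def __init__(self):
        self.forwarding_table = {}   # master Layer 2/3 forwarding table
        self.inims = []              # INIMs receiving table updates

    def attach(self, inim):
        self.inims.append(inim)
        inim.local_table = dict(self.forwarding_table)  # initial full copy

    def learn(self, destination, port):
        """Install an entry and propagate the update to every INIM."""
        self.forwarding_table[destination] = port
        for inim in self.inims:
            inim.local_table[destination] = port

class INIM:
    def __init__(self):
        self.local_table = {}

    def forward(self, destination):
        """Local forwarding decision: no trip to the packet engine needed."""
        return self.local_table.get(destination)

engine = PacketEngine()
inim = INIM()
engine.attach(inim)
engine.learn("10.0.0.5", 3)
print(inim.forward("10.0.0.5"))  # 3: decided locally on the INIM
```

The key property is that the hot path (`forward`) touches only INIM-local state; the packet engine is involved only when the tables change.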
Compared to the PowerHub 7000, the PowerHub 6000 is a smaller, compact design with a packet engine and two optional NIMs in a three-slot chassis (Figure 6.7). This design supports 12 10BASE-T Ethernet ports and optional ports for Fast Ethernet, FDDI, and ATM connectivity. PowerHub 6000 also supports a main power supply plus an optional, secondary, or redundant power supply. It also supports the full range of routing, management, and system software features found on the PowerHub 7000.
The PowerHub 7000 and the PowerHub 6000 are architecturally similar, but the latter differs in supporting the following features:
In the PowerHub 6000, unlike in the PowerHub 7000, packet filtering and forwarding for traffic arriving at FDDI NIMs are performed by the packet engine's CPUs rather than by local CPUs on the INIMs. Apart from the differences cited above, the operation of the PowerHub 6000 is almost identical to that of the PowerHub 7000 [FORESYSWP96].
The PowerHub supports four CPUs (two on the packet engine and two on the packet accelerator (Figure 6.4)) with its software designed to run in a multiprocessor system. The design goal is to allow the system to run with as little overhead as possible. The sources of overhead in a multiprocessor system, traditionally, are context switching, interrupt processing, and locking/unlocking data structures.
Traditionally, interrupts, and the context switching that follows them, are employed in software-based switching and routing systems designed to handle unpredictable network traffic loads. For instance, when a packet arrives at a network interface, a hardware interrupt is asserted to alert the appropriate software processing module to handle the packet. In the meantime, the system spends no processing time looking for nonevents. Such designs are most suitable when the packet arrival rate to the system is low.
An alternative to interrupt-driven software designs is the polling software architecture used in the PowerHub [FORESYSWP96]. Here, when the PowerHub has detected that a port is active (i.e., “Link detected” on a port), that port is added as part of the packet polling loop maintained by the system. When a port is polled, its packets are transferred over one of the two 800 Mb/s packet channels and stored in the packet engine's shared memory for packet processing.
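The polling model described above can be sketched as a tight loop over the set of link-up ports. This is a minimal, illustrative sketch under assumed names (`Port`, `poll_once`); it is not the PowerHub's actual software, but it shows the structural difference from interrupt-driven designs: no per-packet interrupt, just a repeated pass over the active-port list draining packets into shared memory.

```python
# Sketch of a polling loop: only ports with link detected are added to the
# polling list, and each pass drains any received packets into a buffer
# standing in for the packet engine's shared memory.

from collections import deque

class Port:
    def __init__(self, name):
        self.name = name
        self.link_up = False
        self.rx_queue = deque()   # packets waiting on the interface

ports = [Port("e1"), Port("e2"), Port("fddi1")]
shared_memory = []               # stand-in for the packet engine's buffer

def poll_once(active_ports):
    """One pass of the polling loop: drain each active port into shared memory."""
    for port in active_ports:
        while port.rx_queue:
            shared_memory.append((port.name, port.rx_queue.popleft()))

ports[0].link_up = True
ports[0].rx_queue.extend([b"pkt-a", b"pkt-b"])
polling_list = [p for p in ports if p.link_up]   # only link-up ports polled
poll_once(polling_list)
print(shared_memory)   # [('e1', b'pkt-a'), ('e1', b'pkt-b')]
```

In a real system the loop would run continuously and the transfer would use the 800 Mb/s packet channels; here a Python list transfer stands in for that data movement.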
The packet processing tasks (for Ethernet frame forwarding, for example) generally include the following, for both centralized packet forwarding in the packet engine and distributed packet forwarding in the INIMs [FORESYSWP96]:
A key component on which the PowerHub polling architecture depends to carry out its functions is the PowerHub network interface (or NIM) controllers. In the traditional design, a CPU steps in immediately as soon as the interface controller indicates that a packet has been received or is to be transmitted. In the polling architecture, the interface controller does not require the CPU to respond immediately when a packet has arrived or requires transmission.
The PowerHub polling uses a memory-based descriptor structure that allows the interface controller to store the status of a received packet or a packet requiring transmission. This allows the interface controller to move on to the next packet without CPU intervention. This architecture allows up to 512 packets to be received or queued for transmission back-to-back.
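A descriptor structure of this kind behaves like a bounded FIFO ring: the interface controller appends a status entry per packet and a CPU consumes entries when it polls, so neither side waits on the other until the ring fills. The sketch below is illustrative only; the field names and API are assumptions, and only the 512-entry capacity comes from the text.

```python
# Sketch of a memory-based descriptor FIFO: the producer (interface
# controller) posts per-packet status without CPU intervention, and the
# consumer (a CPU) drains entries later during its polling pass.

RING_SIZE = 512   # the text cites up to 512 packets queued back-to-back

class DescriptorRing:
    def __init__(self, size=RING_SIZE):
        self.slots = [None] * size
        self.head = 0          # next slot the CPU will consume
        self.tail = 0          # next slot the controller will fill
        self.count = 0

    def produce(self, status):
        """Interface controller posts a packet's status descriptor."""
        if self.count == len(self.slots):
            return False       # ring full: controller must back off
        self.slots[self.tail] = status
        self.tail = (self.tail + 1) % len(self.slots)
        self.count += 1
        return True

    def consume(self):
        """CPU removes one descriptor when it gets around to polling."""
        if self.count == 0:
            return None
        status = self.slots[self.head]
        self.head = (self.head + 1) % len(self.slots)
        self.count -= 1
        return status

ring = DescriptorRing()
ring.produce({"port": 1, "length": 64, "ok": True})
print(ring.consume())   # {'port': 1, 'length': 64, 'ok': True}
```

Because producer and consumer each touch only their own index, this structure also illustrates why the design needs no locks on the fast path, a point the text returns to below.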
This means that when many packets are queued for transmission on a PowerHub 10BASE-T port under heavy network load, for example, the interface controller can transmit them out the switch port back-to-back while observing only the minimum 9.6 μs interpacket gap required by the 10 Mb/s IEEE 802.3 standard.
Another feature of the PowerHub polling architecture is that it allows each software process to run to completion before the next process kicks in. The goal is to avoid the situation where the software and hardware architectures have to handle the extremely complex scenarios where two separate processes can inconsistently modify/manipulate the same piece of data at the same time.
Other switch architectures employ hardware or software locks (i.e., semaphores, locked bus cycles, or test-and-set instructions) to avoid the above problem. Processing each lock adds significant overhead latency; for example, a single lock can take as long as 3 μs to process. A less optimized system may process several locks per packet, further increasing the per-packet processing overhead latency.
To avoid these limitations of the traditional designs, the PowerHub employs the polling architecture, in which the interface controller uses descriptor lists and the burst-mode communication method described above [FORESYSWP96]. In practice, these memory-based descriptor lists are realized as simple FIFO queues: the interface controller adds packet status items at one end, and a processor removes them at the other end.
In the PowerHub, six separate processes service each packet forwarded by the system, with the processes communicating through FIFO descriptor queues. These FIFO descriptor queues are implemented in the shared memory as data structures that are accessed whenever necessary by PowerHub hardware processing elements. As many as 22 RISC processors can be supported within a PowerHub chassis. The PowerHub has additional processing capabilities in the network interface (NIM) controllers.
These NIM controller processing elements are implemented as microprogrammed engines that process FIFO descriptor queues and interact with the PowerHub packet engine's RISC processors. Figure 6.8 shows how the I/O Processor and Main CPU share processes when the system is using two of the Packet Engine's CPUs. Figure 6.9 illustrates how processes are shared among four CPUs when the Packet Accelerator is installed on the Packet Engine.
In addition to the hardware processes on the interface (NIM) controllers, the following six processes also service each packet forwarded through the system:
Special packets such as SNMP requests, RIP updates, and ICMP pings may be directed to the PowerHub's packet engine (which also hosts the route processor), where the Main CPU processes them. Once the forwarding decision is made, the receive FIFO descriptor for the packet is updated with the destination port or ports (for multicast traffic). The destination is specified in the descriptor using a port mask, a bit map that identifies the destination port or ports to which the packet should be forwarded.
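A port mask of the kind described above is simply a bit map in which bit i set means "forward a copy out port i"; a unicast packet sets one bit, a multicast packet several. The helper names below are illustrative, not PowerHub identifiers.

```python
# Illustrative example of a destination port mask: an integer bit map with
# one bit per switch port. Setting several bits expresses multicast
# replication to multiple destination ports in a single descriptor field.

def make_port_mask(ports):
    """Build a bit map with one bit set per destination port."""
    mask = 0
    for port in ports:
        mask |= 1 << port
    return mask

def ports_in_mask(mask):
    """Recover the destination port list from the bit map."""
    return [i for i in range(mask.bit_length()) if mask & (1 << i)]

unicast = make_port_mask([5])          # single destination port
multicast = make_port_mask([2, 3, 7])  # multicast: several bits set

print(bin(unicast))              # 0b100000
print(ports_in_mask(multicast))  # [2, 3, 7]
```

Encoding the destination set this way keeps the descriptor fixed-size regardless of how many ports a packet is replicated to.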
If the transmit FIFO descriptor queue is not empty, a transmit-demand command is issued to the corresponding destination ports' network interface controllers. After forwarding a packet outside the PowerHub, the I/O Processor moves the pointer to the freed packet buffer from the “free” list to the receive FIFO descriptor queue.
Once a packet has been transmitted, the interface controller records a “completion status” indicator and any error information for the packet in the corresponding transmit FIFO descriptor. The interface controller then immediately polls the transmit FIFO descriptor queue again for more work.
Each of the above software processes on the forwarding path is designed and optimized to run continuously in a tight loop on its assigned processor. The most computationally intensive process is Forward Polling (Step 3 above), which makes the performance of the Main CPU (where Forward Polling runs) the main limiting factor for the overall PowerHub packet forwarding capacity.
The remaining three software processes (Receive Polling, Transmit Polling, and Transmit Cleanup) are designed to be simple enough for the I/O Processor to multitask among them and still support the Main CPU's achievable packet forwarding rate.
Various switch and router architectures have emerged over the years of internetworking since the 1990s. Each architecture has made its mark and contributed in various ways to the performance improvements seen in today's network devices and networks. A majority of the first-generation switches and routers were designed around shared bus and shared memory architectures with a single, centralized processor. However, as network sizes and traffic loads grew and the information in forwarding tables continued to increase, packet forwarding rates in switching and routing devices were also required to increase.
Switch and router designs continued to evolve in architecture and in how the system divides up processor cycles among the various tasks running in the system. The architectures had to evolve to accommodate the added processing burden imposed by growing network sizes and traffic loads. Shared-memory-based switch and router architectures had to be enhanced or redesigned altogether, and newer, improved designs based on crossbar switch fabrics emerged.
These improvements delivered higher system reliability and packet forwarding capacities and made system performance (e.g., speed, latency, and data loss) much more predictable when operating under the larger and more dynamic traffic loads generated by newer end-user applications.
In addition to the use of higher capacity switch fabrics, one approach that has made a lasting impact on switching and routing system design, and has significantly improved system performance, is to logically decouple the packet forwarding functions (i.e., the data or forwarding plane) from the control functions (i.e., the control plane). The control functions run on the system's centralized CPU or, in some architectures, on CPUs located on distributed forwarding line cards.
In this design approach, the two planes are separated, allowing the two sets of functions to run without interrupting or impeding each other. This allows the forwarding functions to reach their target forwarding speeds and avoid unpredictable latencies and lost data during packet forwarding. The control functions can likewise run in their isolated plane without impeding the forwarding functions.
To make a forwarding decision, a router must compare a packet's destination IP address with entries in a forwarding table the router maintains. The way in which the forwarding table is constructed and how its contents are searched affects both the system performance and packet forwarding rates. The first generation of switches and routers used software-based forwarding (on a centralized processor) to look up the forwarding information in the forwarding table for every packet forwarded (a method Cisco Systems calls process switching).
To improve the performance of software-based forwarding (process switching) systems, flow/route cache-based schemes were proposed. Techniques in this category are referred to by Cisco as fast switching and optimum switching. Cisco optimum switching differs from fast switching in that a router using optimum switching employs a more efficient tree structure in its forwarding table, enabling it to perform faster forwarding table lookups. Optimum switching also exploits the pipelining characteristics of the RISC processors used on some Cisco routing platforms to achieve faster system performance.
These packet forwarding methods yield higher forwarding rates by allowing the system to forward a packet using a flow/route cache. The cache entries are created by the forwarding table lookup of the initial packet of a flow sent to a particular destination address. Destination address-to-exit port mappings are stored in the flow/route cache to allow the system to make faster forwarding information lookup and packet forwarding.
In route cache-based systems, when a packet is copied to packet memory and its destination address is found in the route cache, forwarding can be done much faster: the packet does not have to be processed using the normal forwarding table unless it is the first packet of a flow. The forwarding is done using the cache, and the packet header is rewritten and the packet sent to the outgoing port that leads to the destination. Subsequent packets of the same flow, going to the same destination, use the cache entry created by the first packet.
Latency and data loss have become even more important quality of service (QoS) metrics in today's networks that carry multimedia traffic. The high attention given to these metrics is due to the shift in traffic patterns in today's networks, a pattern characterized by a large volume of real-time traffic (voice, video, interactive gaming, etc.) and a large number of short-lived flows (from TCP sources).
However, supporting a large number of short flows results in longer latencies in flow/cache-based switching and routing architectures, because the first packet of each flow must be processed and forwarded in software. Subsequent packets of the same flow are then forwarded much faster using the flow cache (whose contents are created from the destination IP address-to-outgoing port mapping gleaned from the first packet's forwarding process). With this forwarding method, every routing device on the path to the destination must handle the first packet in this slower fashion, resulting in excessive delays even for a short flow.
Packet delay variation (PDV) was another QoS metric of concern. Without decoupling the forwarding functions from the control functions sharing a CPU, variable packet processing and forwarding latencies can occur, resulting in PDV. PDV is particularly damaging to applications such as voice, video, and interactive gaming, so minimizing it became a significant gain of separating the forwarding and control functions within a device.
Switch and routing architectures evolved (and continue) to not only better address packet forwarding speeds, latency, and data loss but also to address newer and emerging concerns like system reliability and resiliency, scalability, enhanced network security with access control, efficient multicast support, energy consumption, and device footprint (office and storage space). The subsequent chapters will look at some newer architectures, starting with earlier generation ones.