The Cisco Catalyst 6500 is a family of switch/routers supporting a range of Supervisor Engine and line card options. The older generation of the Cisco Catalyst 6500 supports Supervisor Engines 1A or 2 and two backplanes. One backplane is a 32 Gb/s shared switching bus for interconnecting line cards within the switch/router, and the other allows line cards to interconnect over a high-speed crossbar switch fabric.
The crossbar switch fabric provides each connecting module with a set of distinct high-speed switching paths for data transmission to and data reception from the crossbar switch fabric. This first generation switch fabric (implemented in a stand-alone switch fabric module) provides a total switching capacity of 256 Gb/s.
The newer generation Catalyst 6500 was introduced with the newer high-performing Supervisor Engines 32 and 720 that have advanced features beyond the Supervisor Engines 1A and 2 [CISC6500DS04, CISCCAT6500]. The discussion in this chapter focuses on the architectures of the Catalyst 6500 Series with Supervisor Engine 32. The Supervisor Engine 32 provides connectivity only to the 32 Gb/s shared switching bus and, as a result, supports only line cards that connect to this switching bus. These line card types are the “classic” (also called nonfabric-enabled) and CEF256-based line cards.
By supporting these line card types, the higher performing Supervisor Engine 32 provides investment protection to users who have already deployed Cisco Catalyst 6500 modules that connect to the 32 Gb/s backplane. The Supervisor Engine 32 protects current 32 Gb/s-based switch investments by supporting all existing “classic” and CEF256-based line cards.
As will be described below, the Supervisor Engine 32 has two uplink options: eight-port Gigabit Ethernet (GbE) Small Form-Factor Pluggable (SFP)-based uplinks and two-port 10 GbE XENPAK-based uplinks. The Catalyst 6500 with Supervisor Engine 32 is designed primarily for deployment at the network access layer.
Adopting the architecture categories broadly used to classify the various designs in Chapter 3, the following architectures are covered in this chapter:
The 32 Gb/s switching bus allows all the modules connected to it to share the common bandwidth available for both data transmission and data reception. As described in Chapter 7, the shared bus consists of three (sub-)buses, each playing a specific role in the data forwarding operation in the system. These buses are the Data Bus (DBus), Results Bus (RBus), and the Control Bus (CBus).
The DBus is the main system bus that carries all end-user data transmitted and received between modules. It has a bandwidth of 32 Gb/s (i.e., 2 × 256 bits wide × 62.5 MHz clock speed). The “32 Gb/s” in the name of the switching bus comes from this data transfer rate. The DBus is a shared bus, so to transmit a packet, a line card must arbitrate for access to the DBus by submitting a transmit request to a master arbitration mechanism that is located on the Supervisor Engine (or primary Supervisor Engine if a redundant Supervisor Engine is installed).
If the DBus is not busy, the master arbitration mechanism grants access permitting the line card to transmit the packet on the DBus. With this bus access permission, the line card transmits the packet over the DBus to the Supervisor Engine. During the packet transfer over the DBus, all connected line cards will sense the packet being transmitted and capture a copy of the packet into their local buffers.
The Supervisor Engine uses the RBus to forward the forwarding instructions (obtained after forwarding table lookup) to each of the attached line cards. The forwarding instruction sent by the Supervisor Engine to each line card is either a drop or forward action. A drop action means a line card should flush the packet from its buffers, and a forward action means the packet should be sent out a port to its destination. The CBus (or Ethernet out-of-band channel (EOBC)) is the bus that carries control information between the line cards and the control and management entity on the Supervisor Engine.
The Cisco Catalyst 6500 shared switching bus employs two methods to achieve improved performance over the traditional shared bus: pipelining and burst mode.
The traditional implementation of the shared bus allows a single frame transmission over the shared bus at any given point in time. Let us consider the situation where the system employs a traditional shared bus. The Supervisor Engine receives a packet from a line card and performs a lookup into its local forwarding table to determine which line card port the packet should be forwarded to. It sends the result of the forwarding table lookup to all ports connected to the shared bus over the RBus.
In the traditional implementation, while the table lookup is being performed, no subsequent packets are sent over the bus. This means there are idle periods in data transfers over the bus, resulting in suboptimal use of the bus; bus utilization is not maximized.
The Catalyst 6500 employs pipelining to allow ports to transmit up to 31 frames across the shared bus (to be pipelined at the Supervisor Engine for lookup operation) before a lookup result is transmitted via the RBus. If it happens that there is a 32nd packet to be transmitted, it will be held locally at the transmitting port until the port receives a result over the RBus. Pipelining allows the system to reduce the idle times that would have been experienced in the traditional bus implementation and also provides improvements in the overall utilization of the shared bus architecture.
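The pipelining behavior can be sketched as a simple bounded in-flight window. The following Python sketch is illustrative only (the class and method names are hypothetical, not the actual ASIC logic): a port may keep transmitting frames on the DBus only while fewer than 31 lookups are outstanding at the Supervisor Engine.

```python
from collections import deque

MAX_OUTSTANDING = 31  # frames in flight before a lookup result must return


class SharedBusPipeline:
    """Toy model of the pipelined shared bus (names hypothetical)."""

    def __init__(self):
        self.in_flight = deque()  # frames sent on the DBus, awaiting an RBus result

    def can_transmit(self):
        # A new frame may enter the DBus only while fewer than
        # MAX_OUTSTANDING lookups are pending at the Supervisor Engine.
        return len(self.in_flight) < MAX_OUTSTANDING

    def transmit(self, frame):
        if not self.can_transmit():
            return False  # the 32nd frame is held locally at the port
        self.in_flight.append(frame)
        return True

    def rbus_result(self):
        # The Supervisor Engine returns a lookup result over the RBus,
        # freeing one pipeline slot.
        return self.in_flight.popleft() if self.in_flight else None


bus = SharedBusPipeline()
sent = sum(bus.transmit(f) for f in range(40))
print(sent)  # only 31 of the 40 offered frames enter the pipeline
```

Once a result is returned over the RBus, a pipeline slot frees up and the held frame can be transmitted, which is how the idle time of the traditional bus is recovered.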
Another concern in the use of traditional shared bus is that the bus usage could unfairly favor ports transmitting larger frames. Let us consider, for example, two ports that are requesting access to the shared bus. Let us assume that Port A is transmitting 512 byte frames and Port B is transmitting 1518 byte frames. Port B would gain an unfair bus usage advantage over Port A when it sends a sequence of frames over a period of time because it consumes relatively more bandwidth in the process. The Catalyst 6500 uses the burst mode feature to mitigate this kind of unfairness.
To implement the burst mode feature, the port ASIC (which handles access to the shared bus) maintains a count of the number of bytes it has transmitted and compares this with a locally configured threshold. If the byte count is below the threshold, then a packet waiting to be transmitted can be forwarded. If the byte count exceeds the threshold, then the port ASIC stops transmitting frames and bus access is removed for this port (done by the master arbitration mechanism in the Supervisor Engine). The threshold is computed by the port using a number of local variables extracted from the system (see related discussion in Chapter 7) in order to ensure fair distribution of bus bandwidth.
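A minimal sketch of this burst-mode accounting is shown below. The threshold here is a fixed illustrative number; in the real system the port ASIC derives it from system variables, and the class and method names are hypothetical.

```python
class PortASIC:
    """Hedged sketch of burst-mode byte accounting (names hypothetical)."""

    def __init__(self, threshold_bytes):
        # In practice the threshold is computed from system variables;
        # here it is simply a fixed illustrative value.
        self.threshold = threshold_bytes
        self.bytes_sent = 0

    def try_send(self, frame_len):
        if self.bytes_sent > self.threshold:
            return False  # bus access is removed for this port
        self.bytes_sent += frame_len
        return True


# Port B's larger frames hit the threshold sooner, evening out bus usage.
port_a, port_b = PortASIC(6000), PortASIC(6000)
a_sent = sum(port_a.try_send(512) for _ in range(10))
b_sent = sum(port_b.try_send(1518) for _ in range(10))
print(a_sent, b_sent)
```

With the same byte threshold, the port sending 1518-byte frames is cut off after far fewer frames than the port sending 512-byte frames, which is the fairness goal of the burst mode feature.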
The Supervisor Engine is the main module in the Catalyst 6500 responsible for all centralized control plane and data plane operations. The control plane is responsible for running the routing protocols and generating the routing table that contains the network topology information (location of each destination IP address in the network). Each destination IP address in the routing table is associated with a next hop IP address, which represents the next closest Layer 3 device or router to the final destination. The contents of the routing table are distilled into a much more compact and simpler table called the forwarding table.
The data plane is responsible for the operations that are actually performed on a packet in order to forward it to the next hop. These operations involve performing a forwarding table lookup to determine the next hop address and egress interface, decrementing the IP TTL, recomputing the IP checksum, rewriting the appropriate source and destination Ethernet MAC addresses in the frame, recomputing the Ethernet checksum, and then forwarding the packet out the appropriate egress interface to the next hop. Control plane functions are typically handled in software, whereas data plane functions are simple enough to be implemented in hardware, if required.
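As a concrete illustration of two of these data plane steps, the sketch below decrements the IPv4 TTL and recomputes the IP header checksum over a minimal 20-byte header; the MAC rewrite and egress interface selection are omitted, and the function names are illustrative.

```python
import struct


def ipv4_checksum(header: bytes) -> int:
    # One's-complement sum of 16-bit words; the caller zeroes the
    # checksum field before calling.
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total > 0xFFFF:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries
    return ~total & 0xFFFF


def forward_rewrite(header: bytearray) -> bytearray:
    # Per-hop rewrite: decrement the TTL (byte 8), then recompute the
    # checksum over the header with the checksum field (bytes 10-11) zeroed.
    header[8] -= 1
    header[10:12] = b"\x00\x00"
    header[10:12] = struct.pack("!H", ipv4_checksum(bytes(header)))
    return header


# A minimal 20-byte IPv4 header: TTL 64, protocol TCP, 10.0.0.1 -> 10.0.0.2
hdr = bytearray(struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20, 0, 0, 64, 6, 0,
                            bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2])))
hdr[10:12] = struct.pack("!H", ipv4_checksum(bytes(hdr)))
forward_rewrite(hdr)
```

After the rewrite the TTL is 63, and a checksum computed over the full header (including the checksum field) folds to zero, confirming the header is again valid.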
The Supervisor Engine 32 has connectivity only to the 32 Gb/s shared bus and supports packet forwarding of up to 15 Mpps [CISCSUPENG32]. Unlike the Supervisor Engines 2 and 720, Supervisor Engine 32 does not provide connectivity to a crossbar switch fabric, as illustrated in Figures 9.3 and 9.4. As shown in these figures, Supervisor Engine 32 supports the PFC3B and MSFC2a as a default configuration.
The Supervisor Engine 32 comes in two versions:
These Supervisor Engines also support an additional 10/100/1000TX front port and two USB ports on the front panel (one type A and one type B). The type A USB port, which is designated for host use, can be used to plug in devices such as a laptop or PC. The type B port is designated as a device port and can be used for attaching devices such as a Flash memory key.
A chassis that supports redundancy with two Supervisor Engine 32-8GE modules provides a total of 18 active Gigabit Ethernet ports (i.e., 2 × (8 + 1) ports) to the user, where all ports on both the primary and secondary Supervisor Engines are active.
The MSFC2a provides the Layer 3 control plane functionality that enables the Supervisor Engine 32 to function as a full-fledged Layer 3 device. Without the MSFC2a, the Supervisor Engine 32 functions purely as a Layer 2 device. Forwarding using network topology-based forwarding tables and optimized lookup algorithms, called Cisco Express Forwarding (CEF), is the forwarding architecture implemented in the Supervisor Engine 32. The MSFC2a and the MSFC2 used in the Supervisor Engine 2 (see Chapter 7) are functionally equivalent, except that the MSFC2a supports a larger DRAM.
In the Supervisor Engine 32, a Switch Processor CPU, as shown in Figures 9.3 and 9.4, is responsible for running all the Layer 2 control plane protocols, such as Spanning Tree Protocol (STP), IEEE 802.1AB Link Layer Discovery Protocol (LLDP), and VLAN Trunking Protocol (VTP). The Switch Processor is allocated its own (upgradeable) DRAM and nonvolatile RAM (NVRAM).
A route processor CPU on the MSFC2a (Figures 9.3 and 9.4) is responsible for running the Layer 3 routing protocols and ICMP, carrying out address resolution functions to map IP addresses to Layer 2 addresses, initializing and managing switched virtual interfaces (SVIs), and running and configuring the Cisco IOS Software. An Ethernet out-of-band control bus (a full-duplex, 1 Gb/s connection shown in Figures 9.3 and 9.4) enables the MSFC2a to communicate and exchange information with other entities on the Supervisor Engine 32 baseboard.
The MSFC2a communicates with its Layer 3 peers in a network via the configured routing protocols (Open Shortest Path First (OSPF), Enhanced Interior Gateway Routing Protocol (EIGRP), Border Gateway Protocol (BGP), etc.) and generates routing information about the network topology, which is maintained in the routing table. The MSFC2a then distills this routing information into a more compact forwarding table or FIB, which is then forwarded to the PFC3B.
The PFC3B stores the forwarding table in a FIB ternary content addressable memory (TCAM). As shown in Figures 9.3 and 9.4, the FIB TCAM is implemented on the PFC3B daughter card and is a very high-speed memory that allows the PFC3B to perform fast forwarding table lookups during packet forwarding. The Layer 3 features that are supported on a Supervisor Engine 32 are described in more detail in [CISCSUPENG32].
As with the Supervisor Engine 720, the Supervisor Engine 32 also implements hardware counters, registers, and control plane policing (CoPP) (of control plane traffic) to limit the effect of denial of service (DoS) attacks on the control plane. The hardware-based control plane policing allows a control plane quality of service (QoS) policy to be applied to the Supervisor Engine 32 in order to limit the total amount of traffic that is sent to its control plane.
The PFC3B (Figures 9.3 and 9.4) supports hardware-based features that allow the Supervisor Engine 32 to perform more enhanced QoS and security operations. For example, to secure and prioritize data, the PFC3B provides hardware support for security and QoS-based ACLs using Layer 2, 3, and 4 classification criteria. The PFC3B also allows the Supervisor Engine 32 to support new hardware-accelerated features such as ACL hit counters, port access control lists (PACLs), Encapsulated Remote Switched Port Analyzer (ERSPAN), CPU rate limiters, IP Source Guard, and NetFlow capabilities.
Other QoS services supported on the PFC3B include ingress traffic policing and classification of incoming data. This allows the rewrite of IEEE 802.1p class of service (CoS) bits in the Ethernet header and IP Precedence/DSCP priority bits in the IPv4 header.
The Supervisor Engine 32 is also referred to as a “Classic” module (see Chapter 7) because it supports connectivity only to the “Classic” 32 Gb/s shared switching bus that allows it to communicate with other line cards connected to the bus. The Supervisor Engine 32 also has no built-in crossbar switch fabric (as in other Supervisor Engines like Supervisor Engine 720), nor does it support connectivity to a separate crossbar switch fabric module (like in Supervisor Engine 2).
The support of only the Classic (32 Gb/s) shared bus thus dictates the type of line cards that can operate with the Supervisor Engine 32. Line cards that do not support connectivity over the Classic 32 Gb/s shared bus cannot be used with the Supervisor Engine 32.
A full list of the line card architectures supported with the Supervisor Engine 32 is given in [CISCSUPENG32]. As explained below, the Supervisor Engine 32 supports both the CEF256 and Classic line card architectures. Both of these line card types have a connector that provides connectivity to the Classic 32 Gb/s bus.
The 32 Gb/s shared bus allows all ports connected (on both the Supervisor Engine 32 and line cards) to exchange data. The DBus is 256 bits wide and is clocked at 62.5 MHz, which yields a bandwidth of 16 Gb/s. The RBus also operates at 62.5 MHz and is 64 bits wide.
The Supervisor Engine 32 baseboard (Figures 9.3 and 9.4) has a number of onboard application-specific integrated circuits (ASICs) that enable the support of Layer 2, 3, and 4 services and also serve as an interface to the 32 Gb/s shared switching bus. One ASIC is used to connect the Supervisor Engine to the 32 Gb/s shared switching bus. This specialized ASIC (referred to as the Fabric ASIC and Replication Engine) is also used for multicast packet replication in the Supervisor Engine 32 and supports the SPAN functionality used for port mirroring.
The Supervisor Engine 32, via the Fabric ASIC and Replication Engine, supports only the ingress replication mode for multicast packets. As shown in Figures 9.3 and 9.4, the Fabric ASIC and Replication Engine also provides an interface to the multicast expansion table (MET), which supplies the Supervisor Engine 32 with the relevant information regarding the multicast group membership it serves. Another onboard port ASIC holds the port interface logic that provides connectivity to the 9 GbE ports (Figure 9.3) or the two 10 GbE ports (Figure 9.4).
The Cisco Catalyst 6500 supports a number of line card types with different physical media types and speed options. These line cards are designed with a range of features to allow the Catalyst 6500 to meet the needs of deployment in the access, distribution, and core layers of a network. A line card slot may provide a connection to the 32 Gb/s shared bus and in some designs, another connection to a crossbar switch fabric if either a Supervisor Engine 2 or 720 is present.
The Catalyst 6500 supports five general line card types: the Classic, CEF256, CEF720, dCEF256, and dCEF720 line cards [CISCCAT6500]. All of these line cards can interoperate and communicate with each other when installed in the same chassis as long as the relevant fabric connections are present in the chassis. The Catalyst 6500 with Supervisor Engine 32 supports only the line cards that have connectivity to the 32 Gb/s shared bus, that is, the Classic and CEF256 line cards.
The Classic line card (also called the nonfabric-enabled line card (see discussion in Chapter 7)) supports connectivity only to the 32 Gb/s shared switching bus. It has a shared bus connection only and no connection to a stand-alone or Supervisor Engine integrated crossbar switch fabric. Furthermore, it does not support packet forwarding locally in the line card (i.e., distributed forwarding).
All generations and versions of the Supervisor Engines, from the Supervisor Engine 1A through to the newer Supervisor Engine 720-3BXL, support the Classic line cards. A Classic line card, when installed in a Cisco Catalyst 6500 chassis, does not allow the system to operate in compact (switching) mode (see bus switching modes discussion below). Thus, with this line card present, the centralized forwarding rate of the PFC3B reaches only 15 Mpps.
Figure 9.5 shows the architecture of the CEF256 line card. The CEF256 line card is a fabric-enabled line card and supports one connection to the 32 Gb/s shared switching bus and another connection to the crossbar switch fabric [CISCCAT6500]. The connection to the crossbar switch fabric is a single 8 Gb/s fabric channel.
The line card also supports a single internal 16 Gb/s local shared switching bus over which local packets are forwarded. The 16 Gb/s local shared switching bus has a similar function and operation as the main chassis 32 Gb/s shared bus. The chassis 32 Gb/s shared bus is the main bus that connects all shared switching bus capable line cards (i.e., the nonfabric-enabled and fabric-enabled line cards) in the Cisco Catalyst 6500 chassis.
The 16 Gb/s local switching bus on the CEF256 line card is utilized for forwarding packets that have port destinations local within the line card. Using this bus, a packet that is to be forwarded locally (utilizing an optional DFC or DFC3a to determine the forwarding destination) avoids being sent over the 32 Gb/s shared bus or the crossbar switch fabric. This local forwarding capability reduces the overall latency of forwarding packets and frees up the chassis 32 Gb/s shared bus or crossbar switch fabric capacity for those line cards that cannot forward packets locally.
As shown in Figure 9.5, the CEF256 line card supports internally a Fabric Interface ASIC, which serves as the interface between the local ports on the line card and other modules connected to the crossbar switch fabric. The Fabric Interface ASIC also allows the line card to connect to the 32 Gb/s shared switching bus.
A CEF256 line card will use the crossbar switch fabric to forward packets to other modules when it is installed in a chassis with a Supervisor Engine 720. However, if a Supervisor Engine 32 is installed, the system falls back to using the 32 Gb/s shared switching bus, since the Supervisor Engine 32 supports connectivity only to the 32 Gb/s shared switching bus.
We describe here the three switching modes used by the 32 Gb/s shared switching bus and fabric interface ASICs in the CEF256 and CEF720 line cards [CISCCAT6500]. The switching modes define the format of the internal packet header or forwarding tag used to transfer data across the DBus (of the 32 Gb/s shared switching bus) and also to communicate with other CEF256 and CEF720 modules in the chassis. These switching modes do not apply to line cards that support the Distributed Forwarding Card (DFC) feature.
The Flow-Through mode of operation is used by the CEF256 (fabric-enabled) line cards when a crossbar switch fabric is not present in the chassis. This mode enables CEF256 line cards to operate in the system and over only the 32 Gb/s shared bus as if they were Classic (nonfabric-enabled) line cards. This mode does not apply to the dCEF256, CEF720, and dCEF720 line cards because they do not support connectivity to the 32 Gb/s shared bus.
In this mode, the whole (original) packet (i.e., the original packet header and data) is forwarded by the CEF256 line card over the 32 Gb/s shared bus to the Supervisor Engine for forwarding table lookup and forwarding to the destination port. In the flow-through mode, the Catalyst 6500 achieves a centralized forwarding rate of up to 15 Mpps.
For a system to operate in the compact mode, the system must support a crossbar switch fabric in addition to the 32 Gb/s shared switching bus. The crossbar switch fabric can be realized in the form of a stand-alone switch fabric module (installed in a chassis slot) or a Supervisor Engine 720 (which has an integrated crossbar switch fabric). The line cards in the chassis must all be fabric enabled (i.e., CEF256) for the system to run in compact mode.
Classic line cards (which do not have a crossbar switch fabric connection), when installed in the chassis, will not allow the system to operate in compact mode. Note that the dCEF256, CEF720, and dCEF720 line cards do not have connections to the 32 Gb/s shared bus.
In compact mode, a line card sends only the (original) packet header over the DBus of the 32 Gb/s shared bus to the Supervisor Engine for processing. To conserve DBus bandwidth and to allow faster header transmission, the original packet header is compressed before being transmitted on the DBus. The line card transmits the data portion of the packet over the crossbar switch fabric channels to the destination port. In this mode, the system achieves (independent of packet size) a centralized forwarding rate of up to 30 Mpps.
The truncated mode is used when the chassis has the following three module types present: a crossbar switch fabric, CEF256 and/or CEF720 line cards, and Classic line cards. When operating in this mode, the Classic line cards will forward over the DBus of the 32 Gb/s shared bus to the Supervisor Engine, the header, plus the data portion of the (original) packet. The CEF256 and CEF720 line cards, on the other hand, will forward the packet header over the DBus and the data portion over the crossbar switch fabric.
In the truncated mode, the system achieves a centralized forwarding rate of up to 15 Mpps. Furthermore, in this mode, because the CEF256 and CEF720 line cards transmit the data portion of the packet over the crossbar switch fabric, the overall aggregate bandwidth achieved can be higher than the 32 Gb/s shared switching bus capacity. However, line cards that have the DFC feature are not affected by the truncated mode; their forwarding performance stays the same regardless of the line card mix in the system.
We review in this section the queue structures and QoS features on the uplink ports (Figures 9.3 and 9.4) of the Supervisor Engine 32 [CISCSUPENG32].
The transmit side of each Gigabit Ethernet uplink port (see Figure 9.3) is assigned a single strict priority queue and three normal (lower priority) queues. Each of these normal transmit queues supports eight queue fill thresholds, which can be used with a port congestion management algorithm for congestion control. The receive side is assigned two normal queues, each with eight queue fill thresholds for congestion management. The receive side has no strict priority queue.
Each of the Gigabit Ethernet uplink ports on the Supervisor Engine 32 (Figure 9.3) is allocated 9.5 MB of per port buffering. The 10 GbE ports (Figure 9.4), on the other hand, are assigned 100 MB of per port buffering. The provision of large per port buffering is of particular importance when the switch is operating in networks that carry bursty applications or high data volume applications (e.g., long flow TCP sessions, network video, etc.). With large per port buffers, these applications have headroom should their data transfers become very bursty.
Both the Supervisor Engine 32 and Supervisor Engine 720 support a feature called differentiated services code point (DSCP) transparency. DSCP transparency is a feature that allows the switch to maintain the integrity of the DSCP bits carried in a packet as it transits the switch. Let us consider the situation where a packet arrives on a switch port carrying traffic that is not trusted (an untrusted port) and the switch assigns a lower class-of-service (CoS) value to the packet.
From this incoming CoS value, the switch derives an internal priority value that is used to write the DSCP bits on egress. DSCP transparency prevents this situation, and similar ones, by not allowing the switch to use the internal priority to derive the egress DSCP value. Instead, the switch will simply write the ingress DSCP value on egress.
When DSCP transparency is not used, the DSCP field in an incoming packet will be modified by the switch, and the DSCP field in the outgoing packet will be modified based on the port QoS settings that may include the policing and marking policies configured, port trust level setting, and the DSCP-to-DSCP mutation map configured at the port.
On the other hand, if DSCP transparency is used, the DSCP field in the incoming packet is not modified by the switch; the outgoing packet carries the same DSCP value as the incoming packet. It is worth noting that regardless of whether DSCP transparency is used, the switch still maintains an internal DSCP value for the packet, which it uses for internal traffic processing and to generate a CoS value that reflects the priority of the traffic. The internal DSCP value is also used by the switch to select an egress queue and queue fill threshold for the outgoing packet.
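The egress rewrite decision described above reduces to a simple selection, sketched here with an illustrative function name; the internal DSCP remains in use for queueing and CoS derivation either way.

```python
def egress_dscp(ingress_dscp: int, internal_dscp: int,
                transparency: bool) -> int:
    # With DSCP transparency enabled, the packet leaves with its ingress
    # DSCP intact; otherwise the internally derived value (from trust
    # state, policing, and mutation maps) is written on egress.
    return ingress_dscp if transparency else internal_dscp


print(egress_dscp(46, 0, True))   # transparency on: ingress value preserved
print(egress_dscp(46, 0, False))  # transparency off: internal value written
```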
Two important scheduling mechanisms that can be used on the Supervisor Engine 32 GbE uplink ports (Figure 9.3) are the shaped round-robin (SRR) and deficit weighted round-robin (DWRR) algorithms. SRR allows a maximum bandwidth to be defined for each queue. Like DWRR, SRR requires a scheduling weight to be configured for each of the queues, but the weight values are used differently in SRR.
After scheduling weights are assigned to all queues, the SRR algorithm normalizes the total of the weights to 1 (or equivalently, 100%). A maximum bandwidth value is then derived from each normalized weight and assigned to the corresponding queue, and the flow of data out of the queue is shaped so that it does not exceed this maximum. Unlike DWRR, a shaped queue is never allowed to exceed the bandwidth computed from its normalized weight. With SRR, traffic in excess of the maximum bandwidth value is buffered and scheduled later, so the output appears smoothed over a given period of time.
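The weight normalization and bandwidth derivation can be sketched as follows; the link rate, weights, and function name are illustrative, not taken from the platform configuration.

```python
def srr_max_bandwidth(weights, link_rate_bps):
    # Normalize the configured weights so they total 1, then derive each
    # queue's shaped (maximum) bandwidth as its share of the link rate.
    total = sum(weights)
    return [link_rate_bps * w / total for w in weights]


# Four queues with weights 1:2:2:5 on a hypothetical 1 Gb/s uplink.
shares = srr_max_bandwidth([1, 2, 2, 5], 1_000_000_000)
print(shares)
```

Each queue is then shaped so its output never exceeds its derived share, even when other queues are idle.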
The DWRR algorithm, on the other hand, aims to provide a fairer allocation of bandwidth between the queues than when the ordinary weighted round-robin (WRR) is used. The weights in DWRR determine how much bandwidth each queue is allowed to use, but, in addition, the algorithm maintains a measure or count of excess bandwidth each queue has used.
To understand the DWRR algorithm, let us consider, for example, a queue that has used up all but 500 bytes of its allocation but has a 1500 byte packet at its head. When this 1500 byte packet is scheduled, the queue consumes 1000 bytes of bandwidth in excess of its allocation for that scheduling round. The DWRR algorithm recognizes that an extra 1000 bytes has been used and deducts this excess from the queue's bandwidth allocation in the next scheduling round. Viewed over a period of time, all the queues are on average served close to their allocated portion of the overall bandwidth.
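The deficit bookkeeping in this example can be sketched as below. This variant lets a queue overdraw by one packet and then repay the excess in later rounds, matching the 1500-byte example above; the quanta and packet sizes are illustrative.

```python
from collections import deque


def dwrr(queues, quanta, rounds):
    # Deficit round-robin sketch: each queue carries unused (or overdrawn)
    # byte credit between rounds, so any overshoot is repaid later.
    deficits = [0] * len(queues)
    served = []  # (queue index, packet size) in scheduling order
    for _ in range(rounds):
        for i, q in enumerate(queues):
            deficits[i] += quanta[i]        # add this round's byte allocation
            while q and deficits[i] > 0:    # spend credit on queued packets
                pkt = q.popleft()
                deficits[i] -= pkt          # may go negative: repaid next round
                served.append((i, pkt))
            if not q:
                deficits[i] = 0             # an empty queue forfeits leftover credit
    return served


# Queue 0 holds 1500-byte packets, queue 1 holds 500-byte packets;
# both receive a 1000-byte quantum per round.
order = dwrr([deque([1500] * 4), deque([500] * 4)], [1000, 1000], rounds=3)
print(order)
```

Over the three rounds, queue 0 receives 3000 bytes of service against 3000 bytes of quanta: its 1000-byte overshoot accumulated in rounds 1 and 2 is repaid by an idle round 3, which is the long-run fairness property described above.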
This section describes how a packet is forwarded through the Supervisor Engine 32. The steps involved in forwarding packets over the shared bus are described below and are also marked in Figures 9.6 and 9.7.