10
Architectures with Shared-Memory-Based Switch Fabrics: Case Study—Cisco Catalyst 8500 CSR Series

10.1 Introduction

The Catalyst 8500 campus switch/routers (CSRs) are modular networking devices that provide wire speed Layer 2 and Layer 3 packet forwarding and services [CISCCAT8500]. The Catalyst 8500 family of devices comprises the Catalyst 8510 and 8540 switch/routers. This family of high-speed switch/routers is targeted for campus or enterprise backbones.

The Catalyst 8500 family of devices supports both IP and IPX routing, but the focus of this chapter is on the IP routing features. The IPX protocol has been deprecated and is no longer in use. Based on the architecture categories described in Chapter 3, the architectures discussed here fall under “architectures with shared-memory-based switch fabrics and distributed forwarding engines” (see Figure 10.1).


Figure 10.1 Architectures with shared-memory-based switch fabrics and distributed forwarding engines.

10.2 Main Architectural Features of the Catalyst 8500 Series

The key features of the Catalyst 8500 CSR include wire-speed Layer 3 IP unicast and multicast forwarding over 10/100 Mb/s Ethernet and Gigabit Ethernet (GbE) interfaces. The switch/router supports the configuration of virtual LANs (VLANs) between switches via the IEEE 802.1Q standard and Cisco Inter-Switch Link (ISL) trunking protocol (a Cisco proprietary protocol for encapsulating Ethernet frames with VLAN information).

The switch/router also supports a number of quality of service (QoS) features, including four priority queues per port, and flow classification and priority queuing based on IP Precedence bit settings. The main architectural features of the Catalyst 8510 and 8540 are discussed in this chapter.

10.2.1 Catalyst 8510

The Catalyst 8510 family supports wire-speed IP packet nonblocking forwarding on all ports and forwarding rates of up to 6 million packets per second (Mpps). The Catalyst 8510 employs a five-slot modular chassis that can carry up to 32 10/100 Mb/s Ethernet ports or 4 GbE ports (that can be used for uplink connectivity to a network or to servers). The modular chassis also supports two fault-tolerant, load-sharing power supplies (primary and optional secondary).

The Catalyst 8510 switch/router supports a 10 Gb/s fully nonblocking shared memory switch fabric for both Layer 2 and Layer 3 forwarding. This shared memory switch fabric allows for an aggregate packet throughput of 6 Mpps. The central slot in the five-slot chassis is dedicated to the shared memory switch fabric and a high-performance, RISC-based Switch Route Processor (SRP).

Furthermore, the Catalyst 8510 supports two line card types. One has eight 10/100 Mb/s Ethernet ports that use Category 5 copper cable and 8P8C (8 position 8 contact) connectors wired to the T568A or T568B pinout (commonly referred to as RJ-45). The second has eight 100BASE-FX Ethernet ports that use fiber-optic cable and SC fiber connectors.

The SRP module runs the Layer 2 and Layer 3 protocols that provide the required intelligence in the Catalyst 8510 for switching and routing packets. The SRP interfaces to each port in the switch/router via the shared memory switch fabric. The switching and routing protocols on the SRP are implemented as part of the Cisco Internetwork Operating System (IOS) software commonly used in Cisco switching and routing devices. The SRP is responsible for running the routing protocols, including the multicast protocols, and for the generation and maintenance of the distributed forwarding tables that reside in the line cards.

The SRP also supports SNMP agents and MIBs used for the management of the switch/router. Other features implemented in the SRP are advanced packet classification and management applications used for traffic management. The SRP is carried in the middle slot in the five-slot chassis, while the remaining four slots are used for line card modules. In addition to supporting redundant power supply modules, the Catalyst 8510 supports fan trays that are field replaceable (i.e., hot-swappable) while the switch/router is operational, thus reducing the mean time to repair.

The SRP module supports two PCMCIA Type II slots into which a variety of Flash EPROM modules can be fitted. These EPROMs add 8–20 MB of additional memory to the SRP. The EPROM modules allow the SRP to support larger Cisco IOS software code images as the IOS software is updated and grows. The Flash EPROMs can also be used to program standard configuration parameters for the Catalyst 8510. The EPROMs, however, are not required for normal running operation of the switch/router; they serve mostly as boot EPROMs.

10.2.2 Catalyst 8540

The Catalyst 8540 switch/router has a 13-slot chassis and a 40 Gb/s nonblocking shared memory switch fabric. This switch/router can forward packets at rates up to 24 Mpps. Two switch fabric modules are required to hold the shared memory and allow for the transporting of packets from one switch/router interface to another. Similar to the Catalyst 8510, the Catalyst 8540 also supports Layer 2 and Layer 3 forwarding of packets and other IP services.

The Catalyst 8540 also supports two line card types. One has 16 10/100 Mb/s Fast Ethernet ports over copper and the other 16 100BASE-FX ports over fiber. Each line card can have either 16,000 or 64,000 forwarding table entries. This translates into 16,000 or 64,000 Layer 2 MAC addresses or IP addresses, or a combination of both address types. These addresses can be stored locally in a forwarding table within a line card to be used for Layer 2 and 3 packet forwarding. The Catalyst 8540 also supports a number of system redundancy features, which include redundant switch fabrics, system processors, and power supply modules.

The Layer 2 and 3 routing information required for forwarding packets in the Catalyst 8540 is provided by two types of processor module. One type is the switch processor (SP) module, two of which operate together in the system, while the other module type carries a single processor called the route processor (RP). The RP module supports the main system processor (which includes a network management processor that runs the system management software), and also a larger portion of the system memory components.

The RP is the processor responsible for executing the system management functions that configure and control the switch/router. In addition to having a high-power microprocessor, the RP supports the following features:

  • Three main system memory components:
    - Two DRAM SIMMs that maintain the queues used for storing incoming and outgoing packets. The DRAMs also hold caches required by the system.
    - One Flash memory SIMM (EPROM) for storing the Cisco IOS software image. The default memory is 8 MB and is upgradeable to 16 MB.
    - Two Flash PC card slots for adding Flash memory used for storing configuration information and system software.
  • Temperature sensor that allows for the monitoring of the internal system environment.
  • Console port that can be used to connect a terminal or a modem to the switch/router for system configuration and management.
  • 10/100 Mb/s Ethernet port that can be used to connect the switch/router to a management device with an Ethernet interface or an SNMP management station.

In addition to these features, the RP performs the following management functions:

  • Monitoring the switch/router interfaces and the environmental status of the whole system.
  • Providing SNMP management and the console (Telnet) interface used for system management.

The RP, like the SRP on the Catalyst 8510, runs the Cisco IOS software that implements the unicast and multicast routing protocols and constructs and maintains the distributed forwarding tables used in the line cards. The SNMP agents and the MIBs used for the switch/router's management, as well as the advanced management applications used for traffic management, run in the RP.

The SP, on the other hand, runs the Layer 2 control plane protocols such as Spanning Tree Protocol (STP), IEEE 802.1AB Link Layer Discovery Protocol (LLDP), and VLAN Trunking Protocol (VTP). Together, the combined functions of the two SPs and the RP in the Catalyst 8540 are logically equivalent to the functions of the SRP in the Catalyst 8510.

The two Catalyst 8540 SPs take up two slots in the 13-slot chassis, with a third slot reserved for a redundant SP. If either of the two SPs fails, the redundant SP takes over. One slot in the chassis is reserved for the RP, which is responsible for running system management and control plane software. A second slot is reserved for a redundant RP. With five slots taken up by the SPs and RP and their redundant processors, the remaining eight slots in the Catalyst 8540 are used for line card modules.

10.3 The Switch-Route and Route Processors

The Catalyst 8540 employs a 40 Gb/s, shared memory switch fabric, while the Catalyst 8510 employs a 10 Gb/s shared memory fabric. These shared memory switch fabrics allow for full nonblocking transfer of data between ports and system modules. The switch/router ports include 10/100 Mb/s Ethernet, Gigabit Ethernet, and 155 Mb/s/622 Mb/s ATM ports.

The Catalyst 8500 CSR has a distributed forwarding architecture in which all line cards in the system can forward packets locally. The system processor (SRP or RP) ensures that the Layer 2 and Layer 3 forwarding information in the line cards is up-to-date. The forwarding tables in the line cards are updated and kept synchronized with the master forwarding table (in the SRP or RP) whenever routing and network topology changes occur.

The system processor (i.e., the SRP in the Catalyst 8510 and RP and SPs in the Catalyst 8540) is the entity responsible for managing almost all aspects of the system operation. It is responsible for running all the routing protocols, constructing and maintaining the routing tables from which the forwarding table (also called forwarding information base (FIB)) is generated and distributed to the line cards. The SRP and the SP are responsible for Layer 2 MAC address learning and distribution to the line cards. The SRP and RP (in the Catalyst 8540) are also responsible for system management and configuration.

Figure 10.2 shows a high-level view of the Catalyst 8500 switch/router architecture. It should be noted that, despite the shared memory bandwidth and system processor differences between the Catalyst 8510 and Catalyst 8540, they have identical functions. The SP and RP in the Catalyst 8540 can be viewed as a single logical processor with functions similar to the SRP in the Catalyst 8510.


Figure 10.2 High-level architecture of the Catalyst 8500 CSR.

The system processor is responsible for all Layer 2 address learning, Layer 3 route determination, and distribution to the line cards. Given that the Catalyst 8500 is designed as a distributed forwarding system, the system processor must ensure that all Layer 2 and Layer 3 addresses and routes are maintained and updated in the line cards as network changes occur. The system processor is also responsible for all system management, including SNMP, remote monitoring (RMON), and statistics collection.

The system processor runs a number of protocols such as Hot Standby Router Protocol (HSRP), a Cisco proprietary protocol (whose standards-based alternative is the Virtual Router Redundancy Protocol (VRRP)) for establishing a redundant, fault-tolerant default gateway for a network; Protocol Independent Multicast (PIM) for multicast routing; and a full set of routing protocols, including Routing Information Protocol (RIP), RIP version 2, Open Shortest Path First (OSPF), Interior Gateway Routing Protocol (IGRP), Enhanced IGRP (EIGRP), and Border Gateway Protocol version 4 (BGP-4).

10.4 Switch Fabric

The Catalyst 8510 employs a 3 MB shared memory with 10 Gb/s of total system bandwidth. The Catalyst 8540 uses a 12 MB shared memory with 40 Gb/s of total system bandwidth. Each architecture is completely nonblocking, allowing all input ports to have equal and full access into the shared memory for packet storage and forwarding. The shared memory is also dynamic, allowing packets stored in memory to consume as much memory dynamically as they need.

Access to the shared memory (for writes and reads) is dynamically controlled by a direct memory access (DMA) ASIC. Given that the shared memory switch fabric is nonblocking, the switch/router does not have to use per-port buffering. This is because the shared memory fabric bandwidth is greater than the combined bandwidth of all the system ports.

Since the architecture is logically an output buffered one, congestion only occurs at an individual output port when the port's resources are oversubscribed. The Catalyst 8500 also supports four priority queues per port and an output port Frame Scheduler that services the output queues based on the priority of each queue.

Each of the line card modules fits into a chassis slot and connects to the Catalyst 8500 shared memory switch fabric. In the Catalyst 8510, each line card is allocated 2.5 Gb/s of the shared memory bandwidth (as shown in Figure 10.3). This allocated bandwidth allows for nonblocking data transfer from any port since each slot is given bandwidth larger than the sum of the bandwidth of all of the ports on the line card.


Figure 10.3 Switching bandwidth per slot on Catalyst 8510.

The 2.5 Gb/s bandwidth allocated to a slot is divided into two: with 1.25 Gb/s allocated to the transmit path and 1.25 Gb/s to the receive path. This ensures that writes and reads to the shared memory can be done in a nonblocking manner, independently, and simultaneously. In the Catalyst 8540, each slot is allocated 5 Gb/s into the shared memory fabric (Figure 10.4). This bandwidth is also divided into 2.5 Gb/s for the transmit and 2.5 Gb/s for receive path to the shared memory.
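
As a rough sanity check of the nonblocking claim (assuming all ports run at their maximum rates in full duplex), a fully populated eight-port 10/100 line card in the Catalyst 8510 presents at most 8 × 100 Mb/s = 0.8 Gb/s in each direction, well under the 1.25 Gb/s per-direction slot allocation, while a 16-port card in the Catalyst 8540 presents at most 16 × 100 Mb/s = 1.6 Gb/s per direction against its 2.5 Gb/s allocation.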


Figure 10.4 Switching bandwidth per slot on Catalyst 8540.

Each packet written into the shared memory has an internal routing tag prepended to it. This internal routing tag provides the shared memory switch fabric with the right information to enable it to internally route the packet to the correct destination port(s). The routing tag carries information about the destination port(s), the packet's destination port QoS priority queue, and the packet's drop priority.
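
The exact layout of the internal routing tag is not published; purely as an illustration of the fields just described, a minimal representation might look like the following Python sketch (the class and field names are assumptions, not Cisco's):

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class InternalRoutingTag:
        """Illustrative only: the fields the text says the tag carries."""
        dest_ports: List[int]   # destination port(s); more than one for multicast or broadcast
        priority_queue: int     # which of the four per-port QoS queues (0 = highest priority)
        drop_priority: int      # discard preference used when the target queue congests

    # Example: a unicast packet headed for port 12, queued in QoS-1, with low drop preference
    tag = InternalRoutingTag(dest_ports=[12], priority_queue=1, drop_priority=0)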

A Fabric Interface ASIC (see Figure 10.5) writes the arriving packet into memory after a forwarding decision has been made. The Fabric Interface ASIC creates a pointer indicating the memory location where the packet is stored and generates the internal routing tag that is prepended to the packet and carries the appropriate destination port(s).


Figure 10.5 Catalyst 8500 line card architecture.

The output port Frame Scheduler examines the output port priority queues and then schedules the queued packets out of shared memory using a strict priority, weighted fair queuing (WFQ), weighted round-robin (WRR), or any configured scheduling mechanism (see discussion in the following section).

10.5 Line Cards

Each line card contains forwarding ASICs designed to provide interfaces to the shared memory switch fabric as well as maintain Layer 2 and Layer 3 forwarding tables. These forwarding tables allow the Catalyst 8500 system to make forwarding decisions at wire speed before a packet is written into the shared memory.

The system processor is responsible for ensuring that the forwarding tables in the line cards are up-to-date whenever network and routing changes occur. The line cards are also responsible for preparing arriving packets for efficient storage (e.g., segmentation, tagging) in the shared memory switch fabric, QoS policy enforcement, and packet forwarding to the external network.

Figure 10.5 shows the architecture of the Catalyst 8500 line cards. The Catalyst 8500 uses a distributed forwarding architecture in which the line cards are equipped with the right forwarding information to make both Layer 3 and Layer 2 forwarding decisions locally at wire speed, as well as enforce QoS and security filtering policies. The distributed forwarding engine ASIC in the line card is responsible for the Layer 2 and 3 address lookups in the CAM table (Figure 10.5), and for forwarding the packet along with its correct Layer 2 address rewrite information to the Fabric Interface. The Fabric Interface (also implemented on the line card) is responsible for rewriting the Layer 2 addresses in the departing Layer 2 frame carrying the processed packet, QoS classification, and presentation of QoS priority queuing information to the Frame Scheduler.

Each distributed forwarding engine ASIC is assigned four ports on the line card to service. This means two forwarding engine ASICs are required per line card to service eight ports. On the Catalyst 8540, four forwarding engine ASICs are required to service 16 ports. The forwarding engine ASIC also handles all MAC layer functions. The MAC in the 10/100 Mb/s Ethernet ports can run in either full or half duplex mode and is auto-sensing and auto-negotiating, if configured. The distributed forwarding engine ASIC has several key components (Figure 10.5) that are discussed in detail in the following sections.

10.5.1 Internal Memory

Packets arriving at a switch/router port are handled by the Ethernet MAC functions and then stored in an Internal Memory (Figure 10.5) in the distributed forwarding engine ASIC. This memory consists of a block of SRAM and is 8 kB in total size, of which 2 kB is reserved and used for command instructions. The remaining memory is used to store the arriving packet while it waits for the necessary Layer 2 or 3 forwarding table lookup operations to take place.

10.5.2 Microcontroller

The microcontroller in the forwarding engine ASIC is a small processor (mini-CPU) that is used locally to process packets coming from four ports on the line module. The microcontroller supports mechanisms that will allow it to process the arriving packets from the four ports in a fair manner. The scheduling mechanism responsible for the arriving packets ensures that they all have equal access to the Internal Memory.

The forwarding engine ASIC also ensures that forwarding table lookups via the Search Engine are done in a fair manner when the four ports arbitrate for lookup services. Access to the Search Engine is done in a round-robin manner, controlled by the microcontroller that cycles through each port, processing lookup requests as they are submitted.
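
The details of the arbitration logic are not documented here; the following Python sketch simply illustrates a round-robin service discipline of the kind described (the function and data names are hypothetical):

    from collections import deque

    def round_robin_lookups(port_requests):
        """Visit the four ports in a fixed cycle, serving one pending lookup per visit.

        port_requests: dict mapping port number -> deque of pending lookup requests.
        Returns the order in which the requests would be serviced.
        """
        order = []
        ports = sorted(port_requests)
        while any(port_requests[p] for p in ports):
            for p in ports:                       # cycle through each port in turn
                if port_requests[p]:              # serve at most one request per visit
                    order.append((p, port_requests[p].popleft()))
        return order

    # Example: port 0 has two requests pending, ports 1 and 3 have one each
    pending = {0: deque(["pkt-a", "pkt-b"]), 1: deque(["pkt-c"]),
               2: deque(), 3: deque(["pkt-d"])}
    print(round_robin_lookups(pending))
    # [(0, 'pkt-a'), (1, 'pkt-c'), (3, 'pkt-d'), (0, 'pkt-b')]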

The microcontroller is also responsible for forwarding special system packets such as routing protocol messages, Address Resolution Protocol (ARP) frames, Spanning Tree BPDUs, Cisco Discovery Protocol (CDP) packets, and other control-type packets to the system processor (SRP, SP, and RP). These special and exception packets are forwarded by the forwarding engine ASIC directly to the system processor.

CDP is a proprietary Cisco Layer 2 protocol that allows Cisco devices to exchange information on the MAC addresses, IP addresses, and outgoing interfaces they support. This feature allows a network manager to have a view of the devices in a network and the interfaces and addresses they support, and also troubleshoot potential network problems.

10.5.3 CAM and Search Engine

The forwarding engine ASIC's Search Engine is responsible for performing the forwarding table lookups to determine correct output port(s) to which a packet should be forwarded. The forwarding tables used by the Search Engine for lookups are stored in a content-addressable memory (CAM), which can store either 16,000 or 64,000 entries as explained earlier.

The Search Engine handles both Layer 2 and Layer 3 forwarding table lookups. It is responsible for maintaining the Layer 2 and Layer 3 forwarding tables, which are in turn generated by the system processor (SRP, SP, and RP). Also, using a hardware-based Access Control List (ACL) feature card (to be discussed later), the Search Engine is capable of performing lookups based on some Layer 4 fields/information in the packet.

As an arriving packet is being written into the Internal Memory, as soon as the first 64 bytes of the packet have been written, the microcontroller passes to the Search Engine the relevant source and destination MAC addresses and the destination IP address (plus, if necessary, the packet's Layer 4 information).

The Search Engine then uses these extracted packet parameters to perform a lookup in the forwarding tables in the CAM for the corresponding forwarding instructions. The Search Engine uses a binary tree lookup algorithm to locate the output port corresponding to the packet's destination MAC address (for Layer 2 forwarding) or the longest network prefix that matches the destination IP address (for Layer 3 forwarding). The Search Engine also retrieves the corresponding MAC address rewrite and QoS information (which is also maintained in the CAM), and forwards this to the control FIFO (Figure 10.5) of the Fabric Interface.

The CAM is designed to have two storage options: one supporting 16,000 entries and the other 64,000 entries. After using the binary tree algorithm to perform the lookup to locate the correct forwarding information, the Search Engine sends the relevant forwarding and MAC address rewrite information to the Fabric Interface for delivery to other processing components. The Layer 2 or 3 lookup provides the forwarding engine ASIC with the destination port for the packet. The packet is then transferred across the shared memory switch fabric to the destination port.
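
The chapter does not document the lookup structure beyond calling it a binary tree, so the following Python sketch is only a generic illustration of longest-prefix matching over a binary trie, not the Search Engine's actual algorithm; the names and example prefixes are made up.

    class TrieNode:
        __slots__ = ("children", "next_hop")
        def __init__(self):
            self.children = [None, None]   # one branch per address bit (0 or 1)
            self.next_hop = None           # set only where an installed prefix ends

    def insert(root, prefix, plen, next_hop):
        """Install an IPv4 prefix (given as a 32-bit int) of length plen with its next hop."""
        node = root
        for i in range(plen):
            bit = (prefix >> (31 - i)) & 1
            if node.children[bit] is None:
                node.children[bit] = TrieNode()
            node = node.children[bit]
        node.next_hop = next_hop

    def longest_match(root, addr):
        """Walk the trie bit by bit, remembering the deepest matching prefix seen."""
        node, best = root, None
        for i in range(32):
            if node.next_hop is not None:
                best = node.next_hop
            node = node.children[(addr >> (31 - i)) & 1]
            if node is None:
                break
        else:
            if node.next_hop is not None:   # a full /32 match
                best = node.next_hop
        return best

    root = TrieNode()
    insert(root, 0x0A000000, 8, "port 1")    # 10.0.0.0/8
    insert(root, 0x0A010000, 16, "port 3")   # 10.1.0.0/16
    print(longest_match(root, 0x0A010203))   # 10.1.2.3 matches the longer prefix: "port 3"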

10.5.4 Fabric Interface

After the forwarding table lookup, the Fabric Interface prepares the packet to be sent across the shared memory switch fabric to the destination port. The Fabric Interface has two main components: a control FIFO and a frame FIFO. The Internal Memory of the forwarding engine ASIC is directly connected to the frame FIFO (Figure 10.5), while the Search Engine is directly connected to the control FIFO.

As soon as the Search Engine performs the lookup in the forwarding tables (stored in the CAM), the packet is transferred from Internal Memory to the frame FIFO. In parallel, the MAC address rewrite and QoS information for the packet are forwarded by the Search Engine to the control FIFO.

The Fabric Interface then decrements the IP TTL, recomputes the IP header checksum, rewrites the source and destination MAC addresses in the frame carrying the packet, and recomputes the Ethernet frame checksum. The Fabric Interface prepends an internal routing tag to the packet. This routing tag contains the destination port, the QoS priority queue, and the packet's discard priority.
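
As a simplified software illustration of these rewrite steps (the real work is done in the Fabric Interface hardware, which would typically use incremental checksum updates; the sketch below assumes an untagged Ethernet II frame carrying an IPv4 packet with no IP options and a TTL greater than zero):

    def ipv4_header_checksum(header):
        """Ones'-complement sum over the IPv4 header with its checksum field zeroed."""
        header = bytearray(header)
        header[10:12] = b"\x00\x00"
        total = 0
        for i in range(0, len(header), 2):
            total += (header[i] << 8) | header[i + 1]
        while total >> 16:
            total = (total & 0xFFFF) + (total >> 16)
        return ~total & 0xFFFF

    def rewrite(frame, out_src_mac, next_hop_mac):
        """Rewrite the MAC addresses, decrement the IP TTL, and fix the header checksum.

        frame is a bytearray holding an untagged Ethernet II frame with a 20-byte IPv4
        header (no options) and TTL > 0. The Ethernet FCS is not handled here; the
        outgoing MAC hardware appends it on transmission.
        """
        frame[0:6] = next_hop_mac                 # destination MAC: the next hop's interface
        frame[6:12] = out_src_mac                 # source MAC: the outgoing interface
        ip = frame[14:34]                         # the IPv4 header
        ip[8] -= 1                                # decrement TTL
        csum = ipv4_header_checksum(ip)
        ip[10], ip[11] = csum >> 8, csum & 0xFF   # write the recomputed header checksum
        frame[14:34] = ip
        return frame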

Once this is completed, the Fabric Interface signals the Frame Scheduler to write the packet into the shared memory. At the destination port, the Fabric Interface transfers the packet to its output Ethernet MAC entity for transmission to the external network. Since all relevant processing (IP TTL decrement, IP checksum calculation, MAC address rewrite, Ethernet checksum) has already been performed at the ingress port, no additional processing of the packet is required.

10.6 Catalyst 8500 Forwarding Technology and Operations

The Catalyst 8500 supports a distributed forwarding architecture in which the forwarding information generated by the central system processor is distributed to the individual line card modules to enable them to forward packets locally. Distributed forwarding in the line cards results in very high-speed forwarding table lookups and forwarding. This approach provides the higher forwarding performance and scalability that are suitable for service provider networks and large campus and enterprise core networks.

10.6.1 Forwarding Philosophy

Some Layer 3 forwarding methods are based on a route/flow cache model where a fast lookup cache is maintained for destination network addresses as new flows go through the system. The route/flow cache entries are traffic driven, in that the first packet of a new flow (to a new destination) is Layer 3 forwarded by the system processor via software-based lookup, and as part of that forwarding operation, an entry for that destination is added to the route/flow cache.

This process allows subsequent packets of the same flow to be forwarded via the more efficient route/flow cache lookup. The route/flow cache entries are periodically aged out to keep the route/flow cache fresh and current. The cache entries can also be immediately invalidated if network topology changes occur.
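
For contrast with the distributed forwarding model described next, the route/flow cache behavior can be pictured with the following Python sketch (illustrative only; the lookup helper is a placeholder, not an actual IOS mechanism):

    flow_cache = {}   # destination IP -> (next hop, output port), built on demand

    def software_route_lookup(dst_ip):
        # placeholder for the system processor's full routing table lookup
        return ("192.0.2.1", 3)

    def forward(dst_ip):
        """First packet to a destination is handled in software; later packets hit the cache."""
        if dst_ip in flow_cache:
            return flow_cache[dst_ip]              # fast path: cache hit
        result = software_route_lookup(dst_ip)     # slow path: punt to the system processor
        flow_cache[dst_ip] = result                # populate the cache for subsequent packets
        return result

    def on_topology_change():
        flow_cache.clear()   # cached entries are invalidated and must be relearned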

This traffic-driven approach of maintaining a very fast lookup cache of the most recent destination address information is optimal for scenarios where the majority of traffic flows are long flows. However, given that traffic patterns at the core of the Internet (and in some large campus and enterprise networks) do not follow this “long flow” model, a new forwarding paradigm is required that would eliminate the increasing cache maintenance resulting from the growing numbers of short flows and dynamic network traffic changes.

The distributed forwarding architecture avoids the high overhead of continuous route/flow cache maintenance by using a full forwarding table for the forwarding decisions. The forwarding table contains the same forwarding information as the main routing table maintained by the system processor. Maintaining the same critical forwarding information in both the forwarding table and the routing table eliminates the need for a route/flow cache for packet forwarding.

The distributed forwarding architecture best handles the network dynamics and changing traffic patterns resulting from the large numbers of short flows typically associated with interactive multimedia sessions, short transactions, and Web-based applications. Distributed forwarding offers high forwarding speeds, a high level of forwarding consistency, scalability, and stability in large dynamic networks.

Additionally, distributed forwarding in the line cards is less processor intensive than forwarding by the main system processor (CPU) because the forwarding decisions made by each line card are on a smaller subset of the overall destination addresses maintained in the system. Basically, in distributed forwarding, the system processor (CPU) offloads a majority of its processing to the line cards, resulting in efficient and higher overall system performance. This architecture allows for high-speed forwarding (wire speed on all ports) with low data transfer latency. One other key benefit of distributed forwarding is rapid routing convergence in a network.

Since the forwarding table is distributed to all line cards, any time a route is added, a route flaps, or a route goes away, the system processor updates its forwarding table with that information and also updates the forwarding tables in the line cards. This means that system processor interrupts are minimized, because there is no route/flow cache to invalidate and no flow destinations to relearn for the cache. The line cards receive the new routing updates quickly (via the system processor forwarding table), and the network reconverges quickly around a failed link if that happens.
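
A minimal sketch of this update flow is shown below (the class structure is hypothetical; a real implementation would push incremental updates rather than copying the whole table):

    class LineCard:
        def __init__(self):
            self.fib = {}          # the line card's local copy of the forwarding table

    class SystemProcessor:
        """Maintains the master FIB and mirrors every change to the line cards."""
        def __init__(self, line_cards):
            self.master_fib = {}   # prefix -> next hop
            self.line_cards = line_cards

        def route_changed(self, prefix, next_hop):
            if next_hop is None:
                self.master_fib.pop(prefix, None)    # route withdrawn
            else:
                self.master_fib[prefix] = next_hop   # route added or changed
            for card in self.line_cards:             # push the change to every line card
                card.fib = dict(self.master_fib)

    cards = [LineCard() for _ in range(8)]
    sp = SystemProcessor(cards)
    sp.route_changed("10.1.0.0/16", "192.0.2.1")     # all cards now know the route
    sp.route_changed("10.1.0.0/16", None)            # route withdrawn; all cards converge at once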

When designing a distributed forwarding system (e.g., the Catalyst 8500), it is highly beneficial to separate the control plane and data forwarding plane functions in the system. This design approach allows the system processor to handle the control plane processing while the line cards handle (without system processor intervention) the data forwarding. Other than interacting with the system processor to maintain their local forwarding tables, the line cards operate almost autonomously.

The system processor handles all routing and system-level management functions, such as running the unicast and multicast routing protocols and constructing and maintaining the routing table and distributed forwarding tables (used by the line cards), ARP, CDP, and STP configuration. Each distributed forwarding engine ASIC (in the Catalyst 8500) is responsible for identifying these special control and management packets and forwarding them to the system processor.

The system processor is responsible for running all of the Catalyst 8500's routing protocols that are implemented as part of the Cisco IOS software. Most importantly, the system processor is responsible for maintaining the routing and forwarding tables. Using these capabilities, the system processor creates a master forwarding table, which is a subset of the information in the routing table and contains the most important information required for routing packets.

The forwarding table reflects the true current topology of a network, thus allowing high-speed data forwarding to be done using the actual network topology information. The master forwarding table is then copied to the line cards, allowing them to make Layer 3 forwarding decisions locally without system processor intervention. This forwarding architecture allows the Catalyst 8500 to forward packets at wire speed at all ports.

The system processor is also responsible for multicast routing and forwarding where it maintains state information regarding multicast sessions and group memberships. The Catalyst 8500 supports the older Distance Vector Multicast Routing Protocol (DVMRP), as well as the PIM (sparse mode and dense mode). The system processor also responds to and forwards multicast joins, leaves, and prune messages. However, the line cards are responsible for multicast forwarding using the multicast forwarding tables created by the system processor.

In Layer 2 forwarding, the forwarding decisions are also made at the line cards using the Layer 2 forwarding tables created by the system processor (SRP in Catalyst 8510 or SP in Catalyst 8540). The SRP or SP is responsible for generating the Layer 2 forwarding information as well as running STP and handling bridge group configuration. The system processor is responsible for running all STP functions that include determining the spanning tree root bridge, optimum path to the root, and forwarding and blocking spanning tree links.

10.6.2 Forwarding Operations

The Catalyst 8500 distributed forwarding architecture uses two types of tables – a Layer 3 forwarding table (FIB) and an adjacency table (Figure 10.6) – that are maintained by the system processor (i.e., the SRP in the case of the Catalyst 8510, and RP and SP in the Catalyst 8540). These tables are downloaded to the line cards for local use. The Layer 3 forwarding table is a mirror of the main forwarding information in the routing table and is used for making Layer 3 forwarding decisions.


Figure 10.6 Forwarding information base (FIB) and adjacency table.

The adjacency table maintains information about the nodes that are directly connected (adjacent) to the switch/router (physically or logically through a virtual connection). This table contains the Layer 2 information about the adjacencies such as the Layer 2 (MAC) addresses of their connected interfaces. The Layer 2 addresses are used during the packet rewrite process when packets are prepared to be sent to the next hop (which is an adjacent node). Each entry in the Layer 3 forwarding table (i.e., the next hop IP address that matches a destination address in a packet) includes a pointer to a corresponding adjacency table entry.

The main Layer 3 forwarding table maintained by the SRP or RP is constantly updated by inputs from the routing table which in turn is maintained by the routing protocols. After the SRP or RP resolves a route, the resolution translates to a next hop IP node, which is an adjacent node. The SRP or RP then copies this updated information to the line cards, allowing them to maintain an updated and current view of the network topology.

These routing updates enable correct forwarding decisions to be made, as well as fast network convergence when routing topology changes take place. As the network topology changes dynamically, the routing table is updated and the changes are propagated to the Layer 3 forwarding table. The SRP or RP modifies the Layer 3 forwarding table any time a network route is removed, added, or changed in the routing table. These updates are immediately copied to the forwarding tables in the line cards. This results in all line cards having a correct view of the network topology at all times.

Adjacency table entries are added when a routing protocol detects that there is a next hop (i.e., an adjacent node) leading to an IP destination address. The adjacency table is updated when a network route is removed, added, or changed in the routing table. The updates in the adjacency table can be from inputs from the routing protocols, which include adjacencies derived from next hop IP node information and multicast (S,G) interfaces carrying multicast traffic flows to multicast groups.

When a packet arrives at a Catalyst 8500 switch/router port, the distributed forwarding engine ASIC performs a lookup in its forwarding table using the packet's destination IP address. The longest matching forwarding table entry produces a next hop IP address that also points to a corresponding adjacency entry. This adjacency table entry provides the Layer 2 address of the receiving interface of the next hop, which is used to rewrite the Layer 2 destination address of the outgoing Layer 2 frame at the outgoing interface. The Layer 2 frame carrying the outgoing packet is then forwarded to the next hop using the Layer 2 information obtained from the adjacency table.
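
The relationship between a forwarding table entry and its adjacency entry can be illustrated with the Python sketch below (the field names are illustrative, not Cisco's, and the addresses are examples only):

    from dataclasses import dataclass

    @dataclass
    class Adjacency:
        next_hop_ip: str
        next_hop_mac: bytes    # Layer 2 rewrite information for the adjacent node
        output_port: int

    @dataclass
    class FibEntry:
        prefix: str
        adjacency: Adjacency   # the pointer to the corresponding adjacency table entry

    adj = Adjacency("192.0.2.1", bytes.fromhex("0123456789ab"), output_port=5)
    fib = {"10.1.0.0/16": FibEntry("10.1.0.0/16", adj)}

    # A lookup that matches 10.1.0.0/16 immediately yields the MAC rewrite and exit port:
    entry = fib["10.1.0.0/16"]
    print(entry.adjacency.next_hop_mac.hex(), entry.adjacency.output_port)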

10.6.2.1 Layer 3 Forwarding in the Catalyst 8500

The packet forwarding process in the Catalyst 8500 can be described by the following steps:

  1. When a packet is received at a switch/router port, the packet is first handled by the MAC sublayer functions in the distributed forwarding engine ASIC. After this, the packet is stored in the forwarding engine ASIC's Internal Memory.
  2. As soon as the distributed forwarding engine ASIC receives the first 64 bytes of the arriving packet, the microcontroller's microcode reads the packet's destination MAC address and its source and destination IP addresses. If the packet's destination MAC address is the MAC address of the switch/router's receiving interface, then the packet requires Layer 3 forwarding. If not, it requires Layer 2 forwarding.
  3. The MAC and IP destination addresses of the packet are used by the forwarding engine ASIC's Search Engine to perform a lookup in the CAM for the best matching entry.
  4. After the best matching entry is discovered, the result is sent back to the microcontroller.
  5. The microcontroller then moves the packet from the forwarding engine ASIC's Internal Memory to the frame FIFO in the Fabric Interface. In parallel, the Search Engine sends all relevant QoS classifications, priority queuing, and MAC address rewrite information to the Control FIFO also in the Fabric Interface.
  6. The input Fabric Interface then carries out all the packet rewrite (IP TTL decrement, IP checksum computation, source and destination MAC address, Ethernet checksum computation) and QoS classifications.
  7. The input Fabric Interface prepends an internal routing tag to the packet. The internal routing tag identifies the QoS priority queuing for the packet, the destination port(s), and the packet's discard priority. The QoS priority indicated in the tag determines which of the four priority queues maintained at the destination port the packet will be queued in.
  8. As soon as the packet is completely transferred to the Fabric Interface's Frame FIFO, the Frame Scheduler is signaled (based on the packet's QoS priority level) to initiate arbitration for access to the shared memory. When access is granted, the packet is transferred into the shared memory. The packet is stored along with a pointer that indicates the destination port of the packet.
  9. The Fabric Interface signals the destination port to retrieve the packet out of a known location in the shared memory. The internal routing tag indicates to the destination port that it is receiving the correct packet.
  10. The destination port then forwards the packet to the external network.

In the Catalyst 8500, the multicast routing table is also centralized and maintained on the SRP (in Catalyst 8510) or RP (in Catalyst 8540). The forwarding engine ASIC in a line card consults the forwarding table (that includes multicast routes) to forward the multicast packets to appropriate destinations.

A multicast routing table is not the same as a unicast routing table. A multicast routing table maps a source IP address transmitting multicast traffic to a multicast group of receivers interested in the multicast traffic (the mapping commonly represented by (S,G)). The table consists of an input interface (port) on the switch/router and a set of corresponding output interfaces (ports).

The central multicast routing table maintained by the multicast routing protocols running in the SRP or RP is distilled into multicast forwarding information to be used in the line cards. With the distributed forwarding engines and the associated distribution of the multicast forwarding information in the Catalyst 8500, the line cards are able to forward multicast traffic locally based on the multicast topology contained in the distributed information.

The local multicast forwarding information allows an input port on a line card to determine which output interfaces on the switch/router require a particular multicast traffic flow. The input line card then signals the shared memory switch fabric about which output ports to forward that traffic to. Changes in the multicast routing table are instantly distilled and copied to the line cards, allowing them to maintain an up-to-date multicast distribution map of the multicast traffic in a network.
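
In simplified form, the distributed multicast forwarding state can be pictured with the Python sketch below (illustrative only; the table layout and the incoming-interface check are assumptions based on common multicast forwarding practice, and the addresses are examples):

    # (source, group) -> expected input port and the set of output ports, per line card
    mroutes = {
        ("10.1.1.10", "239.1.1.1"): {"iif": 2, "oifs": {5, 7, 12}},
    }

    def forward_multicast(src, group, in_port):
        entry = mroutes.get((src, group))
        if entry is None or entry["iif"] != in_port:
            return set()          # no state, or the packet arrived on an unexpected interface
        return entry["oifs"]      # ports the switch fabric should replicate the packet to

    print(forward_multicast("10.1.1.10", "239.1.1.1", in_port=2))   # {12, 5, 7} (set order varies)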

10.6.2.2 Layer 2 Forwarding in the Catalyst 8500

As already stated, if the destination MAC address of an arriving frame is the switch/router's interface MAC address, then the packet is Layer 3 forwarded – if not, it is Layer 2 forwarded. When groups of ports or a port in the switch/router are configured to run in Layer 2 forwarding (bridging) mode, the forwarding engine ASIC's Search Engine performs a lookup in the CAM based on the Layer 2 MAC address of the arriving packet.

Given that the Catalyst 8500 has a distributed forwarding architecture, each distributed forwarding engine ASIC maintains a list of MAC addresses and their corresponding exit ports that are of local significance. For example, if Address 01:23:45:67:89:ab is a destination MAC address learned on switch/router port SREthernet 0/2, the remaining ports on the Catalyst 8500 do not have to store this MAC address in their Layer 2 forwarding tables (in the CAMs) unless they have a packet to forward to Address 01:23:45:67:89:ab (at port SREthernet 0/2).

The system processor (SRP or RP) has a central CAM that maintains a master (integrated) forwarding table holding both Layer 2 and Layer 3 addresses. When the distributed forwarding ASIC learns a new MAC address, that MAC address (not the packet that carries it) is forwarded to the system processor so that it has an updated list of all MAC addresses learned. The system processor populates its CAM with the newly learned MAC address. The system processor's central CAM contains all MAC addresses manually configured in the CAM plus those that the switch/router has learned dynamically.
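
The learning behavior just described can be sketched in Python as follows (illustrative only; flooding unknown or broadcast destinations to the rest of the bridge group follows standard transparent bridging and the discussion below, and all names are made up):

    local_cam = {}      # this forwarding engine's table: MAC -> local exit port
    central_cam = {}    # the system processor's master table of all learned MACs

    def learn(src_mac, in_port):
        """Source-address learning on a line card."""
        if local_cam.get(src_mac) != in_port:
            local_cam[src_mac] = in_port
            # only the learned address is sent to the SRP/RP, not the packet that carried it
            central_cam[src_mac] = in_port

    def l2_lookup(dst_mac, bridge_group_ports):
        """Known unicast goes to one port; unknown or broadcast floods the bridge group."""
        if dst_mac in local_cam:
            return {local_cam[dst_mac]}
        return set(bridge_group_ports)

    learn("01:23:45:67:89:ab", in_port=2)
    print(l2_lookup("01:23:45:67:89:ab", [1, 2, 3, 4]))   # {2}
    print(l2_lookup("ff:ff:ff:ff:ff:ff", [1, 2, 3, 4]))   # floods the bridge group: {1, 2, 3, 4}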

If an arriving Ethernet packet has a destination MAC address of ffff.ffff.ffff (i.e., the broadcast address), then the packet is prepended with an internal routing tag indicating that its destination is all ports in the bridge group to which the receiving port belongs. The tagged packet is then sent to the shared memory switch fabric for storage. The Fabric Interface creates a pointer to the packet's memory location that is signaled to all the ports in that bridge group. This means that if six ports are configured in a bridge group, then all six ports would receive the broadcast packet.

To describe the Layer 2 forwarding process in the Catalyst 8500, let us assume that the source and the destination MAC address of a packet have already been learned. The following steps describe the Layer 2 forwarding process:

  1. When a packet arrives at a switch/router port, the MAC-layer functions in the distributed forwarding engine ASIC process the packet, and the packet is placed in the forwarding engine ASIC's Internal Memory.
  2. As soon as the distributed forwarding engine ASIC receives the first 64 bytes of the packet, the microcontroller's microcode reads the packet's source and destination MAC addresses. If the packet's destination MAC address is not that of the receiving port of the switch/router, then Layer 2 forwarding is required. The MAC address information is then passed to the Search Engine for the Layer 2 address lookup in the CAM.
  3. Let us assume the packet has been transmitted from a station in a particular VLAN (i.e., a Layer 2 broadcast domain). The Search Engine performs a lookup in the CAM for the destination port entry that corresponds to the packet's destination MAC address.
  4. When the correct destination port is located, the microcontroller transfers the packet from the distributed forwarding engine ASIC's Internal Memory to the Frame FIFO in the Fabric Interface. At the same time, the Search Engine forwards the packet's QoS classification and priority queuing information to the Fabric Interface's Control FIFO. An internal routing tag that is prepended to the packet identifies the destination port and the QoS priority queuing for the packet.
  5. As soon as the packet is completely transferred into the Frame FIFO, the Frame Scheduler is signaled to start arbitrating for access to the shared memory. When the Frame Scheduler is granted access, it transfers the packet into the shared memory.
  6. The Fabric Interface then signals the destination port ASIC to read the packet from the shared memory. The internal routing tag indicates to the destination port that it is receiving the correct packet.
  7. The destination port then transmits the packet out to the external network.

A bridge group refers to a (Layer 2) broadcast domain configured within the switch/router. Typically, to simplify network design (although not necessarily), a bridge group is configured to correspond to a particular IP subnet. Up to 64 bridge groups can be supported in the Catalyst 8500.

It is important to note that a bridge group is different from a VLAN. A VLAN is a broadcast domain that terminates at a router (a Layer 3 forwarding device). Inter-VLAN communication can take place only through a router, because this is where the Layer 3 forwarding from one VLAN to the other takes place. The IEEE 802.1Q VLAN trunking standard allows multiple VLANs to be carried over a single trunk, with each VLAN terminating at a router port.

If a VLAN needs to be extended through a switch/router, then the extension can be done by configuring a bridge group. In this case the switch/router is transparent to the traffic of that particular VLAN going through it. The VLANs on both sides of a bridge group are essentially one and the same VLAN (the same broadcast domain). On the other hand, two VLANs configured on a network and communicating via Layer 3 forwarding in the switch/router do not form a bridge group, since that traffic is not bridged through the switch/router.

10.7 Catalyst 8500 Quality-of-Service Mechanisms

QoS has become increasingly important as networks carry delay-sensitive end-user traffic such as streaming video, voice, and interactive applications that require low latencies. The Catalyst 8500 supports a number of QoS mechanisms that are incorporated into the switch/router architecture. The Fabric Interface ASIC and the Frame Scheduler are the two main QoS components in the Catalyst 8500. They perform the packet classification and priority queuing at the input port, in addition to scheduling the priority queues at the output port using a WRR algorithm.

The Catalyst 8500 series supports four priority queues per port in which packets can be queued based on, for example, their IP Precedence bit settings (Figure 10.7). The IP Precedence information can be extracted from the IP header type-of-service (ToS) field (also called the Service Type field) or from the corresponding DSCP bits carried in that same field. In the IP packet, the first 3 bits of the ToS field are used to signal the delay and drop priority of the packet.


Figure 10.7 Traffic scheduling and bandwidth allocation.

The rightmost bit of the 3 IP Precedence bits defines the drop priority of a packet. If this bit is set (to 1), the Catalyst 8500 drops that packet, when the destination queue becomes full, before it drops packets whose drop bit is not set. The leftmost 2 bits are used to define the delay priority of a packet. The eight priority classes resulting from the 3-bit IP Precedence field translate to eight different traffic classes in the Catalyst 8500. These eight traffic classes are mapped to the four QoS priority queues in the Catalyst 8500 as summarized in Table 10.1.

Table 10.1 IP Precedence Values

IP Precedence Bits   Delay Priority   Drop Priority   Queue Selected
0 0 0                0 0              0               QoS-3 (lowest priority)
0 0 1                0 0              1               QoS-3
0 1 0                0 1              0               QoS-2
0 1 1                0 1              1               QoS-2
1 0 0                1 0              0               QoS-1
1 0 1                1 0              1               QoS-1
1 1 0                1 1              0               QoS-0
1 1 1                1 1              1               QoS-0 (highest priority)

(A drop priority of 1 means the packet is discarded first when its destination queue congests.)
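
The mapping in Table 10.1 can be expressed compactly as the Python sketch below (consistent with the table; the function name and the QoS-n string labels are just for illustration):

    def classify(precedence):
        """Map a 3-bit IP Precedence value (0-7) to (delay priority, drop bit, queue).

        The two high-order bits select one of the four queues (QoS-3 lowest ... QoS-0 highest);
        the low-order bit is the drop preference applied when that queue congests.
        """
        delay = precedence >> 1        # leftmost two bits
        drop = precedence & 1          # rightmost bit
        queue = 3 - delay              # delay 0 -> QoS-3, ..., delay 3 -> QoS-0
        return delay, drop, f"QoS-{queue}"

    print(classify(0b000))   # (0, 0, 'QoS-3')
    print(classify(0b101))   # (2, 1, 'QoS-1')
    print(classify(0b111))   # (3, 1, 'QoS-0')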

All control and management traffic, such as routing protocol updates, STP BPDU information, and management packets (ARP, IGMP, ICMP, etc.), is placed in the highest-priority queue for transmission to the system processor (SRP in Catalyst 8510 or RP in Catalyst 8540). These special packets have to be forwarded to the system processor with minimum delay at the input port.

The Catalyst 8500 queues packets based on two parameters: the delay priority setting in the IP Precedence bits of the packet and the target next hop interface. The delay priority setting (i.e., the leftmost 2 bits of the IP Precedence field) signals to the Frame Scheduler which of the four priority queues the packet should be queued in. The Fabric Interface then supplies a pointer to the output (destination) port, indicating the priority queue and shared memory location from which to extract (read) the packet.

Recall that the five-slot modular Catalyst 8510 chassis can support up to 32 10/100 Mb/s Ethernet ports or 4 GbE ports. This means the Catalyst 8510 can support a maximum of 32 next hop interfaces. Each of these 32 possible next hop ports supports four priority queues. Packets processed by any distributed forwarding engine ASIC can be queued in 1 of 128 queues (equal to 32 ports × 4 queues) in the shared memory based on delay priority settings in the packet and next hop interface the packet is to be sent to. The Fabric Interface provides the output (destination) port with a pointer to the priority queue and shared memory location of the packet.

Each priority queue at a Catalyst 8500 port is configured with a higher queue threshold limit and a lower queue threshold limit. These queue limits are user configurable to meet targeted traffic management policies. These two queue limits can be viewed, respectively, as the in-profile queue threshold (for packets conforming to the configured traffic policy) and the out-of-profile queue threshold.

A queue accepts all packets if its current queue length is below the configured lower queue threshold. If the queue length is between the lower and higher queue threshold limits, the queue accepts only packets with drop priority setting of 0 (or in-profile packet) and discards packets with drop priority setting of 1 (out-of-profile). When queue length is greater than the higher queue threshold, then all arriving packets are discarded until the congestion subsides.
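
The two-threshold admission policy can be sketched in Python as follows (an illustration only; the threshold values in the example are arbitrary, and the real limits are configured per queue):

    def admit(queue_len, low_threshold, high_threshold, drop_bit):
        """Decide whether an arriving packet is accepted into a port's priority queue."""
        if queue_len < low_threshold:
            return True                  # below the lower limit: accept everything
        if queue_len < high_threshold:
            return drop_bit == 0         # between the limits: accept only in-profile packets
        return False                     # above the upper limit: discard all arrivals

    print(admit(10, 64, 128, drop_bit=1))    # True
    print(admit(90, 64, 128, drop_bit=1))    # False (out-of-profile packet is dropped)
    print(admit(90, 64, 128, drop_bit=0))    # True
    print(admit(200, 64, 128, drop_bit=0))   # False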

The Frame Scheduler performs two main functions in the Catalyst 8500. It is responsible for scheduling packets that have been processed by the distributed forwarding engine ASIC into the shared memory switch fabric based on the priority queuing requirement of the packet. The Frame Scheduler also schedules packets out of the shared memory switch fabric at the output ports based on the WRR algorithm.

To write packets into the shared memory, the distributed forwarding engine ASIC sends a request to the Frame Scheduler requesting access to the shared memory. The Frame Scheduler receives requests from all active forwarding engine ASICs and processes them in a time-division multiplexing (TDM) fashion. With this mechanism, each forwarding engine ASIC is always given an equal opportunity to write a complete packet into the shared memory when it is granted access.

Recall also that each distributed forwarding engine ASIC on a line card is responsible for four ports. This means that the Frame Scheduler allows the distributed forwarding engine ASIC to write a maximum of four packets into the shared memory. Each packet written into the shared memory is assigned an internal routing tag, which (as mentioned earlier) contains the destination port, priority queuing, and packet discard priority of the packet. Using the internal routing tag, the input Frame Scheduler stores the packet in the shared memory in the correct priority queue (see Figure 10.7).

The “LL,” “LH,” “HL,” and “HH” designations in Figure 10.7 refer to the grouping of the IP Precedence bit settings used by the Catalyst 8500 to map IP Precedence-marked traffic to the appropriate priority queue. The Catalyst 8500 maintains a fifth, high-priority internal routing tag (not shown in Figure 10.7) that is prepended to all control and management packets. This fifth routing tag signals that such critical packets require immediate delivery to the system processor.

At the output side of the shared memory switch fabric, the Frame Scheduler is responsible for scheduling packets from each priority queue using the WRR algorithm. The WRR algorithm allows a network manager to configure scheduling weights that define how much bandwidth each queue should receive. Under conditions where there is no congestion, the WRR mechanism and the scheduling weights configured do not really affect how packets are transmitted out of the shared memory switch fabric, because there is an abundance of bandwidth out of the queues. However, if the output is congested due to excess input traffic, then the WRR mechanism schedules each queue at a port based on the priority setting defined by the weights.
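
A single drain of the four queues under weighted round-robin might look like the Python sketch below (a simplified software illustration of the scheduling idea; the weights in the example are arbitrary and would normally be derived from the bandwidth shares configured by the network manager):

    from collections import deque

    def wrr_drain(queues, weights):
        """Dequeue packets from the four priority queues in proportion to their weights.

        queues:  list of deques, index 0 = QoS-0 (highest priority) ... 3 = QoS-3
        weights: packets each queue may send per scheduling round
        Returns the transmit order for one full drain of the queues.
        """
        order = []
        while any(queues):
            for q, w in zip(queues, weights):
                for _ in range(w):           # each queue gets up to its weight per round
                    if not q:
                        break
                    order.append(q.popleft())
        return order

    queues = [deque(f"q{i}-p{j}" for j in range(3)) for i in range(4)]
    print(wrr_drain(queues, weights=[4, 3, 2, 1]))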

The Catalyst 8500 also allows a network manager to override the global QoS settings (configured through the switch/router management interface in the system processor) by defining priority configuration on a per port basis. The network manager also has the option of configuring bandwidth and packet classification (and queuing) based on source–destination address pair, destination address, or source address basis. The manager can configure scheduling weights to allow certain IP addresses to have more bandwidth than others.
