3
Shared-Bus and Shared-Memory-Based Switch/Router Architectures

3.1 Architectures with Bus-Based Switch Fabrics and Centralized Forwarding Engines

The first generation of router and switch/router designs relied on centralized processing and shared memory for forwarding and buffering packets, respectively. These designs have traditionally been based on a shared-bus switch fabric. In such designs, all packets received from all interfaces are written to a common memory pool.

After forwarding decisions are made, packets are read from this shared memory and sent to the appropriate output interface(s). Even a majority of today's low-end or small-capacity systems (residential and small enterprise routers and switch/routers) still adopt this design. The simplicity and modest performance requirements of these systems make the shared-bus, shared-processor, and shared-memory architecture a natural fit.

Examples of this category of switch/router architectures are listed below. These architectures are still in common use today and remain relevant even though some were developed more than a decade ago. The goal here is to use these designs to highlight the main architectural approaches adopted and concepts developed over the years.

Example Architectures
  • DECNIS 500/600 Multiprotocol Bridge/Router (Chapter 5)
  • Fore Systems PowerHub multilayer switches with Simple Network Interface Modules (Chapter 6)
  • Cisco Catalyst 6000 Series Switch Architectures (Chapter 7)
  • Cisco Catalyst 6500 Series switches with Supervisor Engine 32 – Architectures with “Classic” line cards (Chapter 9)
  • Cisco Catalyst 6500 Series switches with Supervisor Engine 32 – Architectures with CEF256 fabric-enabled line cards (optional Distributed Forwarding Card (DFC) not installed) (Chapter 9)

With the advent of high-speed, low-power electronics, multicore processor modules, and high-speed backplane technologies, these older designs have been improved to deliver even higher forwarding speeds. However, the high interface speeds and throughputs required for aggregation and core networks are driving the need for higher capacity distributed forwarding architectures that can replace the traditional system designs where line cards have to contend for a single, shared pool of memory and processing resources.

The first attempt at improving the bus-based architectures was to use multiple processor modules equipped with their own forwarding engines and resources – a pool of parallel forwarding engines. We give here a basic discussion of the major bus-based architectures that have evolved over the years.

3.1.1 Traditional Bus-Based Architecture with Software-Based Routing and Forwarding Engines in a Centralized Processor

In this architecture, a general-purpose CPU is responsible for both control plane and data plane operations (Figure 3.1). Here, an operating system process running on the general-purpose CPU is responsible for processing incoming packets and forwarding them to the correct outbound interface(s). This software-based forwarding approach is significantly slower than forwarding performed in a centralized hardware engine. The speed of the shared bus also plays a big role in the overall packet forwarding performance (throughput) of the device.


Figure 3.1 Traditional bus-based architecture with software routing and forwarding engines in a centralized processor.

The CPU must examine the destination IP address of each packet, make a forwarding decision, rewrite MAC addresses, and forward the packet. The packet is first received and placed in shared system memory. The forwarding process or engine (running in the centralized processor) consults the forwarding table (which may include adjacency information from the ARP cache) to determine the next hop router's IP address, the outgoing interface, and the next hop's receiving interface MAC address.

The CPU then rewrites the Ethernet frame carrying the packet, setting the frame's destination MAC address to the next hop's MAC address and its source MAC address to that of the outgoing interface. It also decrements the IP Time-to-Live (TTL) field, recomputes the IP header checksum and the Ethernet checksum, and finally transmits the packet out of the outgoing interface to the next hop.
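The rewrite steps described above can be sketched in a few lines of Python. This is an illustrative sketch of the IPv4 case only; it assumes a 20-byte header with no IP options, and it omits the Ethernet frame check sequence, which is normally computed by the MAC hardware:

```python
import struct

def ipv4_checksum(header: bytes) -> int:
    """Standard Internet checksum over an IPv4 header (checksum field zeroed)."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:                      # fold carries into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def forward_rewrite(ip_header: bytearray, out_if_mac: bytes, next_hop_mac: bytes):
    """Decrement TTL, recompute the header checksum, and return the new
    (destination MAC, source MAC) pair for the outgoing Ethernet frame."""
    ip_header[8] -= 1                       # TTL lives at byte offset 8
    ip_header[10:12] = b"\x00\x00"          # zero the old checksum field
    # Assumes a 20-byte header with no IP options.
    ip_header[10:12] = struct.pack("!H", ipv4_checksum(bytes(ip_header[:20])))
    return next_hop_mac, out_if_mac
```

Verifying the result is simple: running the checksum algorithm over a header that carries a valid checksum yields zero.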

This software-based forwarding architecture is slow and resource-intensive since the CPU has to process every packet in addition to performing control plane operations. For these reasons, better architectures and forwarding methods are clearly needed in order to reduce demands on the single CPU as well as increase system performance. Improvements in software-based forwarding could be achieved via the following methods (which are described in greater detail in the following sections):

  • Use multiple forwarding engines in separate processors, each dedicated to packet forwarding rather than control plane processing.
  • Add a flow/route cache to the centralized processor to enable simpler lookup operations than the more complex longest matching prefix lookup.
  • Use newer, highly efficient and optimized lookup algorithms with their related forwarding data structures (forwarding tables) [AWEYA2001, AWEYA2000].

In the second approach, the faster packet forwarding (using the flow/route cache) begins only after the CPU performs the slower software-based longest matching prefix forwarding on the first packet of a flow. During the software-based forwarding process, a flow/route cache entry is created that contains all the necessary forwarding information discovered during the software-based process. The flow/route cache entry allows subsequent packets of the same flow to be processed and forwarded using the faster flow/route cache instead of via the software-based full forwarding table lookup.
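The cache-first forwarding path just described can be sketched as follows. All names, and the stubbed slow-path result, are illustrative rather than taken from any particular product:

```python
# Flow/route-cache forwarding: the first packet of a flow takes the slow
# software path, which then installs a cache entry for subsequent packets.

flow_cache = {}          # destination IP -> (egress port, next-hop MAC)

def slow_path_lookup(dst_ip):
    """Stand-in for the full longest-matching-prefix lookup plus ARP
    resolution; a real implementation would walk the forwarding table."""
    return ("eth1", "00:11:22:33:44:55")   # illustrative result

def forward(dst_ip):
    entry = flow_cache.get(dst_ip)         # fast path: cache hit
    if entry is None:                      # cache miss: first packet of a flow
        entry = slow_path_lookup(dst_ip)   # slow software-based forwarding
        flow_cache[dst_ip] = entry         # install entry for later packets
    return entry
```

The first call to `forward()` for a destination pays the slow-path cost and populates the cache; every later call for the same destination is a dictionary lookup.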

3.1.2 Bus-Based Architecture with Routing and Forwarding Engines in Separate Processors

Since the software-based longest matching prefix lookup in a forwarding table is processing-intensive and can be time-consuming, some software-based routers or switch/routers dedicate a separate processor solely for packet forwarding (Figure 3.2). In this architecture, one processor serves as the route processor and another as the forwarding processor (where the forwarding engine is housed).


Figure 3.2 Bus-based architecture with routing and forwarding engines in separate processors.

The route processor runs the network operating system, provides all the routing functions, and also monitors and manages the components of the system. Many additional features and functions can be supported in the route processor along with the pure routing functions:

  • Generating and distributing forwarding information to the forwarding processor, running control and management protocols, supporting management and configuration tools, and so on.
  • Providing out-of-band management functions and interfaces such as console, auxiliary, Ethernet, and USB ports for system configuration and management.

The forwarding processor communicates with the route processor via the shared bus or over a dedicated link or channel between them. The shared bus or dedicated link allows the route processor to transfer forwarding information to the forwarding table in the forwarding processor.

The forwarding processor can also use the dedicated link to pass to the route processor the routing protocol messages and other control and management packets (ARP, ICMP, IGMP, etc.) that it receives from the external network and that are destined for the route processor itself. Packets that cannot be forwarded by the forwarding processor are considered exception packets and are sent to the route processor for further processing.

The forwarding processor provides the following functions:

  • Performs forwarding table lookups using the forwarding table it maintains.
  • Manages the shared memory and allocates buffers to incoming packets.
  • Transfers outgoing packets to the destination ports when the forwarding decisions are made and the packets are ready to be transmitted.
  • Transfers exception and control packets to the route processor for processing. Any errors originating in the forwarding processor and detected by it may be sent to the route processor using system log messages.

In this architecture, the performance of the device also depends heavily on the speed of the shared bus and the forwarding processor.

3.1.3 Bus-Based Architecture with Forwarding Using a Flow/Route Cache in Centralized Processor

In this architecture, the CPU maintains a flow/route cache that holds recently used forwarding table entries and serves as a front-end table for forwarding packets. The cache entries are structured in a format that supports fast, efficient lookups, and the cache is consulted before the main (full) forwarding table maintained by the CPU (Figures 3.3 and 3.4).


Figure 3.3 Bus-based architecture with forwarding using a flow/route cache in centralized processor.


Figure 3.4 Route cache as a front-end lookup table.

The flow/route cache provides a simpler and faster front-end lookup mechanism that demands less processing than the main forwarding table, which requires (software-based) longest matching prefix lookups. If the software forwarding process (engine) finds a matching entry for a destination during the flow/route cache lookup, it forwards the packet immediately to the destination port(s) without consulting the forwarding table.

The relatively more extensive and complex forwarding table is only consulted when there is no entry in the flow/route cache for an arriving packet (the first packet of a new flow). The flow/route cache is populated with forwarding information only when the first packet has been processed. An entry in the flow/route cache specifies the key information required to forward subsequent packets associated with that first packet (i.e., egress port and MAC address rewrite information that maps to a particular destination address).

The entries in the flow/route cache are generated by forwarding the first packet in software, after which the relevant forwarding information for that packet is used to create the cache entry. Subsequent packets associated with the flow are then forwarded using the faster flow/route cache. The flow/route cache entries for a flow may include information required for QoS processing and security filtering (priority queuing, packet discard profile, packet priority value remarking, etc.).

In this flow/route cache-based architecture, some types of packets will still require extensive software processing by the CPU:

  • Packets destined to the router or switch/router itself (e.g., management and control traffic, routing protocol messages, etc.)
  • Packets that are too complex for the flow/route cache to handle (e.g., IP packets requiring fragmentation, packets with IP options, packets requiring encryption, NAT (Network Address Translation) traffic, etc.)
  • Packets that require additional information that is not currently available or known (e.g., packets requiring destination MAC rewrites and requiring the CPU to send ARP requests)

3.1.4 Bus-Based Architecture with Forwarding Using an Optimized Lookup System in Centralized Processor

The routing table (constructed and maintained by the routing protocols) is not optimized for the packet-by-packet data plane operations. The entries in this table contain information such as the routing protocol that learned/discovered a route, the metric associated with that route, and possibly the administrative distance of the route.

Although all this information is important to the overall routing process, not all of it is directly usable or relevant to data plane operations. This information is instead distilled to generate a smaller table, the forwarding table (or FIB), with contents more relevant to data plane forwarding operations.

The forwarding table can be further optimized using specialized data structures to minimize data storage space while at the same time allowing faster lookups. Most often these optimized tables are tailored for specialized lookup algorithms. The forwarding process in the CPU can then use this smaller optimized lookup table and corresponding optimized lookup algorithms for packet forwarding (Figure 3.5).


Figure 3.5 Bus-based architecture with forwarding using an optimized lookup system in centralized processor.

This optimized lookup table organizes the forwarding information in such a way that only the routing information required for data plane operations (e.g., destination prefix, next hop, egress interface) is prominent. The optimized table may also include a pointer to another optimized adjacency table, which describes the MAC address associated with the various next hop devices in the network.

New forwarding table lookup algorithms have been developed over the past decade in attempts to build even faster routers. The optimized lookup tables and lookup algorithms are created with the goal of achieving very high forwarding rates [AWEYA2001, AWEYA2000]. Some architectures implement these optimized data structures and algorithms in specialized lookup engines/processors or ASIC engines.

Most often, each lookup algorithm (which performs the longest matching prefix lookup) has a corresponding optimized data structure and lookup table. These longest matching prefix lookup algorithms and their forwarding tables are typically designed as a composite structure.
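The longest matching prefix rule itself can be illustrated with a deliberately naive sketch. The FIB contents here are invented for illustration, and a real optimized table would use a trie or similar structure rather than a linear scan:

```python
import ipaddress

# A tiny illustrative FIB: (prefix, next hop). In a real router this would be
# an optimized structure (e.g., a multibit trie) built from the routing table.
fib = [
    (ipaddress.ip_network("10.0.0.0/8"),  "next-hop-A"),
    (ipaddress.ip_network("10.1.0.0/16"), "next-hop-B"),
    (ipaddress.ip_network("0.0.0.0/0"),   "default"),
]

def lpm(dst: str):
    """Return the next hop for the longest prefix that matches dst."""
    addr = ipaddress.ip_address(dst)
    best = None
    for net, nh in fib:
        # Among all matching prefixes, keep the most specific (longest) one.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, nh)
    return best[1] if best else None
```

For example, 10.1.2.3 matches both 10.0.0.0/8 and 10.1.0.0/16, and the /16 wins because it is more specific.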

3.2 Architectures with Bus-Based Switch Fabrics and Distributed Forwarding Engines

Improvement in the performance of the shared-bus based architectures can be obtained by distributing the packet forwarding operations to other processors or to the line cards. Some architectures distribute forwarding engines and flow/route caches, in addition to receive and transmit buffers, to the line cards to reduce the load on the system bus and also improve overall system performance.

Other architectures distribute forwarding engines plus full forwarding tables (not flow/route caches) to the line cards to allow them to locally forward packets directly to other line cards without directly involving the route processor. Examples of the latter architectures are listed below.

Example Architectures
  • Fore Systems PowerHub multilayer switches with Intelligent Network Interface Modules (INMs) (Chapter 6)
  • Cisco Catalyst 6500 Series switches with Supervisor Engine 32 – Architectures with CEF256 fabric-enabled line cards with optional Distributed Forwarding Card (DFC) installed (Chapter 9)

3.2.1 Bus-Based Architecture with Multiple Parallel Forwarding Engines

To handle traffic from a large number of interfaces or high-speed ports, some architectures employ a number of parallel forwarding engines all dedicated to packet forwarding (Figure 3.6). With a pool of multiple forwarding engines, the system can implement some form of load sharing on these engines as the traffic load increases [AWEYA2001, AWEYA2000]. Another advantage of this architecture is that it can efficiently handle input interfaces with different speeds and utilization levels.
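One possible load-sharing scheme (the book does not prescribe a specific one) is to hash each packet's flow identity, its 5-tuple, to an engine. This keeps all packets of a flow on the same engine, preserving packet order and keeping any per-flow state local, while spreading distinct flows across the pool. A minimal sketch, with an invented pool size:

```python
import hashlib

NUM_ENGINES = 4   # illustrative pool size

def pick_engine(src_ip, dst_ip, proto, sport, dport):
    """Map a flow's 5-tuple to one of the parallel forwarding engines.
    The same flow always hashes to the same engine."""
    key = f"{src_ip}|{dst_ip}|{proto}|{sport}|{dport}".encode()
    # Use the first 4 bytes of a stable hash so the mapping is deterministic
    # across runs (unlike Python's built-in hash() of strings).
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % NUM_ENGINES
```

A stable hash is deliberately used here; a per-process randomized hash would reassign flows after every restart.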


Figure 3.6 Bus-based architecture with multiple parallel forwarding engines.

Each forwarding engine can adopt a specialized, optimized forwarding architecture that provides not only high-speed lookups but also dedicated hardware for QoS classification and security ACL filtering. The ACLs can be processed in parallel with the next hop lookup.

The forwarding engines can be implemented in specialized ASICs, TCAMs, and NPUs (network processing units). ASIC implementations, in particular, allow these additional features to be turned on without affecting overall packet forwarding performance.

Using ASICs in the forwarding engines allows very high packet forwarding rates, but the trade-off is that ASICs are inflexible and limited in functionality because they are generally hardwired to perform specific operations and tasks. Some router and switch/router architectures instead employ NPUs, which are designed to overcome the inflexibility of ASIC-based forwarding engines.

Unlike ASICs, NPUs are programmable and flexible enough to allow relatively complex operations and features to be implemented in a forwarding engine. Also, when bug fixes, feature modifications, or future feature upgrades are required, their software and firmware can be changed with relative ease.

The forwarding engines are responsible for the forwarding table lookups and packet forwarding functions in the system, while the centralized route processor provides the following functions:

  • Run the routing protocols, construct and maintain the routing table, and perform all protocol message exchanges and communications with peers in the network.
  • Generate the forwarding table from the routing table and distribute it to the forwarding engines.
  • Synchronize the contents of the multiple forwarding tables to the master forwarding table maintained by the route processor.
  • Receive and process exception and control packets sent from the forwarding engines.
  • Load the operating system software images and other operating features (QoS and security ACL policies, priority queuing, packet discard policies, etc.) to all forwarding engines upon system power up or through operator commands.
  • Provide out-of-band configuration and management of the overall system using console, auxiliary, Ethernet, and USB ports.
  • Monitor and regulate the temperature of system components such as the forwarding engine modules, route processor modules, line cards, power supplies, and so on. Temperature sensors and cooling fans are used to regulate the temperature in the system.
  • In a redundant configuration with a primary and secondary route processor, the system will synchronize the state of the secondary route processor to that of the primary and perform high-availability failover when the primary route processor fails.
  • Serve as the central point in the system for configuring stateful firewall policies and distribution to the forwarding engines.
  • Serve as the central point in the system where IP security authentication, encryption methods, and encryption keys (Internet Key Exchange (IKE)) are negotiated and maintained.

The route processor may also be the central point in the system where a wide range of IP network services, such as MPLS, Layer 2 virtual private network (L2VPN), and Layer 3 virtual private network (L3VPN) are provisioned and then configured in the forwarding engines.

The route processors in the other router and switch/router architectures are also capable of supporting the above functions; this list in fact enumerates the potential capabilities of the route processor in a typical router or switch/router.

The forwarding engines may communicate with the route processor using a dedicated communication link. The dedicated link may be used to transfer forwarding information from the route processor to the forwarding table in the forwarding engines. Alternatively, the communication between the route processor and forwarding engines can be done through the existing switch fabric in the system.

3.2.2 Bus-Based Architecture with Forwarding Engine and Flow Cache in Line Cards

In this architecture (Figure 3.7), each line card is equipped with a flow/route cache with a relatively simple forwarding engine that allows for fast lookups in the cache. This simple forwarding engine is not as sophisticated as the full-blown forwarding engine that supports the more complex longest matching prefix lookup algorithm. As already described, the flow/route cache serves as a front-end lookup mechanism that is consulted first before the main forwarding engine in a centralized processor.


Figure 3.7 Bus-based architecture with forwarding engine and flow cache in line cards.

Let us assume that the first packet sent in a flow is received by a port on a line card in the switch/router. The line card examines the destination IP address in the packet and performs a lookup in the flow/route cache for an entry that is associated with this packet's destination. The line card discovers that there is no entry for this packet because this is the first packet of a flow, so the packet is forwarded to the (forwarding engine in the) route processor for (full) forwarding table lookup and forwarding.

Because it cannot fully process the packet, the line card may write an incomplete (or partial) flow entry in the flow/route cache at this stage of the forwarding process. This incomplete entry may include only partial information, such as the source and destination IP addresses that identify the flow to which the packet belongs. By writing a partial entry at this stage, the line card has less information to write later, leaving it spare processing cycles to handle newly arriving packets.

The forwarding engine in the route processor receives the IP packet, reads the destination IP address, and performs a (longest prefix matching) lookup in its local forwarding table to determine the next hop information (next hop address, egress port, and next hop's MAC address). Typically, the lookup in the route processor is software based and only performed for the first packet of a flow.

The route processor then checks its local ARP cache (i.e., adjacency table) to determine the MAC address of the receiving interface of the next hop. If the ARP cache does not contain an entry for the next hop, the route processor transmits an ARP request out the egress port toward the next hop to obtain the next hop's MAC address.

After obtaining the next hop's MAC address, the route processor rewrites the destination MAC address in the outgoing frame to be that of the next hop's MAC address, and the packet is forwarded out the egress port to the next hop. The source MAC address in the outgoing frame is the MAC of the outgoing interface.

After processing the first packet, the route processor sends the forwarding information associated with that packet to the line card so that it can create a complete flow/route cache entry for it. Subsequent packets in the same flow arriving at the line card do not have to be forwarded to the route processor again. The packet lookups and rewrites for these packets are performed instead in the line card using the flow/route cache entry created for the first packet.
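The two-stage cache fill described in this section (a partial entry written on the miss, completed when the route processor answers) can be sketched as follows; all names and the punt signaling are illustrative:

```python
# Line-card flow cache with partial entries. A miss records a partial entry
# and punts the packet; the route processor's reply completes the entry.

cache = {}   # (src_ip, dst_ip) -> {"state": ..., "egress": ..., "mac": ...}

def line_card_rx(src_ip, dst_ip):
    entry = cache.get((src_ip, dst_ip))
    if entry is not None and entry["state"] == "complete":
        return entry                                  # fast path on the line card
    if entry is None:                                 # first packet of the flow
        cache[(src_ip, dst_ip)] = {"state": "partial"}
    return "punt-to-route-processor"                  # slow path

def route_processor_reply(src_ip, dst_ip, egress, next_hop_mac):
    """Complete the partial entry with the route processor's lookup result."""
    cache[(src_ip, dst_ip)] = {"state": "complete",
                               "egress": egress, "mac": next_hop_mac}
```

Packets arriving between the punt and the reply still hit the partial entry and are punted as well, which matches the behavior described above.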

Similarly, in this architecture, some types of packets will still require handling by the route processor. These special packets have already been described.

3.2.3 Bus-Based Architecture with Fully Distributed Forwarding Engines in Line Cards

As discussed earlier, packet forwarding in the distributed forwarding architectures is done locally by distributed forwarding engines in the line cards. A master forwarding table maintained by the route processor is downloaded to the forwarding engine (ASICs or NPUs) in the line cards so that they can perform packet forwarding locally. By allowing for the forwarding to be done at the line card level (Figure 3.8), the overall forwarding throughput of the router or switch/router can be significantly increased.


Figure 3.8 Bus-based architecture with fully distributed forwarding engines in line cards.

Each line card is given its own memory for packet buffering and also for priority queuing of packets. A receive memory stores packets received from an interface, and a transmit memory stores packets ready for transmission out the output interface. Each line card also has a copy of the route processor's forwarding table and other QoS classification and security ACLs and policies so that it can forward packets without direct route processor intervention.

Some types of packets will still require handling by the route processor:

  • The line card's lookup may not produce a valid path for an arriving packet. If the forwarding table lookup fails to find a valid entry, the packet (in many cases) is forwarded (punted) to the route processor for further processing.
  • A particular feature for Layer 2, Layer 3, or QoS and security processing (MPLS, L2VPN, and L3VPN, etc.) may not be supported at the line card, thereby requiring route processor intervention.

Also, when the line card has incomplete adjacency information for a next hop (the next hop's MAC address), it may forward such packets to the route processor to start the address resolution process (ARP), which completes the adjacency information some time later.

3.3 Architectures with Shared-Memory-Based Switch Fabrics and Distributed Forwarding Engines

One approach widely used to overcome the performance limitations of bus-based architectures is to employ high-speed shared-memory switch fabrics and distributed forwarding engines, as illustrated in Figure 3.9. A key advantage of this architecture is that, in addition to serving as a switch fabric, the shared memory can be used to temporarily store packets in large buffer memories to absorb the traffic bursts and temporary congestion that frequently occur in networks.


Figure 3.9 Architectures with shared memory-based switch fabrics and distributed forwarding engines.

The memory also allows the system to queue packets using a priority queuing mechanism while they wait for transmission out the egress ports. The system can implement a number of weighted scheduling schemes (weighted round-robin (WRR), deficit round-robin (DRR), etc.) to service the priority queues. Packet discard policies such as tail drop and weighted random early detection (WRED) can also be implemented on the priority queues. The shared memory system may also support dynamic buffer allocation, where buffers are allocated dynamically to the ports as the traffic loads on them vary.
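As an illustration of one such scheme, a minimal weighted round-robin scheduler over priority queues might look like this. The weights and queue contents are invented for the example; real WRR implementations typically work on byte counts rather than packet counts:

```python
from collections import deque

def wrr(queues, weights):
    """Yield packets from `queues` (a list of deques) in weighted round-robin
    order: each cycle, queue i may send up to weights[i] packets."""
    while any(queues):                 # run until every queue is drained
        for q, w in zip(queues, weights):
            for _ in range(w):
                if not q:              # queue exhausted; move to the next one
                    break
                yield q.popleft()
```

With weights [2, 1], the first queue sends two packets for every one packet from the second, so the higher-weight queue gets twice the service without starving the other.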

The shared memory switch fabric can be implemented as a single physically centralized shared memory or logically centralized shared memory (which consists of separate shared-memory modules that can be pooled together). The advantages and disadvantages of these two shared memory switch fabric designs are discussed in [CISCEVOL2001].

Example Architectures
  • Cisco Catalyst 3550 Series Switches (Chapter 8)
  • Cisco Catalyst 8500 campus switch routers (Chapter 10)

This distributed forwarding architecture still partitions the Layer 3 forwarding functions (which can be done in different ways, depending on the specific design) and places them in the line cards that connect to the shared memory switch fabric. The route processor still provides common services such as route processing, management and configuration, power, and temperature control to all the modules in the system.

3.4 Relating Architectures to Multilayer Switch Types

The different router architectures discussed above have different characteristics depending on the application of the device in the overall network system (Figure 3.10). The characteristics most commonly associated with an architecture type are device size, form factor, performance, reliability, and scalability. Small, compact devices tend to adopt shared-bus and shared-memory switch fabrics, while bigger devices are more flexible and practical to design using crossbar-based switch fabrics.


Figure 3.10 Relating architectures to switch types.

The smaller devices that are more suitable for network access and residential networks (with a small number of user ports and lower forwarding capacities) tend to come in fixed-configuration platforms. The larger devices usually employed at the network aggregation and core layers (with higher forwarding capacities) tend to be based on crossbar switch fabrics and come in the form of modular chassis and multichassis platforms. The crossbar-based architecture can be designed to have advanced scalability and reliability features.

Table 3.1 presents a summary of the different types of switch/routers that are discussed in this book. The next chapters will be devoted to discussing each design in greater detail.

Table 3.1 Categories of Architectures Discussed in this Book

Bus-Based Architectures

Architectures with centralized forwarding engines
  • DECNIS 500/600 Multiprotocol Bridge/Router
  • Fore Systems PowerHub multilayer switches without Intelligent Network Interface Modules (INMs)
  • Cisco Catalyst 6000 Series Switch Architectures
  • Cisco Catalyst 6500 Series switches with Supervisor Engine 32 – architectures with “classic” line cards
  • Cisco Catalyst 6500 Series switches with Supervisor Engine 32 – architectures with CEF256 fabric-enabled line cards (optional DFC not installed)

Architectures with distributed forwarding engines
  • Fore Systems PowerHub multilayer switches with Intelligent Network Interface Modules (INMs)
  • Cisco Catalyst 6500 Series switches with Supervisor Engine 32 – architectures with CEF256 fabric-enabled line cards with optional DFC installed

Shared-Memory-Based Architectures

Architectures with distributed forwarding engines
  • Cisco Catalyst 3550 Series Switches
  • Cisco Catalyst 8500 campus switch routers