13
Case Study: Quality of Service Processing in the Cisco Catalyst 6000 and 6500 Series Switches

13.1 Introduction

The key QoS functions in the Catalyst 6000/6500 family of switch/routers requiring real-time processing are implemented in hardware. These QoS-related hardware components are implemented on Catalyst 6000/6500 modules such as the Policy Feature Card (PFC) and the port ASICs on the line cards. This chapter describes the QoS capabilities of the PFC and the switch/router port ASICs on the line cards [CISCUQSC06,CISCUQSC09].

The Multilayer Switch Feature Card (MSFC) supports some QoS functions such as control plane policing and other rate-limiting functions for control and management traffic. These special QoS functions are not discussed in this chapter. Detailed descriptions of the PFC and MSFC are given in Chapters 7 and 9.

13.2 Policy Feature Card (PFC)

The PFC1 is a daughter card that is supported only on the Supervisor Engine 1A of the Catalyst 6000/6500 family. The PFC2 is an improved design of the PFC1 and is supported on Supervisor Engine 2. The PFC1 and PFC2 are the primary modules on which the hardware-based QoS functions are implemented for Supervisor Engines 1A and 2, respectively. Supervisor Engine 32 supports the PFC3B and MSFC2a as its default configuration. Supervisor Engines 720, 720-3B, and 720-3BXL all support the PFC3 and MSFC3. Each newer generation of the PFC supports more advanced QoS functions than its predecessors.

13.2.1 Policing in the PFC

Apart from QoS functions implemented on some select line card types, the PFC supports virtually all of the hardware-based QoS functions, including classification and policing of packets. A PFC can use packet classification together with an access control list (ACL) to mark incoming packets with a priority value. The marking can be based on IEEE 802.1p/Q, IP Precedence, Differentiated Services Code Point (DSCP), or Multiprotocol Label Switching (MPLS) Experimental (EXP) bits. The marked packets then allow other QoS functions, such as priority queuing and packet discard, to be implemented in the switch/router. In this chapter, we use IEEE 802.1p/Q to designate the priority value carried in a tagged Ethernet frame.

Policing allows a stream of packets arriving at a particular point in the device to be rate limited to a predefined limit. The PFC can police incoming traffic to the switch/router in hardware, limiting it to a configurable predefined rate. Packets in excess of the rate limit can be dropped by the PFC or have their priority values marked down to lower values. A PFC can support ingress and/or egress rate limiting in the switch/router.

In addition to the basic QoS functions of classification, priority queuing, and priority-based packet dropping, the PFC can support additional functions such as the following:

  • The PFC can support normal policing where packets are dropped or their priority values marked down if a configured policing policy returns an out-of-profile decision for arriving traffic. The PFC can also support an excess rate policy, where a second policing level can return a policy action on the policed excess traffic.
  • The PFC has the ability to push down a QoS policy to a Distributed Forwarding Card (DFC) for it to carry out QoS operations locally. The functions of the DFC are described in Chapter 9.

When an excess rate policer (i.e., a two-rate policer) is defined and applied at an interface, arriving packets can be dropped or their priority values marked down when the traffic exceeds the predefined excess rate limit. If an excess rate policing level is configured, the PFC uses an internally configured “excess DSCP” mapping table to determine the DSCP value to which the original DSCP value in a packet should be marked down.

If only a normal (single) policing level is configured, a “normal DSCP” mapping table is used. When both excess rate and normal policing levels are configured, the excess rate policing level takes precedence in the PFC for selecting the mapping rules used for priority value mark-downs in arriving packets. A PFC can be configured to support both policing levels at the same interface.
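The selection between the two mark-down maps can be illustrated with a short sketch. The following Python fragment is a minimal illustration only, assuming the two policing verdicts for a packet have already been computed; the map contents and function names are hypothetical, not the PFC's internal tables.

    # Sketch: choosing the mark-down map when both policing levels are set.
    # The example map entries are hypothetical.
    NORMAL_DSCP_MAP = {46: 36, 34: 28}   # normal-level mark-down rules
    EXCESS_DSCP_MAP = {46: 8, 34: 8}     # excess-level mark-down rules

    def marked_down_dscp(dscp, out_of_normal, out_of_excess):
        # The excess rate level takes precedence when both are configured.
        if out_of_excess:
            return EXCESS_DSCP_MAP.get(dscp, dscp)
        if out_of_normal:
            return NORMAL_DSCP_MAP.get(dscp, dscp)
        return dscp                      # in-profile: DSCP unchanged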

The PFC also supports policing at the aggregate flow and microflow levels. These policing functions are described in detail in Chapter 12:

  • Microflow Policing: In microflow policing, a flow is defined by a unique source and destination IP address, transport protocol type (TCP or UDP), and source and destination port number. For each flow that passes through a port of the switch/router, the microflow policer can be used to limit the rate of traffic received for that flow at the port. Flows that exceed the prescribed rate limit can have their packets dropped or their priority values marked down.
  • Aggregate Policing: Aggregate policing can be used to rate limit traffic on a port or VLAN that matches a specified QoS ACL. The aggregate flow can be viewed as the cumulative or aggregate traffic at the port or group of ports in a single VLAN that matches a specific Access Control Entry (ACE) of a QoS ACL.

Aggregate and microflow policing provide different ways of specifying the rate of traffic that can be accepted into the switch/router. Both an aggregate and a microflow policy can be configured and made active at the same time at a port or a VLAN. The PFC3 in Supervisor Engine 720 supports up to 63 microflow policers and 1023 aggregate policers.
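The practical difference between the two policer types is the key used to look up policer state, as the following sketch shows. The packet representation and field names are assumptions made for illustration.

    # Sketch: policer state lookup keys for the two policing types.
    def microflow_key(pkt):
        # One policer instance per unique flow (5-tuple).
        return (pkt["src_ip"], pkt["dst_ip"], pkt["proto"],
                pkt["src_port"], pkt["dst_port"])

    def aggregate_key(pkt, ace_id):
        # One policer instance shared by all traffic matching a given
        # ACE of a QoS ACL on a port or VLAN.
        return ace_id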

Policing can be implemented with a token bucket algorithm where the network administrator defines a rate limit and maximum burst size for arriving traffic. The rate limit (or Committed Information Rate (CIR)) is defined as the average rate at which data can be transmitted.

The maximum burst size (or Committed Burst Size (CBS)) is the maximum amount of data (in bits or bytes) that can be transmitted back-to-back while still conforming to the average rate limit. Packets that overflow the token bucket are either dropped or have their priority values marked down.
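A minimal software sketch of such a single-rate token bucket policer is shown below. This illustrates the algorithm only, not the PFC's hardware implementation; the parameter names mirror the CIR and CBS definitions above.

    import time

    class TokenBucketPolicer:
        # Single-rate policer sketch: CIR in bits/s, CBS in bits.
        def __init__(self, cir_bps, cbs_bits):
            self.cir = cir_bps
            self.cbs = cbs_bits
            self.tokens = cbs_bits            # bucket starts full
            self.last = time.monotonic()

        def conforms(self, packet_bits):
            now = time.monotonic()
            # Refill tokens at the committed rate, capped at the burst size.
            self.tokens = min(self.cbs,
                              self.tokens + (now - self.last) * self.cir)
            self.last = now
            if packet_bits <= self.tokens:
                self.tokens -= packet_bits
                return True                   # in-profile
            return False                      # out-of-profile

An out-of-profile verdict would then trigger either a packet drop or a DSCP mark-down, as described above.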

13.2.2 Access Control Entries and QoS ACLs

A QoS ACL contains a list of entries (also referred to as ACEs) that define a set of QoS processing rules that a network device can use to process incoming packets. An ACE may define traffic classification, priority queuing, discard, marking/remarking, and policing criteria for incoming packets. If the attributes of an incoming packet match the criteria set in the ACE, the receiving device will process the packet according to the “actions” specified by the ACE.

For example, the PFC2 supports up to 500 ACLs, and these ACLs combined can maintain up to 32,000 ACEs in total. The actual number of ACEs the PFC2 can support will depend on the specific QoS functions and services defined on the switch/router and the available memory for ACL storage in the PFC2.

These constraints also apply to other PFC models for the Catalyst 6000/6500 family. The process of creating a policing rule in the PFC or line card involves creating a policer (aggregate or microflow) and then mapping that policer to an ACE in a QoS ACL.

13.3 Distributed Forwarding Card (DFC)

The DFC allows fabric-enabled and fabric-only (i.e., crossbar switch connected) line cards (see Chapter 9) to perform packet forwarding and QoS processing locally without direct MSFC intervention. In order to maintain consistent operations in the switch/router, the DFC must also support any QoS policies plus ACLs that have been defined in the PFC for the entire switch/router.

In the Catalyst 6000/6500, the QoS policies and related functions cannot be directly configured in the DFC. Instead, the DFC can be programmed with the desired features via the management interfaces in the MSFC and PFC. In a switch/router configuration with a redundant Supervisor Engine, the DFC can be programmed through the MSFC/PFC on the active Supervisor Engine.

The primary PFC (in a redundant configuration) pushes a copy of its Forwarding Information Base (FIB) to the DFC to allow it to maintain a local copy of the Layer 2 and Layer 3 forwarding tables used for packet forwarding. The PFC will also push a copy of the QoS policies to the DFC so that it can perform QoS operations locally within the line card. With this, the DFC can make local forwarding decisions and also reference the local copy of any QoS policies it has received.

These capabilities allow the DFC to perform QoS processing locally in hardware at wire speed. The distributed forwarding and QoS processing functions provided by the DFC allow the switch/router to offload the centralized packet forwarding and QoS operations of the PFC to the line cards, thereby boosting overall system performance.

13.4 Port-Based ASICs

Each line card in the Catalyst 6000/6500 family supports a number of ASICs that implement the queues (along with their associated thresholds) used for temporarily storing packets as they transit the switch/router. The line cards also implement different numbers of queues that are used to prioritize packets at each port.

Each port in turn has a number of input (ingress) and output (egress) queues that are used to temporarily hold packets as they are being processed. The port queues are implemented in the line card's hardware ASICs, which contain the necessary buffer memory. This memory pool is further split up and allocated to the ports on the line card.

Each line card port typically has four (congestion management) thresholds configured at each input queue, while each output queue is typically assigned two thresholds. During QoS processing, these thresholds are used to determine which arriving packets (with predefined priority values) should be dropped when a particular threshold is crossed.

The network administrator can map different packet priority values to different queue thresholds, signaling to the switch/router which packets to drop when a particular threshold is exceeded. As the queue size builds up and a threshold is crossed, packets arriving with priority values mapped to this threshold will be dropped.
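A sketch of this mapping is given below; the threshold levels and the CoS-to-threshold assignments are illustrative values chosen for the example, not defaults of any particular line card.

    # Sketch: queue-fill thresholds and the CoS values that become
    # drop-eligible once each threshold is crossed (illustrative values).
    THRESHOLDS = [          # (fill fraction, CoS values dropped above it)
        (0.50, {0, 1}),
        (0.60, {2, 3}),
        (0.80, {4, 5}),
        (1.00, {6, 7}),     # highest CoS values dropped only when full
    ]

    def should_drop(queue_fill, cos):
        # queue_fill is the current occupancy as a fraction of queue size.
        for level, cos_set in THRESHOLDS:
            if queue_fill >= level and cos in cos_set:
                return True
        return False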

On the Catalyst 6000/6500 family, Weighted Random Early Detection (WRED) packet discarding and weighted round-robin (WRR) scheduling can be based on the priority tag carried in an arriving packet. The tagging enables the switch/router to provide enhanced congestion management with differentiated packet discard and outbound packet scheduling.

A port can employ WRR to schedule outbound traffic from its transmit (Tx) queues to the external network. WRED (and its variants) is a congestion management algorithm employed by the Catalyst 6000/6500 to minimize the impact of dropping packets (while recognizing their priority markings) during times of temporary congestion.

WRED is derived from the RED algorithm and takes into account the priority markings of arriving packets. RED monitors a queue as its occupancy starts to build up and once a predefined threshold has been crossed, packets are dropped randomly based on a computed queue size-dependent probability.

RED is designed to give no preferential treatment to specific flows: all packets arriving at a queue are subject to being dropped randomly. Dropped packets can carry high- or low-priority markings, and the packets dropped by RED can belong to a single TCP flow or to multiple TCP flows. If the drops affect multiple flows, the impact on the TCP sources of the affected flows can be significant, as each source reduces its window size in response.

Unlike RED, WRED takes into consideration the priority markings of the arriving packets (which could be based on the DSCP, IP Precedence, or IEEE 802.1p/Q priority setting mechanism). In WRED, the network administrator maps drop priority values to specific queue thresholds. Once a particular threshold is exceeded, packets with priority values that are mapped to this threshold are eligible to be dropped.

Packets with priority values mapped to the higher thresholds are accepted into the queue. This prioritized drop and admission process allows higher priority flows to have higher throughput (due to their larger window sizes) and lower latency when the source sends packets to the receiver.
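The per-priority behavior of WRED can be sketched as follows. The threshold values and the linear drop-probability ramp are assumptions made for illustration, not the exact hardware profile.

    import random

    # Sketch: per-CoS WRED profile as (min, max) thresholds, expressed
    # as fractions of the average queue occupancy (illustrative values).
    WRED_PROFILE = {0: (0.3, 0.6), 3: (0.5, 0.8), 5: (0.7, 1.0)}
    MAX_DROP_PROB = 0.5     # drop probability at the max threshold (assumed)

    def wred_drop(avg_fill, cos):
        lo, hi = WRED_PROFILE.get(cos, (0.3, 0.6))
        if avg_fill < lo:
            return False    # below the min threshold: never drop
        if avg_fill >= hi:
            return True     # above the max threshold: always drop
        # Between the thresholds, drop probability ramps up linearly.
        return random.random() < MAX_DROP_PROB * (avg_fill - lo) / (hi - lo)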

When WRED is not configured on a port, the port uses the tail-drop method of congestion management. Tail-drop simply drops incoming packets once the queue completely fills up.

13.4.1 Original 10/100 Mb/s Ethernet Line Cards (WS-X6348-RJ45)

The 10/100 Mb/s Ethernet line cards employ a combination of different ASICs to create the modules on which the 48 10/100 Mb/s Ethernet ports reside. Each 10/100 Mb/s Ethernet (root) ASIC provides a breakout facility for 12 of the 10/100 Mb/s Ethernet ports; a line card has four such breakout ASICs, each handling 12 ports.

The 10/100 Mb/s Ethernet breakout ASIC in turn supports a number of Receive (Rx) and Transmit (Tx) queues for each 10/100 Mb/s Ethernet port. Each breakout ASIC provides 128 kB of buffering per 10/100 Mb/s Ethernet port. Each port then supports one Rx (ingress side) queue and two Tx (egress side) queues (one designated as high priority and the other as low priority).

The 128 kB of buffers (per port) is divided between the single Rx queue and the two Tx queues. The single Rx (ingress side) queue is allocated 16 kB of the available memory, and the 112 kB remaining memory is divided between the two Tx queues.

The input queues are used to hold packets while the port arbitrates for access to the switch fabric. The output queues at a port are used to temporarily store packets during periods of high traffic load to that particular port. The output queues are considerably larger than the input queues because many ports can send packets to a single port at any given time.

13.4.2 Newer Fabric 10/100 Mb/s Ethernet Line Cards (WS-X6548-RJ45)

The newer 10/100 Mb/s Ethernet line card ASICs support a number of Rx and Tx queues for each 10/100 Mb/s Ethernet port. The line card ASICs support a memory pool that is shared by the 10/100 Mb/s Ethernet ports. Each 10/100 Mb/s Ethernet port on the line card in turn supports two Rx queues and three Tx queues. One Rx (ingress side) queue and one Tx (egress side) queue are designated as strict or absolute priority queues.

Packets in these two absolute priority queues are serviced in a strict priority fashion. If a packet arrives in a strict priority queue, the scheduler stops transmitting packets from the lower priority queues to service the packets in the strict priority queue. Only when the strict priority queue is empty will the scheduler recommence servicing packets from the lower priority queue(s). The strict priority queue is normally used as a low-latency queue to handle latency-sensitive traffic such as streaming video and voice traffic.

When a packet arrives at an input or output port during times of congestion, it is placed in one of a number of priority queues. The priority queue in which the packet is placed, typically, is based on the priority value (DSCP, IP Precedence, or IEEE 802.1p/Q) carried in the incoming packet.

At the egress port, a scheduling algorithm is used to service the Tx (output) queues. The scheduling can be done using the WRR algorithm. Each queue is assigned a weight that is used to determine the amount of data to be transmitted from the queue before the scheduler moves to the next queue. The weight assigned to a queue can be configured by the network administrator and is an integer number from 1 to 255.
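The interaction between the strict priority queue and the WRR-serviced queues can be sketched as follows; the queue count and weights are illustrative, and this sketch uses the weight as a simple per-round packet budget rather than a byte count.

    from collections import deque

    # Sketch: one strict priority queue plus two WRR queues.
    sp_queue = deque()
    wrr_queues = [deque(), deque()]
    weights = [200, 55]                # illustrative WRR weights (1-255)

    def schedule_round():
        # Returns the packets transmitted in one scheduling round.
        sent = []
        for q, w in zip(wrr_queues, weights):
            budget = w
            while q and budget > 0:
                # The strict priority queue is drained between every
                # transmission from the WRR-serviced queues.
                while sp_queue:
                    sent.append(sp_queue.popleft())
                sent.append(q.popleft())
                budget -= 1
        return sent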

13.4.3 Gigabit Ethernet Line Cards (WS-X6408A, WS-X6516, WS-X6816)

Each Gigabit Ethernet (GbE) line card has ASICs that support 512 kB of buffering per port. Similar to the 10/100 Mb/s Ethernet ports, each Gigabit Ethernet port has one Rx queue and two Tx queues. This queue setup is the default configuration on the WS-X6408-GBIC Gigabit Ethernet line card.

The newer 16-port Gigabit Ethernet line cards, the GBIC ports on the Supervisor Engine 1A and Supervisor Engine 2, and the WS-X6408A-GBIC 8-port Gigabit Ethernet line card support two extra strict priority queues (in addition to the three queues mentioned above). One strict priority queue is assigned as an Rx (ingress side) queue and the other as a Tx (egress side) queue.

These strict priority queues are used primarily for temporarily storing latency-sensitive traffic such as streaming video and voice. With the strict priority queue, packets placed in this queue will be serviced before packets in the high- and low-priority queues. Only when the strict priority queue is empty will the scheduler move to service the high- and low-priority queues.

13.4.4 10 Gigabit Ethernet Line Cards (WS-X6502-10GE)

Each 10 Gigabit Ethernet line card supports only one 10 Gigabit Ethernet port and occupies one slot in the Catalyst 6500 chassis. The line card also supports a number of QoS features. Each 10 Gigabit Ethernet port has two Rx queues and three Tx queues, with one Rx queue and one Tx queue designated as strict priority queues. The 10 Gigabit Ethernet port has a total of 256 kB of Rx buffering and 64 MB of Tx buffering.

13.5 QoS Mappings

The priority value in a packet can be used to determine in which priority queue the packet should be placed and at which queue threshold the packet becomes eligible to be dropped. This is one example of how the Catalyst 6000/6500 uses QoS mappings. When QoS is configured on the switch/router, the following default mappings are enabled:

  • The queue thresholds at which packets with specific priority values are eligible to be dropped.
  • The priority queue in which a packet is placed based on its priority value.

While the switch/router can use the default mappings, these mappings can be overridden with new settings. Additional mappings can be configured, such as the following:

  • Map IEEE 802.1p/Q priority value in an incoming packet to a DSCP value.
  • Map IP Precedence value in an incoming packet to a DSCP value.
  • Map DSCP value to an IEEE 802.1p/Q value for an outgoing packet.
  • Map IEEE 802.1p/Q priority values to drop thresholds on receive queues.
  • Map IEEE 802.1p/Q priority values to drop thresholds on transmit queues.
  • Mark down DSCP values for packets that exceed traffic policing limits.
  • Set IEEE 802.1p/Q priority value in a packet with a specific destination MAC address.

13.6 QoS Flow in the Catalyst 6000 and 6500 Family

To facilitate the application of QoS services to traffic passing through the switch/router, mechanisms to tag or prioritize IP packets or Ethernet frames are required. The IP Precedence and the IEEE 802.1p/Q Class of Service (CoS) bits are two example mechanisms that can be used to perform the tagging.

13.6.1 IP Precedence and IEEE 802.1p/Q CoS Tagging

The IP type of service (ToS) field (now obsolete) consists of 8 bits in the IP packet header. Of these, the three leftmost bits are used to indicate the priority of an IP packet and are referred to as the IP Precedence bits. These bits can be set in a packet to values from 0 to 7, with 0 representing the lowest priority and 7 the highest.

The CoS priority in an Ethernet frame can be based on 3 bits in either a Cisco Inter-Switch Link (ISL) header (used to carry VLAN information in Ethernet frames) or an IEEE 802.1Q tag. The latter is the standards-based method used to indicate the priority of an Ethernet frame. The three CoS priority bits in the IEEE 802.1Q tag that can be set in an Ethernet frame are commonly referred to as the IEEE 802.1p bits. The IEEE 802.1Q tag is 4 bytes (32 bits) long.

Of the four bytes, the leftmost 2 bytes are used for the Tag Protocol Identifier (TPID). The remaining 2 bytes are used for the Tag Control Information (TCI). The 16 bit TPID is set to the hexadecimal value 0x8100 to identify an Ethernet frame as tagged, that is, carrying an IEEE 802.1Q field (tag). The TPID field sits at the same position as the type/length field in untagged Ethernet frames, which allows IEEE 802.1Q tagged frames to be quickly distinguished from untagged frames.

The 2 byte TCI field is further divided into a 3 bit Priority Code Point (PCP), a 1 bit Drop Eligible Indicator (DEI), and a 12 bit VLAN identifier (VID). The 3 bit PCP field is referred to as the IEEE 802.1p CoS and is used to indicate the Ethernet frame's priority level. The DEI bit can be used to mark an Ethernet frame as eligible to be dropped when a network device must drop frames.
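The layout described above can be made concrete with a few lines of bit manipulation. The sketch below parses a TCI, assuming tci is the 16 bit value that follows the 0x8100 TPID in a tagged frame.

    def parse_tci(tci):
        # Split a 16-bit IEEE 802.1Q TCI into its three subfields.
        pcp = (tci >> 13) & 0x7     # 3-bit Priority Code Point (802.1p CoS)
        dei = (tci >> 12) & 0x1     # 1-bit Drop Eligible Indicator
        vid = tci & 0xFFF           # 12-bit VLAN identifier
        return pcp, dei, vid

    # Example: TCI 0xA00A carries PCP 5, DEI 0, VID 10.
    assert parse_tci(0xA00A) == (5, 0, 10)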

Fortunately, the three IEEE 802.1p/Q bits match the number of bits used for IP Precedence. In many networks, there is the need to maintain QoS end-to-end, thereby requiring packets from a user to traverse both Layer 2 and Layer 3 networking domains to a receiver. To maintain QoS end-to-end, the IEEE 802.1p/Q priority values in packets can be mapped to IP Precedence priority values, and IP Precedence priority values mapped to IEEE 802.1p/Q priority values.

Cisco IOS and network devices have supported the setting of IP Precedence for many years. The MSFC or the PFC (independent of the MSFC) in the Catalyst 6000/6500 supports the setting and resetting of the IP Precedence bits in packets. The network administrator can also configure a port with a trust setting of “untrusted,” which wipes out any IP Precedence settings on incoming packets.

More recently, the IP Precedence bits have been subsumed into the capabilities defined by the 6 bits designated as the DiffServ Code Point (DSCP) [RFC2474,RFC2475]. The 6 bit DSCP field yields 64 (2^6) priority values that can be assigned to an IP packet.

The Catalyst 6500 supports the DiffServ Per-Hop Behaviors (PHBs) as specified in the IETF standards. The PHBs supported include the Default PHB, the Class Selector PHBs, the Assured Forwarding PHBs, and the Expedited Forwarding PHB. The DSCPs for these PHBs are as follows:

  • Assured Forwarding PHBs: 12 DSCPs (four classes, each with three drop precedence levels).
  • Expedited Forwarding PHB DSCP: 101110.
  • Default PHB DSCP: 000000.

In addition to defining the DiffServ field in the IPv4 and IPv6 headers, [RFC2474] also defines the Class Selector code points. These code points have the form xxx000: the three leftmost bits of the DSCP (which correspond to the IP Precedence field in the old IP ToS field) can take any value, while the three rightmost bits are all set to 0. These 3 bits yield eight Class Selector PHBs.

The Catalyst 6000/6500 switch/routers can alter the IP Precedence bit settings in arriving packets. The remarking can be performed using either the PFC or MSFC. When a packet arrives at the switch/router, it is assigned an internal DSCP value. This internal DSCP value is used only within the switch/router to assign different levels of QoS (QoS policies) to arriving packets.

The internal DSCP value may already exist in an arriving packet and be used directly in the switch/router, or the internal DSCP value can be derived from the IEEE 802.1p/Q, IP Precedence, or DSCP value carried in the arriving packet (if the port of arrival is trusted).

The switch/router uses an internal map to derive the internal DSCP. Given that there are eight possible IEEE 802.1p/Q (denoted IEEE here) and eight IP Precedence (denoted IPPrec) values and 64 possible DSCP values, the default map used by the switch/router maps IEEE/IPPrec 0 to DSCP 0, IEEE/IPPrec 1 to DSCP 7, IEEE/IPPrec 2 to DSCP 15, and so on. These default mappings (used by the switch/router) can be overridden by other mappings configured by the network administrator.
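A sketch of this default map is shown below. Only the first three entries are given in the text above; the remaining entries extend the apparent pattern and are an assumption of this sketch.

    # Default CoS/IP Precedence to internal-DSCP map as described above:
    # 0 -> 0, 1 -> 7, 2 -> 15, and so on. Entries beyond 2 extrapolate
    # the apparent (8 * value - 1) pattern (an assumption of this sketch).
    DEFAULT_MAP = {v: (8 * v - 1 if v else 0) for v in range(8)}
    # {0: 0, 1: 7, 2: 15, 3: 23, 4: 31, 5: 39, 6: 47, 7: 55}

    def internal_dscp(cos_or_ipprec, admin_map=None):
        # An administrator-configured map overrides the default map.
        table = admin_map or DEFAULT_MAP
        return table[cos_or_ipprec]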

When an arriving packet is processed and transferred to an outbound port, the IEEE 802.1p/Q priority value in the outbound packet (to the external network) can be rewritten using a mapping table that translates the internal DSCP value into the new IEEE 802.1p/Q priority value. The packet exits the switch/router with this new priority value on its way to the next node or the final destination.

13.7 Configuring Port ASIC-Based QoS on the Catalyst 6000 and 6500 Family

The QoS features configured on the port ASIC by the network administrator can be general enough to affect both inbound and outbound traffic flows in the switch/router. The following QoS features can be configured:

  • Define DSCP to IEEE 802.1p/Q mapping.
  • Configure bandwidth on Tx queues.
  • Configure priority values to queue threshold mapping.
  • Configure Tx queue packet drop threshold.
  • Configure Rx queue packet drop threshold.
  • Define input (ingress side) classification and port queues priority settings.
  • Define the trust state/level of a port.

An Ethernet frame processed by either the MSFC or the PFC is forwarded to the egress port ASIC for further processing. Frames processed by the MSFC have their IEEE 802.1p CoS values reset to zero, so any remarking of priority values has to be done at the outbound port.

Some of the QoS processing performed by the outbound port ASIC (i.e., outbound QoS processing) includes the following:

  • Assignment of Tx queue tail-drop and WRED thresholds.
  • Mapping of IEEE 802.1p CoS values to Tx queue tail-drop and WRED thresholds.
  • Remarking of the IEEE 802.1p CoS value in the outbound Ethernet frame using a DSCP to IEEE 802.1p CoS map.

In addition to defining and setting IEEE 802.1p CoS values based on a global port definition, the network administrator can set specific IEEE 802.1p CoS values based on the destination MAC address and VLAN ID. This allows for Ethernet frames destined to specific destinations (e.g., data servers, call manager) to be tagged with a predefined IEEE 802.1p CoS value.

13.7.1 Trust States of Ports: Trusted and Untrusted Ports

The network manager can configure any given port on the Catalyst 6000/6500 switch/routers as “untrusted” or “trusted.” The trust state of a port defines how it classifies, marks (or remarks), drops, and schedules arriving packets as they transit the switch/router. The default setting for all ports in the switch/router is the untrusted state.

13.7.1.1 Untrusted Ports (Default Setting for Ports)

When a port is configured as untrusted, packets entering the port will have their IEEE 802.1p CoS and IP Precedence values reset by the port ASIC to zero. This resetting means the arriving packet will be given the lowest priority service as it transits the switch/router. The network administrator can also configure the switch/router to reset the priority value of any packet that enters an untrusted port to a predefined priority value.

A port set as untrusted will not perform any priority-based congestion management technique such as WRED on its queues. WRED drops packets arriving at a queue based on their priority values once certain predefined queue thresholds are exceeded. All packets entering an untrusted port will be equally eligible to be dropped once the queue is completely full (tail-drop method for congestion management).

13.7.1.2 Trusted Ports

A port can be configured to maintain the priority values in packets as they enter the port and transit the switch/router. To allow this, the network administrator sets the trust state of the port to trusted. (The untrusted state, by contrast, ignores the priority settings in arriving packets, and the switch/router gives such packets the lowest-priority service.)

The switch/router uses an internal DSCP value to assign a predetermined level of service to packets arriving through a trusted port. For packets entering a trusted port, the network administrator preconfigures the port to examine the existing IEEE 802.1p CoS, IP Precedence, or DSCP priority value in the packet and derive (using a mapping table) the internal DSCP value. Alternatively, the network administrator can set a predefined internal DSCP value to be assigned to every packet that enters the trusted port.

For example, the switch/router can use the IEEE 802.1p CoS value carried in the incoming packet to select the internal DSCP. The internal DSCP is then derived using either a default mapping table that was configured when QoS was enabled on the switch/router or a mapping table defined by the network administrator.

13.7.2 Input Classification and Setting Port-Based CoS

On an ingress port of the switch/router, a packet can have its priority value modified if it meets one of the following two criteria:

  • The port is configured as untrusted.
  • The arriving packet does not have an existing priority value already set.

13.7.3 Configure Receive (Rx) Drop Threshold

At an ingress port, an arriving packet is placed in an Rx queue. To provide congestion control and avoid queue overflows, the port ASIC implements four thresholds on each Rx queue. The port ASIC uses these thresholds to identify which packets can be dropped once these queue fill thresholds have been crossed. The port ASIC can use the priority values in the packets to identify which packets can be dropped when a threshold is exceeded.

As the queue occupancy starts to build up, the occupancy is monitored by the port ASIC. When a threshold is exceeded, packets with priority values predefined by the network administrator are dropped randomly from the queue. This allows higher priority packets to be preferentially accepted into the queue when congestion occurs.

The default drop thresholds can be modified by the network administrator to meet traffic management objectives, and the default priority values mapped to each threshold can likewise be modified. Different line cards of the Catalyst 6000/6500 support different Rx queue maximum sizes and thresholds.

13.7.4 Configure Transmit (Tx) Drop Threshold

An egress port may support two Tx queues, Queue 1 and Queue 2, each with two thresholds that are used as part of a congestion management mechanism. Queue 1 could be designated as a standard low-priority queue, while Queue 2 is designated as a standard high-priority queue. Depending on the line card, the congestion management mechanism can be either a tail-drop or a WRED algorithm.

The tail-drop algorithm could employ the two thresholds at a Tx queue in a weighted tail-drop fashion where high-priority packets are dropped only when the higher threshold is crossed (see Chapter 12). In weighted tail-drop, low-priority packets are dropped when the lower threshold is crossed, and both low-priority and high-priority packets are dropped when the higher threshold is crossed.
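A minimal sketch of weighted tail-drop with two thresholds follows; the threshold fractions are assumed values for illustration.

    LOW_THRESHOLD = 0.4    # assumed fill fraction for low-priority drops
    HIGH_THRESHOLD = 0.8   # assumed fill fraction where all traffic drops

    def weighted_tail_drop(queue_fill, high_priority):
        # Returns True if the arriving packet should be dropped.
        if queue_fill >= HIGH_THRESHOLD:
            return True                 # both priorities are dropped
        if queue_fill >= LOW_THRESHOLD and not high_priority:
            return True                 # only low-priority is dropped
        return False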

13.7.5 Mapping CoS to Thresholds

The network administrator can configure thresholds for port queues and then assign priority values to these thresholds. When a threshold is crossed, packets with specific priority values can be dropped. Typically, the network administrator will assign lower priority packets to the lower thresholds, thereby creating space to accept higher priority traffic into the queue when congestion occurs.

13.7.6 Configure Bandwidth on Tx Queues

When a packet is placed in one of the output port queues, it is transmitted out the port using an output scheduling algorithm. The output scheduler configured at the port could be WRR or any derivative of the deficit round-robin (DRR) algorithm (see Chapter 12). Depending on the line card type, a port may support two, three, or four transmit queues.

On the WS-X6248 and WS-X6348 line cards, two Tx queues are serviced by a WRR scheduler. The WS-X6548 line cards support four Tx queues per port. Of these four Tx queues, three are serviced by a WRR scheduler. The fourth Tx queue is a strict priority queue that is always preferentially serviced (over the other three queues) as long as it holds data. The Gigabit Ethernet line cards support three Tx queues, one of which is a strict priority queue, with the other two serviced in WRR fashion.

Typically, the network administrator assigns a weight to each Tx queue that determines how much traffic will be transmitted from that queue before the WRR scheduler moves to the next queue. A weight from 1 to 255 can be assigned to each of the queues serviced by the WRR scheduler.
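The share of the link bandwidth each WRR-serviced queue receives is simply its weight divided by the sum of all the weights, as the short sketch below illustrates with assumed weights.

    # Sketch: WRR weights translate into bandwidth shares.
    weights = {"queue1": 50, "queue2": 205}     # assumed weights
    total = sum(weights.values())
    shares = {q: w / total for q, w in weights.items()}
    # queue1 gets 50/255 (about 19.6%) of the bandwidth,
    # queue2 gets 205/255 (about 80.4%)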

13.7.7 DSCP to CoS Mapping

When a packet has been processed and forwarded to an egress port queue, the port ASIC uses the packet's assigned internal DSCP priority value to perform WRED-based congestion management and also to determine the packet's priority queue and its scheduling bandwidth. The switch/router also uses a default map to map back the packet's internal DSCP value to an external IEEE 802.1p CoS, IP Precedence, or DSCP value.

Alternatively, the network administrator can configure a map that can be used by the switch/router to map the assigned internal DSCP value to a new IEEE 802.1p CoS, IP Precedence, or DSCP value for the outbound packet.

13.8 IP Precedence and IEEE 802.1p CoS Processing Steps

Both Layer 2 and Layer 3 switches offer a number of QoS features that include packet classification, input priority queue scheduling, traffic policing and shaping, packet rewriting, and output priority queue scheduling. In general, the classification and QoS features offered by Layer 2 switches are limited to using Layer 2 header information in arriving packets.

The Catalyst 6000/6500 family, on the other hand, has its QoS functions processed by a Layer 2 engine that has access to the Layer 3 and Layer 4 packet parameters as well as to the Layer 2 header information.

As already discussed, in addition to other Layer 2, 3, and 4 packet information, the Catalyst 6000/6500 can access three basic types of packet fields when making QoS decisions [CISCQoSOS07]:

  • The IP Precedence bits, which are the three leftmost bits of the (now obsolete) ToS field in the IP header.
  • The DSCP field, which consists of the six leftmost bits of the newer DiffServ field in the IP header.
  • The IEEE 802.1p CoS bits, which are 3 bits carried either in the Cisco ISL header (used to carry VLAN information in Ethernet frames) or in the IEEE 802.1Q tag. There is no IEEE 802.1Q tag in an untagged Ethernet frame.

When QoS is disabled in the switch/router, it does not perform any packet classification, marking, or remarking; instead, every packet entering the switch/router with a DSCP or IP Precedence value leaves with the same priority value (unaltered).

The following sections describe the various QoS operations that are applied to a packet as it transits the switch/router. Figure 13.1 summarizes how the Catalyst 6000/6500 family implements these QoS operations. This figure and the corresponding discussions below describe the QoS operations at the following major modules:

  • Input (ingress) port ASIC on a line card.
  • Forwarding engine (PFC).
  • Output (egress) port ASIC on a line card.

    Figure 13.1 QoS processing in the Catalyst 6000/6500 switches.

13.8.1 Input Port Processing

The main parameter that affects the QoS configuration (particularly packet classification) at the ingress port is the trust state of the port. Each port of the Catalyst 6000/6500 switch/routers can be configured to have one of the following trust states:

  • Untrusted
  • Trust-cos
  • Trust-ip-precedence
  • Trust-dscp

“Trust-cos” refers to setting a port to trust IEEE 802.1p/Q markings in inbound packets. If a port is configured to the untrusted state, an arriving packet is simply marked with the port's default (i.e., predetermined) priority setting and the packet header is sent to the forwarding engine (PFC) for processing.

If the port is configured to have, for example, the trust-cos state, the default port IEEE 802.1p/Q setting is applied if the arriving packet does not already carry an IEEE 802.1p/Q or ISL tag. Otherwise, the incoming IEEE 802.1p/Q or ISL tag is kept as is and the packet is passed to the forwarding engine.

Using IEEE 802.1p/Q as an example, each arriving packet will have an internal DSCP value assigned (derived from either the received IEEE 802.1p CoS or the default port IEEE 802.1p CoS), including untagged Ethernet frames that do not carry any IEEE 802.1Q CoS tag [CISCQoSCM07]. For instance, in a switch/router with a 32 Gb/s shared switching bus (see Chapters 7 and 9), this internal DSCP value and the received IEEE 802.1p/Q value are written into a special packet header (called a Data Bus header) and forwarded over the Data Bus to the PFC (forwarding engine).

The creation and forwarding of the Data Bus header takes place at the ingress line card. Also, at this stage of the packet forwarding, it is not known yet whether the assigned internal DSCP will be carried to the egress port ASIC and written into the outgoing packet. The actual priority value written into the outgoing packet depends on what actions the PFC performs on that packet. The input classification process is described in detail in [CISCQoSCM07].

A packet that enters the switch/router is initially processed by the receiving port ASIC. The port ASIC places the packet in the appropriate Rx queue. Depending on the switch/router line card type, one or two Rx queues may be supported. The port ASIC uses the priority value in the packet to determine which queue to place the packet into (if the port supports multiple input queues). If the port is set as untrusted, the receiving port ASIC can overwrite the existing priority value in the packet with a predefined priority value (Figure 13.2).


Figure 13.2 QoS processing in the Catalyst 6000/6500 switches with centralized forwarding.

13.8.2 Forwarding Engine (PFC) Processing

Once the special packet header has been received by the forwarding engine (PFC), the packet is assigned an internal DSCP. The PFC uses this internal DSCP to associate an internal resource priority with the packet as it transits the switch/router.

This internal tag is not the same as the normal DSCP value carried in the IPv4 or IPv6 header. It is derived from the existing IEEE 802.1p, IP Precedence, or DSCP setting in the arriving packet and is used to reserve internal switch/router resources and to reset the priority value in the packet as it exits the switch/router. This internal DSCP is assigned to all packets that are Layer 2 or Layer 3 forwarded by the PFC, including control packets such as ARP, ICMP, and IGMP messages.

The packet is then passed to the Layer 2/Layer 3 forwarding engine in the PFC, which applies any classification and, optionally, traffic policing (rate limiting) policies to the packet. The process of assigning the packet a DSCP value as described above is part of the classification process. This internal DSCP will be used internally by the switch/router for processing the packet until it reaches the egress port. The internal DSCP for a packet will be derived using one of the following:

  • The internal DSCP is derived from the IEEE 802.1p CoS value already set in the packet prior to entering the switch/router. Since there are a maximum of eight possible IEEE 802.1p CoS values, each of which must be mapped to one of 64 DSCP values, a mapping table must be used. This mapping table can be created by the network administrator, or the switch/router can use the default map already in place.
  • The internal DSCP is derived from the IP Precedence value already set in the IPv4 or IPv6 header prior to the packet entering the switch/router. As there are only eight IP Precedence values and 64 DSCP values, the network administrator can configure a mapping table that the switch/router uses to derive the internal DSCP. A default mapping table is used should the network administrator not configure one.
  • The internal DSCP is derived from the existing DSCP value set prior to the packet entering the switch/router.
  • The internal DSCP is derived for the packet using a default DSCP value, typically assigned through an ACL entry.

The rules that determine which of the above four mapping mechanisms is used for each packet are described in [CISCQoSCM07]. How the internal DSCP is selected depends on the following factors:

  • The trust state of the port.
  • An ACL applied to the port.
  • A default ACL applied to the port.
  • Whether an ACL applied is VLAN-based or port-based.

After the PFC assigns an internal DSCP value to the packet, a policing (rate-limiting) policy can be applied should such a policy be configured. Policing may involve the PFC dropping packets that are out-of-profile or marking down their internal DSCP values. Out-of-profile refers to traffic that has exceeded a rate limit defined by the network administrator (Figure 13.3).


Figure 13.3 QoS processing in the Catalyst 6000/6500 switches with distributed forwarding.

13.8.3 Output Port Processing

After processing the packet, the PFC will then forward the packet to the egress port for final processing. At the egress port, the port ASIC initiates the rewrite process to modify the priority value in the packet. The new value to be rewritten is derived from the internal DSCP. The rewrite can be done according to the following rules:

  • If the egress port is configured to perform ISL or IEEE 802.1Q VLAN tagging, the port ASIC will use an IEEE 802.1p CoS value derived from the internal DSCP and write it into the ISL or IEEE 802.1Q tagged packet.
  • If the packet carried an IP Precedence value prior to entering the switch/router, the egress port ASIC will derive the outgoing IP Precedence bits from the internal DSCP value (a sketch of both rewrite rules follows this list).
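A sketch of these two rewrite rules is given below. The use of the three most significant bits of the internal DSCP for both rewrites is an assumption consistent with the field widths described earlier, not a statement of the exact hardware behavior.

    def egress_rewrite(internal_dscp, tagged, had_ip_precedence):
        # Derive the outbound priority fields from the internal DSCP.
        rewrites = {}
        if tagged:                      # ISL or IEEE 802.1Q tagging in use
            rewrites["cos"] = internal_dscp >> 3          # 6 bits -> 3 bits
        if had_ip_precedence:
            rewrites["ip_precedence"] = internal_dscp >> 3
        return rewrites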

This internal DSCP is also used for traffic scheduling at the output port. Once the new priority value has been derived from the internal DSCP and written into the packet, the packet is placed in one of the output queues for output scheduling based on its priority value (even if the packet is not IEEE 802.1Q or ISL tagged).

The packet is then held temporarily in a transmit queue, based on its priority value, ready for transmission. The output port priority queuing can be configured to consist of, for example, one strict priority queue and two standard queues with two thresholds per queue [CISCQoSOS07]. While the packet is in the queue, the port ASIC monitors the queue occupancy and applies any WRED actions to avoid buffer overflow.

The output scheduler then selects the queue from which the next packet should be transmitted. A WRR scheduling algorithm could be used to schedule and transmit packets from the egress port queues. The output port also has an arbiter (denoted ARB in Figure 13.1) that checks between each packet transmission from the WRR-serviced queues to determine if there are any data in the strict priority queue that have to be preferentially serviced.
