4
Special Traffic Types and Networks

So far in this book, we have made a distinction between real-time and nonreal-time traffic as the major differentiating factor regarding the tolerance toward delay, jitter, and packet loss. We have also kept the traffic flows simple, just a series of Ethernet, IP, or MPLS packets crossing one or more devices.

This is clearly an oversimplification, so in this chapter we take a step further by examining some of the more special scenarios. The environment in which a QOS deployment is performed, in terms of the type of traffic and/or the type of network present, always needs to be taken into consideration. However, some environments are more special than others due to their uniqueness or simply because they are novel; as such, the authors of this book have selected the following scenarios as, in their opinion, the most “special” ones, and each is detailed throughout this chapter:

  • Layer 4 transport protocols—The User Datagram Protocol (UDP) and Transmission Control Protocol (TCP). Understanding how traffic is transported is crucial: if the transport layer can retransmit traffic, is packet loss really an issue? How does the transport layer react when facing packet loss? And how can it behave optimally?
  • Data Center—An area where the developments of recent years mandate special attention. We will analyze storage traffic, the creation of lossless Ethernet networks, and the challenges posed by virtualization. There is also the current industry buzzword, Software-Defined Networks (SDN): is it a game changer in terms of QOS?
  • Real-time traffic—We will analyze two different applications: a voice call and IPTV. The aim is, first, to identify the underlying differences between applications and, second, to decompose real-time traffic into its two components: signaling and the actual data.

It should be noted that the above “special scenarios” can be combined; however, the aim of this book is to analyze each one independently.

4.1 Layer 4 Transport Protocols: UDP and TCP

Let’s start with UDP because it is the simplest one. It is a connectionless protocol; packets received by the destination are not acknowledged back to the sender, which is another way of saying that UDP is blind to congestion in the traffic path and to whether packets are consequently being dropped. The only assurance it delivers is packet integrity, by using a checksum function.

So it is unreliable and it has no congestion control mechanism. However, it is extremely simple and highly popular.

The term unreliable should be seen in the context of UDP itself; nothing stops the application layer at both ends of the traffic flow from talking to each other (at that layer) and signaling that packets were dropped. For certain traffic types, such as real-time traffic, the time frame in which a packet is received is crucial, so retransmission can be pointless.

UDP has several other characteristics, but the key thing to retain is that UDP is clueless regarding congestion and, as such, it has no ability to adapt.
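
This behavior is easy to see at the socket level. Below is a minimal sketch, in Python, of UDP's fire-and-forget nature; the address, port, and payload are illustrative assumptions, not taken from the text.

```python
import socket

# UDP sender sketch: sendto() returns as soon as the datagram is handed to
# the stack. There is no ACK, no retransmission, and no congestion feedback,
# so the loop keeps the same pace whether or not packets are being dropped.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
for seq in range(10):
    payload = seq.to_bytes(4, "big") + b"x" * 1000  # app-level sequence number
    sender.sendto(payload, ("192.0.2.10", 5000))    # documentation-range address
sender.close()
```

Any loss detection (such as the application-level sequence number above) has to be built by the application itself, which is exactly the point made earlier.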

The interesting Layer 4 transport protocol in terms of QOS is TCP, because it has two characteristics that make it special and popular: flow control and congestion avoidance mechanisms. As congestion happens, TCP changes its behavior.

It is not our goal to provide a detailed explanation of TCP; there are numerous papers and books that describe it in detail, and several of them are referenced in the further reading section of this chapter. Our goal is to present an oversimplified overview of TCP to arm the reader with the knowledge to understand its key components of slow start and congestion avoidance, which are commonly mentioned as the mechanisms for handling TCP congestion.

TCP is a connection-oriented protocol; it begins with a negotiation stage that leads to the creation of a session between sender and receiver. After the session is established, when the receiver gets a packet from the sender, it issues an acknowledgement (ACK) back to the sender, which means the sender knows whether the transmitted packet made it to the receiver. However, to avoid the sender waiting indefinitely for the ACK, there is a timeout value, after which the sender retransmits the packet. Also, TCP packets have a sequence number; for example, if the receiver gets packets number one and three but not number two, it signals this event to the sender by sending duplicate ACK packets.

Now comes the adaptation part: TCP is robust, and it changes its behavior throughout the session according to the feedback that sender and receiver have regarding the traffic flow between them. For example, if packets are being lost, the sender resends those packets and also sends fewer packets next time. The “next time” is incredibly vague on purpose, and we will make its meaning exact further ahead, but as a teaser let us consider a theoretical and purely academic scenario illustrated in Figure 4.1.

Image described by caption/surrounding text.

Figure 4.1 Proactive packet drop

Assume that for each packet that is lost the sender lowers the transmit rate by 50% and that the network resource usage is at 90%, as illustrated in Figure 4.1. Although there are resources to serve packet number one, should it be dropped? If packet one is dropped, then “next time” the sender sends packet two and not packets two and three (a 50% transmit rate reduction). However, if packet one is not dropped, then “next time” both packets two and three are transmitted and then dropped due to a full queue, which forces the sender to stop transmitting packets altogether (two 50% reductions). Of course, the reader may reverse the question: if there was only packet number one in Figure 4.1, why even consider dropping it? The answers are coming, but first we need to dive a bit further into TCP.
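
For the sake of concreteness, here is the Figure 4.1 thought experiment reduced to arithmetic, in a tiny Python sketch; the halving-per-loss rule is the assumption stated above, not a real TCP implementation.

```python
# Toy model of Figure 4.1: the sender currently sends 2 packets per round,
# and each lost packet halves the transmit rate (integer packets per round).
rate = 2

# Case 1: proactively drop packet one now (a single loss).
rate_after_early_drop = rate // 2            # -> 1 packet next round

# Case 2: let the queue fill, so packets two and three are both lost later.
rate_after_tail_drop = (rate // 2) // 2      # -> 0, the sender stalls

print(rate_after_early_drop, rate_after_tail_drop)  # prints: 1 0
```

The single early drop keeps the session moving, while the double tail drop stops it, which is precisely the trade-off the rest of this section builds on.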

Back to our TCP session. What is transmitted between the sender and receiver is a stream of bytes, which is divided into segments, and each segment is sent individually with a unique sequence number, as illustrated in Figure 4.2 for the transfer of a 3-megabyte file using a segment size of 1460 bytes.

Schematic of a TCP session, in which streams of bytes are transmitted from a sender to a receiver. The streams of bytes are segmented and numbered.

Figure 4.2 TCP segmentation

This segmentation and numbering represent one of TCP’s important features: segments can arrive at the destination out of order, or sometimes a little late, but this poses no problem because the destination can reorder them by keeping track of the byte positions associated with each segment’s number.
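
As a quick illustration of the numbers involved, the following Python sketch maps a byte stream onto numbered segments, using the 3-megabyte file and 1460-byte segment size from Figure 4.2 (taking 3 megabytes to mean 3 × 1024 × 1024 bytes, which is an assumption).

```python
# Split a byte stream into (start, end) byte positions, one per segment.
FILE_SIZE = 3 * 1024 * 1024   # 3,145,728 bytes
MSS = 1460                    # segment size used in Figure 4.2

segments = [(start, min(start + MSS, FILE_SIZE))
            for start in range(0, FILE_SIZE, MSS)]

print(len(segments))    # 2155 segments in total
print(segments[0])      # (0, 1460): first segment carries bytes 0..1459
print(segments[-1])     # (3144840, 3145728): a final, partially filled segment
```

Because every segment is identified by its starting byte position, the receiver can slot late or out-of-order segments straight into place.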

Now comes another important feature: everything the destination receives is ACKed back toward the sender, so the source knows which segments the destination has, as illustrated in Figure 4.3.

Image described by caption/surrounding text.

Figure 4.3 TCP acknowledge of segments

Each segment, and thereby each received byte position, is ACKed to the sender by the receiver. The ACK can cover one segment or several within one window of the transmission, as illustrated in Figure 4.3 for the second ACK packet. In the event that a segment is lost, the receiver starts sending duplicate ACK packets for that segment, and the sender, upon reception of the duplicate ACKs, resends the missing segment, as illustrated in Figure 4.4.

Image described by caption/surrounding text.

Figure 4.4 TCP duplicate ACK

It is this reliable transmission mechanism that makes TCP so popular. The applications on top of TCP can safely trust its mechanisms to deliver the traffic, so an application such as HTTP can focus on its specific tasks and leave the delivery of traffic to TCP, relying on it to fix any transport issues and to pack and unpack files in the proper order and byte position. All of this responsibility is left entirely to the end hosts (the sender and receiver), thereby relieving the intervening network transport elements from the need to perform transmission control. Just imagine if every router in the Internet were required to maintain state for each session and to acknowledge every packet on all links. For sure, the Internet would not be what it is today if not for IP and TCP.

4.1.1 The TCP Session

Now let’s focus on a real-life TCP session, a Secure Shell Copy (SCP) transfer (a random choice; SCP is just one of many applications that use TCP).

As illustrated in Figure 4.5, we have a sender (a server) with IP address 192.168.0.101 and a receiver (a host) with IP address 192.168.0.27, and just like any TCP session, it all starts with the classic three-way handshake with the sequence of packets SYN (synchronize), SYN-ACK, and ACK.

Schematic of a TCP three-way handshake process, with a sender (IP address 192.168.0.101) and a receiver (IP address 192.168.0.27).

Figure 4.5 The TCP three-way handshake process

There are several parameters associated with a TCP session, but let’s pick a few relevant ones for our current example.

The Maximum Segment Size (MSS) represents the largest chunk of data that TCP sends to the other end in a single segment. As illustrated in Figure 4.5, both end points of the session use 1460. As a side note, this value is typically inferred from the Maximum Transmission Unit (MTU) associated with each connection, so in the above example the connection is Ethernet (without VLAN tag) and the MTU is set to 1500 bytes; the MSS is equal to the MTU minus the TCP header (20 bytes) and the IP header (20 bytes)—that gives us 1460 bytes.
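
The arithmetic is simple enough to capture in two lines; here is a small Python sketch of the MSS derivation described above (it assumes plain 20-byte IP and TCP headers, with no options).

```python
# MSS = MTU minus the IP header (20 bytes) and the TCP header (20 bytes).
def mss_from_mtu(mtu, ip_header=20, tcp_header=20):
    return mtu - ip_header - tcp_header

print(mss_from_mtu(1500))   # 1460: untagged Ethernet, as in Figure 4.5
```

TCP options (timestamps, for example) consume part of the payload space as well, which is why captures often show slightly smaller segments, such as the 1448-byte segments seen later in this section.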

The window size (WIN) is the raw number of bytes that can be received or transmitted before sending or receiving an ACK. The receiver uses the receive window size to control the flow and rate from the sender, and a related important concept is the sliding window, which is the ability to change the window size over time to fully optimize the throughput.

The congestion window (CWND) defines what the sender can transmit while ACKs are still outstanding. For now, suffice it to say that every time an ACK is received, it grows, and every time there is a timeout or the receiver flags that packets are missing, it shrinks.

Now that the TCP session is established, it is time to transfer real traffic for the first time. This is the stage named TCP slow start. Depending on the TCP stack implementation, the source will send one or two MSS full-sized packets; in this example let’s assume two, as illustrated in Figure 4.6.

Schematic of a Double Push, with a sender (IP address 192.168.0.101) and a receiver (IP address 192.168.0.27).

Figure 4.6 Double push

Because the packets arrive in sequence, the receiver needs to ACK only the byte position at which the last segment ends, which is 5798 (4350 + 1448), and this cumulative acknowledgement is one of the ways TCP optimizes the session speed. So all went smoothly, and the time has come to explain the “next time” concept mentioned briefly at the beginning of this section.

There is a successful transmission of data, confirmed by the reception of the ACK packet, so the congestion window increases and next time we transmit more. This behavior is what has earned TCP the nickname of greedy; last time we transmitted two MSS full-sized packets, but now we will transmit nine MSS full-sized packets, as illustrated in Figure 4.7.

Schematic of a Big TCP Push, with a sender (IP address 192.168.0.101) and a receiver (IP address 192.168.0.27).

Figure 4.7 Big TCP push

Where does it end? In a perfect world and a perfect network, the session’s speed is ramped up until it is bounded by the receiver’s window size and the sender’s congestion window; however, the interesting question is, What happens if packets are dropped?
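
The ramp-up is easy to visualize with a short Python sketch; real stacks grow the congestion window per received ACK rather than per round trip, so the doubling-per-round model and the 64 KB receiver window below are simplifying assumptions.

```python
MSS = 1460            # full-sized segment
rwnd = 64 * 1024      # receiver's advertised window (assumed)
cwnd = 2 * MSS        # slow start begins with two full-sized segments here

for rtt in range(8):
    in_flight = min(cwnd, rwnd)
    print(f"round {rtt}: {in_flight // MSS} full-sized segments in flight")
    cwnd = min(cwnd * 2, rwnd)   # double per round until capped by rwnd
```

The output climbs 2, 4, 8, 16, 32 and then flattens out at the receiver's window, which is the "perfect world" ceiling mentioned above.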

4.1.2 TCP Congestion Mechanism

The current TCP implementations in most operating systems are very much influenced by two legacy TCP implementations called Tahoe and Reno; both of them originate with UNIX BSD versions 4.3 and 4.4, and no doubt some eventful hiking trip. Their precise behavior (and that of other, more recent TCP stack implementations) in terms of mathematical formulas is properly explained and detailed in the references at the end of this chapter [1], so here is just an oversimplified explanation.

Packet loss can be detected in two different manners: either the timeout expires before the ACK packet is received, which always implies going back into the slow-start phase, or the sender gets duplicate ACK packets from the receiver.

For Tahoe, when three duplicate ACKs are received (four, if the first ACK is counted), it performs a fast retransmission of the packet, sets the congestion window to the MSS value, and enters the slow-start state just as if a timeout had happened. Calling it fast is actually misleading; it is more of a delayed retransmission: when the node receives a duplicate ACK, it does not immediately respond and retransmit; instead, it waits for three duplicate ACKs before retransmitting the packet. The reason for this is to possibly save bandwidth and throughput in case the packet was merely reordered and not really dropped.

Reno is the same, but after performing the fast retransmission, it halves the congestion window (where Tahoe sets it to the MSS value) and enters a phase called fast recovery. Once an ACK is received from the client in this phase, Reno presumes the link is good again and negotiates a return to full speed. However, if there is a timeout, it returns to the slow-start phase.
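
The two reactions can be summarized in a few lines of Python; this is a sketch of the rules just described (with the congestion window counted in MSS units), not of any particular operating system's stack.

```python
def react_to_loss(cwnd_segments, event, flavor):
    """Return the new congestion window, in MSS units, after a loss signal."""
    if event == "timeout":
        return 1                          # both flavors: back to slow start
    # event == "triple_dup_ack": fast retransmit, then a flavor-specific cut.
    if flavor == "tahoe":
        return 1                          # Tahoe: cwnd = one MSS, slow start
    return max(2, cwnd_segments // 2)     # Reno: halve cwnd, fast recovery

print(react_to_loss(20, "triple_dup_ack", "tahoe"))  # 1
print(react_to_loss(20, "triple_dup_ack", "reno"))   # 10
print(react_to_loss(20, "timeout", "reno"))          # 1
```

The difference between the two return values on a triple duplicate ACK (1 versus 10 segments) is exactly why fast recovery matters on long, fat links.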

The fast recovery phase is a huge improvement over Tahoe, and it comes as no surprise that most network operating systems today are based on the Reno implementation, because it tries to keep the rate and throughput as high as it can and to avoid falling back to the TCP slow-start state for as long as possible, as illustrated in Figure 4.8.

Schematic of TCP throughput and congestion avoidance, with the Y axis marked cwnd and the X axis representing the TCP slow-start state.

Figure 4.8 TCP throughput and congestion avoidance

In recent years there have been several new implementations of the TCP stack, such as Vegas, New Reno (RFC 6582), and Cubic, with some differences in behavior. For example, the Cubic implementation, which is the default in Linux at the time of the writing of this book, uses a cubic function, instead of a linear one, to adjust the congestion window. The major novelty of the Vegas implementation is the usage of the round-trip time (RTT) parameter, the time difference between sending a packet and receiving the corresponding ACK, which allows a more proactive behavior as congestion starts to happen.
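
For reference, the growth function that gives Cubic its name is, per the CUBIC literature:

    W(t) = C × (t − K)³ + Wmax,    K = ∛(Wmax(1 − β)/C)

where Wmax is the congestion window size at the last loss event, β is the multiplicative decrease factor, C is a scaling constant, and t is the time elapsed since that loss. The window thus regrows quickly toward Wmax, plateaus around it, and then probes beyond it, instead of climbing linearly as in Figure 4.8.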

4.1.3 TCP Congestion Scenario

In the old slow networks based on modems and serial lines, retransmissions occurred right after the first duplicate ACK was received. It is not the same in high-performance networks with huge bandwidth pipes, where the actual retransmission occurs some time later, simply because many packets are in flight. The issue gets more complex if a drop occurs while both the sender’s congestion window and the receiver’s window are huge. The reordering of TCP segments becomes a reality, and the receiver needs to place the segments back into the right sequence without the session being dropped, or else too many packets will need to be retransmitted.

The delay between the event that caused the loss and the sender becoming aware of the loss and generating a retransmission does not just depend on the time that the ACK packet takes from the receiver back to the sender. It also depends on the fact that many packets are “in flight,” because the congestion windows on the sender and receiver can be large, so the retransmission sometimes occurs only after a burst of packets from the sender. The result in this example is a classic reorder scenario, as illustrated in Figure 4.9.

Image described by caption/surrounding text.

Figure 4.9 Retransmission delay

In Figure 4.9, each duplicate ACK contains the SLE/SRE parameters, which are part of the Selective Acknowledgement (SACK) feature. At a high level, SACK allows the receiver to tell the sender (at the TCP protocol level) exactly which parts of the byte stream did not make it and ask for exactly those segments to be resent, rather than having the sender resend everything from the missing data onward, which could be many segments if the window is large.

In summary, TCP can take a fair beating and still be able to either maintain a high speed or ramp up the speed quickly again despite possible retransmissions.

4.1.4 TCP and QOS

The TCP throughput and congestion avoidance tools are tightly bound to each other when it comes to the performance of TCP sessions, and with current operating systems, a single drop or reordering of a segment does not cause too much harm to the sessions. The rate and size of both the receiver window and the sender congestion window are maintained very effectively and can be rapidly adjusted in response to duplicate ACKs. However, there is a thin line between maintaining a good pace in the TCP sessions and avoiding too many retransmissions, which are a waste of network resources, since there is no benefit in transmitting the same packet several times unless absolutely necessary.

There are several QOS tools discussed later in this book that are designed to handle the pace of TCP sessions, such as the queue dropper behavior, token bucket policing, or leaky bucket shaping.

Returning to the example of Figure 4.1, as the queue fill level approaches 100%, the dropper can start to drop “some” packets, causing TCP to adapt, instead of waiting for the queue to become full and then dropping all packets, which forces a return to the slow-start phase.
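
The following Python sketch shows one way such a dropper can behave, in the spirit of Random Early Detection (RED); the thresholds and maximum drop probability are illustrative assumptions, and the real tools are covered later in this book.

```python
import random

def should_drop(fill, min_th=0.6, max_th=1.0, max_p=0.2):
    """Early-drop decision for one packet, given the queue fill level (0..1)."""
    if fill < min_th:
        return False                      # queue healthy: never drop
    if fill >= max_th:
        return True                       # queue full: plain tail drop
    # Between the thresholds, drop probability ramps linearly up to max_p,
    # trimming a few TCP flows early instead of stalling all of them later.
    drop_p = max_p * (fill - min_th) / (max_th - min_th)
    return random.random() < drop_p

print(should_drop(0.5))   # False
print(should_drop(0.9))   # True about 15% of the time
```

Dropping a small, random fraction of packets before the queue is full nudges a few senders to slow down, which is exactly the adaptation described above.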

These tools are TCP friendly because they can be used to control whether a certain burst is allowed, permitting the packet rate to peak above a certain bandwidth threshold in order to maintain predictable rates and stability, or to control bursts so as to stop issues associated with misconfigured or misbehaving sources as soon as possible, without propagating that harmful traffic into the network. They can also implement large buffering to allow as much burst as possible, thus avoiding retransmissions and maintaining the rate. Or, on the other hand, they can implement small buffering to avoid delay and jitter and to stop the transmit and receive window sizes from becoming very large, which can result in massive amounts of retransmissions and reordering later on.

4.2 Data Center

Since the publication of the first edition of this book in 2010, the Data Center (DC) network evolution has been enormous.

It is now possible to create Ethernet networks with a lossless behavior, so that when facing congestion they have no packet loss, which, for instance, permits the transport of Fibre Channel (FC) traffic as Fibre Channel over Ethernet (FCoE), among several other possible applications.

Another major change in the recent years is the massive deployment of virtualization; physical servers are not just physical servers anymore, but now they host multiple virtual machines (VMs) with a hypervisor as the front end.

And lastly, we now have Software-Defined Networks (SDN), allowing for faster and simpler deployment of connectivity between servers in the DC.

It is not the goal of this book to discuss DC network design topics and their associated performance comparisons, for example, comparing FCoE versus distributed file systems over IP, but solely to focus on the key points mentioned previously and their impact in terms of QOS. But first let us describe storage traffic, because it is indeed a special traffic type.

4.2.1 SAN Traffic

Of all the traffic types that exist in a DC, SAN traffic is special due to its requirements. If, in the process of writing information to a remote disk, packets are lost or placed out of order, then the disk is corrupted, so SAN traffic has zero tolerance for packet loss and packet reordering. There is also a requirement for low latency.

Networks can be either lossy or lossless, and the same principle applies to protocols; TCP, described earlier in this chapter, assures a lossless environment by retransmitting packets that were lost, and by keeping track of the sequence numbers, it also assures that packet order is maintained. This is the basis of the Internet Small Computer System Interface (iSCSI) protocol: traffic is transmitted over a lossy network, with TCP assuring the required lossless behavior.

As a side note, TCP is not an option for FCoE, since FCoE is not IP based; however, let’s keep comparing the different approaches to achieving lossless behavior.

There are two scenarios regarding the OSI layer at which the lossless component is assured: at the transport layer with TCP or at a lower layer by the network itself. At first glance they might seem similar, but in reality they are different. The first scenario has already been detailed in this chapter with the TCP adaptation and congestion avoidance mechanisms. In the second scenario, the flow control and throughput reduction happen directly in the network devices. This can create concerns regarding how to apply flow control only to the flow causing congestion without affecting the other ones. It should be noted that with TCP packets are indeed lost and then retransmitted, which is structurally different from not losing packets in the first place.

There is also a third possible scenario, which is the combination of the preceding two. For example, iSCSI traffic can be transported over a lossless network; however, in that scenario the TCP adaptation and congestion mechanisms will never be used, because the network will perform flow control before TCP can act. This can potentially create issues, since TCP relies on its adaptation component to optimize the throughput.

But for now the most interesting scenario is how to transform an Ethernet network into a lossless one.

4.2.2 Lossless Ethernet Networks

In FC networks, the lossless component is assured via end-to-end signaling combined with a credit system, where the sender sends traffic to the receiver only if the receiver has buffers available, thus avoiding the situation of traffic arriving at the receiver and being dropped due to lack of resources. This is flow control. There are a couple of drawbacks: FC devices are traditionally expensive, and running two separate networks always implies higher cost and complexity from an operational and management point of view.

An Ethernet network doesn’t have a credit capability like FC has; however, it has been given the PAUSE capability, which in a nutshell is the ability of a device to signal to its neighbor that it cannot transmit traffic onward and that its queue buffer is full, so it cannot store anything either. So, to avoid dropping newly arrived packets, the neighbor needs to PAUSE its sending of traffic for a certain amount of time, as illustrated in Figure 4.10.

Image described by caption/surrounding text.

Figure 4.10 PAUSE frames
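
As a side note on the timing, an IEEE 802.3x PAUSE frame carries a 16-bit pause value expressed in quanta of 512 bit times, so the actual silence interval depends on the link speed. A small Python sketch of that conversion (the quanta value below is simply the field's maximum):

```python
# Convert a PAUSE frame's quanta field into seconds for a given link speed.
def pause_seconds(pause_quanta, link_bps):
    return pause_quanta * 512 / link_bps   # one quantum = 512 bit times

print(pause_seconds(0xFFFF, 10e9))  # ~0.0034 s: maximum pause on 10GbE
print(pause_seconds(0xFFFF, 1e9))   # ~0.0336 s: the same field on 1GbE
```

A pausing device therefore keeps refreshing PAUSE frames while congestion lasts, since each frame buys only a few milliseconds of silence on fast links.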

The PAUSE frame achieves the same result as the credit concept in FC networks: flow control, that is, controlling the flow of traffic along its path to assure that packets are not dropped. However, there are some drawbacks.

The generation and propagation of PAUSE frames work in a hop-by-hop fashion, not from the congestion point directly toward the source of the traffic, so a congestion propagation phenomenon can happen. For example, consider a traffic flow crossing a chained connection between devices one, two, and three. When the queue buffer rises above a certain threshold in device three, device three signals that to device two, which then starts to store traffic in a queue instead of transmitting it to device three. However, that ability to store traffic is not infinite, as previously detailed in Chapter 2, so if the congestion doesn’t stop soon, then device two will also run out of resources and signal that to device one. So it can take a while until the original source of the traffic is paused, and in the meantime all the devices along the traffic path are paused. This is commonly named the head-of-line (HOL) blocking scenario; if the congestion is not transient, the result can be that the flow of traffic is paused in all the devices it crosses, so the traffic flow is indeed controlled, but it is also completely stopped.

To address this, several protocols were created under the Data Center Bridging (DCB) umbrella. There are several DCB protocols, and references to them appear at the end of this chapter, but the most interesting one from the perspective of this book is Priority Flow Control (PFC).

An Ethernet frame has a QOS marking (a concept detailed in Chapter 5), and with PFC the generation of PAUSE frames is tied to a specific QOS marking, effectively pausing just the traffic in the class of service that uses that QOS marking. So if SAN traffic uses a unique QOS marking, then PAUSE frames can be made specific to it, as illustrated in Figure 4.11 for three traffic types (black, gray, and SAN), each with a different QOS marking.

Image described by caption/surrounding text.

Figure 4.11 PAUSE frames per QOS marking

The other relevant concept is the definition of priority groups, which is linked to the hierarchical scheduling concept detailed in Chapter 8. The goal is to apply queuing and scheduling first inside each priority group and only afterward to all the traffic, as illustrated in Figure 4.12.

Image described by caption/surrounding text.

Figure 4.12 Priority Flow Control

So if SAN traffic has a unique QOS marking, then PAUSE frames are specific to SAN traffic, and if SAN flows are also assigned to a specific priority group, then they do not compete for the same resources as the other traffic types. Returning to Figure 4.12, such assurance can be given by the rightmost scheduler configuration, for example, by giving the third queue a certain transmit rate. It should be noted that the lossless behavior is delivered by the existence of PAUSE frames; what PFC allows is to leverage that functionality even further.

4.2.3 Virtualization

The concept of virtualizing a physical host into multiple logical ones can be seen as the typical N : 1 connection, where each VM talks to the hypervisor, which then decides whether the destination is another VM inside the same host or outside this physical host. As with any N : 1 connection, it is always vulnerable to congestion, and given the presence of different traffic types, prioritization may also be required.

Today, hypervisors have mainly switching but also routing and security capabilities, similar to any switch, router, or firewall, and the same applies to QOS. All the QOS tools discussed in this book are now starting to be available at the hypervisor level, allowing administrators to apply QOS directly there.

The real challenges in terms of QOS are the increase and the predictability of traffic flows between servers inside the DC (commonly called East–West traffic). This increase is due to a major change in how a “transaction” (for lack of a better word) takes place between a user outside the DC and the resources inside the DC.

Let’s use as an example a book purchase from an online store: the user opens a browser, does a search, and gets as a result a page with contents, pricing, pictures, and reviews, among several other types of content. Now, if all the book details and the web portal are in the same physical host in the DC, then this is simple in terms of traffic flows, because traffic enters the DC, arrives at a physical server where everything is stored, and the server responds to the user (ignoring gateways and firewalls for ease of understanding). This is typically called a North–South traffic flow. Well, this interaction belongs to the past, because now all the book contents are spread among different VMs, so the North–South traffic from the user to the DC generates multiple West–East communications between different VMs inside the DC to build all the content that the user sees in the browser, as illustrated in Figure 4.13.

Image described by caption/surrounding text.

Figure 4.13 West–East traffic flows in a Data Center

Now comes the second challenge: today the book pictures could be stored in a physical host on the West side and tomorrow in a physical host on the East side, so predicting the required resources inside the DC becomes a challenge.

The predictability factor is the hardest one to cope with. If two VMs are communicating internally inside the physical host, then only the host hypervisor deals with that traffic. However, when one of those VMs is moved to another physical host, that same traffic that was previously “hidden” from the rest of the network is now out there, using the DC devices and links that interconnect those two physical hosts. And hundreds of VMs can potentially be moved across physical hosts during a short period of time at just about any mid- to large-size DC today. This leads to the creation of traffic peaks that are not easily predictable, and depending on the oversubscription ratios present in the DC, buffering capabilities may be required. This is the scenario explored further in the DC case study of this book.

Another scenario with an impact is the migration of a DC. When VMs are being moved from the old DC to the new one, traffic that before was seen only by the host hypervisor now travels across the links interconnecting the two DCs, demanding resources.

4.2.4 Software-Defined Networks

At the time of this writing, the industry’s hot keyword is SDN, the ability to split the control and data planes in a network and deploy virtualized resources and build service chains with those virtualized (and physical) resources using orchestration tools and SDN controllers. The details around SDN are outside the scope of this book, but let’s look at a typical chat between Viriatus (a server guy) and Viking (a networking guy) before SDN:

Viriatus

“I need connectivity between servers on racks 120 and 332 towards the server in rack 200, like a hub and spoke topology you know… Rack 200 is the hub”

Viking

“What VLAN is that?”

Viriatus

“VLAN? I have no idea”

Viking

“OK, no worries, I’ll find out a VLAN number we can use, we’ll actually need two, but leave it to me … but you need to be aware this is going to take a couple of days to set up”

Viriatus

“What? Why?”

Viking

“Well, I need to configure those VLANs on the top-of-rack switches of racks 120, 332, and 200 and also configure Layer 3 interfaces to allow inter-VLAN routing at our gateway router, and most important, I need to make sure I don’t break anything else, like using a VLAN number that is already in use”

Viriatus

“Damn …”

[after 1 week]

Viking

“It is working now, right?”

Viriatus

“No!”

Viking

“Ah wait, must be the firewall, need to change that as well, sorry”

Now with SDN the story is different, and simpler, due to central controller nodes that control the deployment of virtual networks and allow any desired form of connectivity to be established between servers, without implying changes in the switching and routing infrastructure, or by simply automating those changes to the underlay network. The goal is for the teams that deploy and manage applications (and not the ones who own the infrastructure) to be able to request the network resources they need in a matter of seconds. This represents an obvious game changer. But how does it impact QOS?

If you look at SDN from the perspective of an automation process that allows application owners to create virtual networks, then whether or not those virtual networks have QOS requirements becomes simply a part of that automation process. For example, if the application requires a VLAN to be created and used between three servers inside the DC, with or without QOS, the automation process will simply deploy QOS on that VLAN or not.

In terms of functionality there is nothing really new. QOS has proven itself in the past in both the nonoverlay (e.g., in Ethernet switching networks) and in the overlay (e.g., L2VPN or L3VPN) realms, so any SDN flavor is unlikely to impose dramatic changes on the way QOS is implemented.

However, for the SDN use cases where individual per-subscriber and per-application flows are being identified, the granularity with which QOS can be applied may become much tighter. This is due to a closer binding between the network devices where the traffic travels via the data plane and the applications, so features such as admission control on a per-application basis can be easily deployed. The same granularity can be applied regarding paths in the network and bandwidth traffic engineering.

4.2.5 DC and QOS

The ability to create lossless Ethernet networks sounds perfect at first glance; however, it does create some challenges, such as congestion propagation and HOL blocking. For example, at present the design of a DC network for the transport of FCoE is typically done using a simple logical topology with a small number of hops, and then special care is given to the oversubscription ratios.

The major challenge regarding virtualization is the increase and predictability of East–West traffic. To cope with it, the first crucial step is visibility, typically achieved using telemetry and analytics. Once visibility is present, a careful analysis of the oversubscription ratios and the deployment of buffering are typically required as well, topics further explored in the DC case study in the third part of this book.

SDN is a game changer in many ways, but not so much in terms of QOS; it is not expected to dramatically change the way QOS is applied, but it can allow for more granular deployments.

4.3 Real-Time Traffic

The handling of real-time traffic in packet-based forwarding networks is one of the major challenges Internet carriers, operators, and large enterprises have faced in the recent past, because it is structurally different from other traffic types. How is it different? First, it is not adaptive in any way, so it should come as no surprise that UDP is typically its transport protocol. There can be encoders and playback buffering involved, but for its end-to-end delivery, it needs constant care and feeding of resources. Second, it is selfish, but not greedy: it requires specific traffic delivery parameters to achieve its quality levels, and ensuring those quality levels typically demands lots of network resources. If the traffic volume is always the same, the delivery does not change, but for that to happen, the traffic needs protection and babysitting, since lost frames can rarely be restored or recovered, and there is only one quality level: good quality.

There are several shapes and formats of real-time traffic, but the two most widely deployed are voice encapsulated into IP frames and streamed video encapsulated into IP frames, where typically short frames carry voice and larger frames carry streamed video. It can get more interesting as these services can be mixed in the same infrastructure, for example, the same 3G mobile phone receives data, voice, and video over the same infrastructure.

What all forms of real-time traffic have in common is the demand for predictable and consistent delivery and quality, and there is little or no room for variation in the networking service. The good news is that in most situations the bandwidth, delay, and jitter requirements can be calculated and therefore predicted.

4.3.1 Control and Data Traffic

For many forms of real-time applications, the Real-time Transport Protocol (RTP) is the most suitable transport helper and is often called the worker of a real-time session, because it carries the actual real-time data, such as the voice or streamed media, while the signaling is delivered by the RTP Control Protocol (RTCP). But let’s focus on RTP for a moment.

What RTP delivers is the transformation of IP packets into a stream, placing four key parameters in the packet header: the sequence number, the timestamp, the type, and the synchronization source identifier (SSRC). The sequence number allows one to keep track of the order of the frames, enabling packet loss detection but also packet sequence restoration if frames arrive out of order. The timestamp reflects the sampling instant of the first octet in the RTP data packet; it is a clock that increments monotonically and linearly to allow synchronization and jitter calculations, so that the original timing of the payload can be reassembled on the egress side, thus helping maintain the payload quality. The Type field represents the type of data that is in the payload, and the SSRC identifies the originator of the packet.
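
These four fields sit in the fixed 12-byte RTP header defined in RFC 3550, and pulling them out of a raw packet takes only a few lines; the Python sketch below uses fabricated example bytes.

```python
import struct

# Parse the RFC 3550 fixed header: V/P/X/CC byte, M/PT byte, then the
# sequence number (16 bits), timestamp (32 bits), and SSRC (32 bits).
def parse_rtp(header: bytes):
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", header[:12])
    return {
        "payload_type": b1 & 0x7F,  # the "type" of media in the payload
        "sequence": seq,            # ordering and loss detection
        "timestamp": ts,            # sampling instant, for jitter/playout
        "ssrc": ssrc,               # identifies the stream's originator
    }

sample = bytes([0x80, 0x00,               # version 2, payload type 0
                0x00, 0x01,               # sequence number 1
                0x00, 0x00, 0x3E, 0x80,   # timestamp 16000
                0xDE, 0xAD, 0xBE, 0xEF])  # SSRC
print(parse_rtp(sample))
```

Everything QOS-relevant about an RTP stream (loss, reordering, jitter) can be measured from just these header fields, which is what monitoring tools do.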

As for the control part, the RTCP delivers the signaling for RTP, providing feedback on the quality of the data distribution and statistics about the session. Its packets are transmitted regularly within the stream of RTP packets, as illustrated in Figure 4.14.

Schematic of RTP data packet and RTCP control packet rates, with a host (IP address 172.26.200.18) and a server (IP address 172.26.73.4).

Figure 4.14 The RTP data packet and RTCP control packet rates

Besides the typical differences in rates, as illustrated in Figure 4.14, the RTP packet size depends on the contents, so RTP packets can be small or large, while RTCP packets are typically small.

The point here is to highlight that real-time traffic is not just real-time traffic; there are a control component and a data component.

Regardless of the differences in packet size and rates, RTP and RTCP packets belong to the same real-time session and are so closely bound together that in most scenarios it makes sense to place them in the same class of service. However, if RTP is being used to transport both streaming video (large packets) and voice (small packets), mixing them in the same queue can raise some challenges in terms of dimensioning the queue size and predicting the reaction to bursts of traffic, a topic that is explored further ahead.

4.3.2 Voice over IP

To further highlight the number of different components that real-time traffic can have, let’s examine the Voice over IP (VoIP) realm and the most popular IP-based voice protocol, the Session Initiation Protocol (SIP). SIP is widely used for controlling multimedia communication sessions such as voice and video calls over IP. It provides the signaling, and once a call is set up, the actual data (the bearer) is carried by another protocol, such as the RTP described previously.

In its most basic form, SIP is a text-based protocol very similar to HTTP, and it uses a request/response transaction model, where each transaction consists of a client request that invokes at least one response. Take, for example, Figure 4.15, which shows a soft phone registering with a server.

Image described by caption/surrounding text.

Figure 4.15 SIP register process, the classical 200 OK messages

And there is more happening here. For example, typically the Session Description Protocol (SDP) will select the codec used at egress, among other items negotiated in the SIP call, while the Domain Name System (DNS) will be used to perform name resolution and the Network Time Protocol (NTP) to set the time; and note that no real data traffic has been transmitted yet.

Because very different types of traffic are associated with SIP, placing all SIP-“related” packets in the same class of service and treating them with the same QOS in the network is neither possible nor recommended. Imagine a DNS query sharing the same class of service as voice packets. Again, real-time traffic is not just real-time traffic; there are control and data components associated with it.

So what part of the voice session should be defined as VoIP and be handled as such? Should it only be the actual dataflow such as the UDP RTP packets on already established calls? Or should the setup packets in the SIP control plane also be included? Should the service involved with DNS, NTP, and so forth also be part of the same class of service? Or are there actually two services that should be separated into different classes of service, one for the control and one for the forwarding? The dilemma here is that when everything is classified as top priority, then nothing is “special” anymore.

Delay and jitter for data packets can cause a lot of harm. Typically each device has some form of jitter buffer to handle arrival variations, and a well-known recommendation is that its value should be less than 30 ms.

The delay value is a bigger issue, and an old recommendation in ITU-T G.114 suggests a maximum of a 150-ms end-to-end delay to achieve good quality, which may sound a bit too conservative, but note that this is end-to-end delay; thus, in a network with several possible congestion points, any one of them can easily ruin the whole service. As for packet loss, it is actually better to drop something than to try to reorder it, because packets can remain in a jitter/playback buffer only for a limited time. If a packet is delayed for too long, it is of no use to the destination, so it is a mistake to spend network resources processing it.
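
To make the 150-ms figure concrete, here is a back-of-envelope delay budget in Python; every component value is an illustrative assumption, not a figure from the text.

```python
budget_ms = 150   # ITU-T G.114 end-to-end target mentioned above

components_ms = {
    "codec and packetization": 30,
    "jitter (playback) buffer": 30,
    "propagation across the backbone": 40,
    "queuing, 10 hops at 4 ms each": 40,
}

used = sum(components_ms.values())
print(f"{used} ms used, {budget_ms - used} ms of headroom")  # 140 used, 10 left
```

With only 10 ms of headroom left in this example, a single congested hop adding a few tens of milliseconds blows the budget, which is why any one congestion point can ruin the whole service.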

Control packets typically cannot be dropped, but some delay and jitter variation is acceptable, because this signaling part of VoIP rarely affects any interaction with the end user other than the possible frustration of not being able to establish the call “right now.”

The difference between control and data traffic is one of the catch-22 situations with VoIP. It is interesting to go back to basics, visit the PSTN realm, and ask how they did it: they calculated the service based on an estimate of the “average” load, from which they estimated the resources (time slots for bearer channels and out-of-band connections for the signaling). VoIP is not that different.

4.3.3 IPTV

Another highly popular application of real-time traffic is IPTV, the streaming of TV or video, which is essentially broadcast over IP networks. Also note that streaming TV is often combined with voice in a triple-play package that delivers TV, telephony, and data to the end user subscriber, all over an IP network.

The focus here is strictly on QOS for IPTV delivery, and we will shy away from detailing the many digital TV formats (compressed and noncompressed, MPEG-2 and MPEG-4, SDTV and HDTV, and so forth).

The quality requirements for IPTV are very similar to those described earlier for VoIP, since both services need to have resources allocated for them and neither can be dropped. For the generation that grew up with analogue TV and the challenges of getting the antenna into the right position, there may be a little more tolerance for a limited number of dropped frames; whether this is a generic human acceptance or a generation gap is up to the reader.

However, there are several fundamental differences between IPTV and VoIP. The most obvious one is packet size: IPTV packets are bigger and are subject to additional burst, so IPTV requires larger buffers than VoIP. The delay requirement is also different, because the decoder box or node on the egress most often has a playback buffer that can be up to several seconds long, so jitter is generally not significant as long as a packet arrives within the acceptable time slot. Reordering of packets is possible, but it is not always trivial, because each IPTV packet is large and the length of the playback buffer translates into a maximum number of packets that can be stored, a number that is likely not large enough if a major reordering of packets is necessary. Other protocols involved in IPTV can be DHCP and HTTP or SSL/TLS traffic for images; however, these protocols for the most part do not visibly hurt the end user beyond displaying a “Downloading… Please wait” screen message. The loss of frames seems worse to IPTV viewers than the dropping of words in a VoIP communication, but maybe that is because of the simple fact that most of the time, humans can ask each other to repeat the last sentence.

4.3.4 QOS and Real-Time Traffic

Real-time traffic requirements are often summarized as “minimum delay, minimum jitter, and no packet loss”; however, real-time traffic always has two components, the control and the data. Whether it makes sense to map both of them into the same class of service depends on their requirements, but typically control traffic is more tolerant regarding delay and jitter. Also, the control component of real-time traffic can contain several different protocols, as highlighted in the VoIP call with SIP.

Another key point is that real-time traffic presents itself in many different formats, typically with different requirements. The reasons this book referred to VoIP and IPTV are, first, that they are popular and commonly used and, second, that although they seem similar at first glance, they have several significant differences, such as their tolerance to jitter, which serves to illustrate that real-time traffic requirements depend on what the traffic is and how it is being delivered.

Reference

  1. Marrone, L.A., Barbieri, A. and Robles, M. (2013) TCP Performance—CUBIC, Vegas & Reno, Journal of Computer Science and Technology, Vol. 13. http://journal.info.unlp.edu.ar/journal/journal35/papers/JCST-Apr13-1.pdf (accessed August 19, 2015).

Further Reading

  1. Allman, M., Paxson, V. and Stevens, W. (1999) RFC 2581, TCP Congestion Control, April 1999. https://tools.ietf.org/rfc/rfc2581.txt (accessed August 21, 2015).
  2. Semeria, C. (2002) Supporting Differentiated Service Classes: TCP Congestion Control Mechanisms, Whitepaper, Juniper Networks. www.juniper.net (accessed September 8, 2015).
  3. DARPA (1981) RFC 793, Transmission Control Protocol—DARPA Internet Protocol Specification, September 1981. https://tools.ietf.org/rfc/rfc793.txt (accessed August 21, 2015).
  4. Handley, M., Jacobson, V. and Perkins, C. (2006) RFC 4566, SDP: Session Description Protocol, July 2006. https://www.ietf.org/rfc/rfc4566.txt (accessed August 21, 2015).
  5. IEEE, 802.1Qbb—Priority-based Flow Control. http://www.ieee802.org/1/pages/802.1bb.html (accessed August 21, 2015).
  6. IEEE, 802.1Qau—Congestion Notification. http://www.ieee802.org/1/pages/802.1au.html (accessed August 21, 2015).
  7. IEEE, 802.1Qaz—Enhanced Transmission Selection. http://www.ieee802.org/1/pages/802.1az.html (accessed August 21, 2015).
  8. Mathis, M., Mahdavi, J., Floyd, S. and Romanow, A. (1996) RFC 2018, TCP Selective Acknowledgment Options, October 1996. https://www.rfc-editor.org/rfc/rfc2018.txt (accessed August 21, 2015).
  9. Postel, J. (1980) RFC 768, User Datagram Protocol, August 1980. http://www.rfc-base.org/txt/rfc-768.txt (accessed August 21, 2015).
  10. Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M. and Schooler, E. (2002) RFC 3261, SIP: Session Initiation Protocol, June 2002. https://www.ietf.org/rfc/rfc3261.txt (accessed August 21, 2015).
  11. Schulzrinne, H. and Casner, S. (2003) RFC 3551, RTP Profile for Audio and Video Conferences with Minimal Control, July 2003. https://www.ietf.org/rfc/rfc3551.txt (accessed August 21, 2015).
  12. Schulzrinne, H., Casner, S., Frederick, R. and Jacobson, V. (2003) RFC 3550, RTP: A Transport Protocol for Real-Time Applications, July 2003. https://tools.ietf.org/rfc/rfc3550.txt (accessed August 21, 2015).