14

Combined CODEC and Network Parameters for an Enhanced Quality of Experience in Video Streaming

Araz Jahaniaval and Dalia Fayek

CONTENTS

14.1  Introduction

14.2  Background and Related Research

14.2.1  H.264 Video Encoder/Decoder

14.2.2  H.264 Concealment and Error Correction Techniques

14.2.3  Flexible Macroblock Ordering

14.2.4  Data Partitioning

14.2.5  Current Related Research

14.3  E2E Distortion Model

14.3.1  E2E Loss Probability

14.3.2  Intra-Mode E2E Distortion Cases

14.3.3  Inter-Mode E2E Distortion Cases

14.3.4  Parametric Distortion Model

14.3.5  Enriched Trace File Format

14.4  Optimization Model

14.5  Implementation, Analysis, and Results

14.5.1  Video Traffic and Trace Generation

14.5.2  Network Simulation

14.5.3  Results

14.6  Conclusion

References

14.1    Introduction

The shift toward IP-based infrastructure and the availability of more bandwidth on wire-line, wireless, and satellite communication channels provide the capability to support video transmission over the Internet. As a result, today's networks transport multimedia data with large bandwidth requirements. However, the challenges of transmitting high quality video over the Internet with low latency still exist. For example, if many viewers tune in with their internet protocol television (IPTV) systems or their third generation wireless (3G) handhelds to follow a major event, the quality of the video may deteriorate significantly due to the overloaded networks, unavoidably resulting in the loss and/or the late arrival of video packets with latency higher than the maximum decoder play out deadline. One solution is to give the video packets a higher priority by using premium quality of service (QoS) in a differentiated services environment (Blake et al., 1998). However, ensuring this special level of QoS treatment is rather costly in terms of the network management overhead that is inherent in TCP-based protocols. The current approach is to use a mix of prioritized packet transmission schemes and error-resilience features at the decoder (Shao and Chen, 2011).

The move toward IP convergence certainly has many advantages; however, there are challenges in transmitting video data over the Internet without specific QoS guarantees. As the objective of video transmission is to maintain a satisfactory video quality and an increased quality of experience at the decoder (Zapater and Bressan, 2007; Piamrat et al., 2009; Wiegand et al., 2009), the compressed video is streamed over the Internet using the real-time transport protocol (RTP) over the connectionless user datagram protocol/internet protocol (UDP/IP), thus reducing acknowledgment and retransmission overheads (Belda et al., 2006). This type of video transmission is referred to as the conversational application (such as video telephony and video conferencing), where the acceptable delay is stringent. As a result of this short delay, in the case of network congestion, the routers will discard any video packet that exceeds the predefined time limit or play out deadline. Moreover, if the router buffer overflows, the router may discard the extra packets. The loss of packets may have an adverse effect on the quality of the video at the decoder, especially if key video packets are discarded. In addition, as there is no flow control, the overall video quality can be improved by dynamically scheduling and prioritizing these key video packets in response to the varying traffic load of the network and its end-to-end (E2E) delay and loss probability (Kuo et al., 2011).

Hence, we propose reordering the packets in the queue of the ingress edge routers, so that those packets whose presence at the decoder is crucial to maintaining adequate video quality are transmitted first. The E2E distortion model we first presented in Jahaniaval and Fayek (2007) is based on two error-resilience features in the H.264 codec: flexible macroblock ordering (FMO) and data partitioning (DP). Our model describes distortions resulting from inter- and intra-coding as well as the loss probability of the network, and it scales to S video streams. The model is then used to minimize the overall E2E distortion of all live streams, producing a prioritized order of transmission (Jahaniaval, 2010; Jahaniaval et al., 2010). In this chapter, we present our distortion model and the linear optimization we developed that minimizes the total distortion experienced by the set of active streams at any given time. We then demonstrate that our implemented formulation results in higher video quality for the live streams.

The rest of this chapter is organized as follows. In Section 14.2, we highlight some of the research conducted in video streaming to date and give an overview of the resilience features in H.264/AVC. Section 14.3 describes our proposed E2E distortion model (Jahaniaval and Fayek, 2007) that takes into account the lossy nature of the transport network and embeds the corresponding distortion parameters in enhanced video traces. Section 14.4 describes our optimization model (Jahaniaval, 2010; Jahaniaval et al., 2010) that builds on the distortion model in Section 14.3. In Section 14.5, we validate our mathematical formulation by presenting the results obtained when we used our optimization algorithm to minimize the overall distortion of four video streams. Finally, we present our concluding remarks and future direction in Section 14.6.

14.2    Background and Related Research

Video compression reduces the size of the data contained in raw video frames so that the video content can be transmitted and stored more efficiently. Although network bandwidth and storage capacity have increased tremendously and are offered at lower cost, video compression remains a necessity. For example, storing (or transmitting) 1 s of raw common intermediate format (CIF) video requires 4.35 MB of network or storage capacity, thus justifying the need for an encoder/decoder system. The main function of the enCOder/DECoder (CODEC) is to encode the video sequence, representing the video signal with fewer bits than the raw (YUV) video, and later to decompress and play the encoded video.

The Moving Picture Experts Group (MPEG) is a working group of the International Organization for Standardization (ISO) that has developed standards for video and audio compression and processing. After the MPEG-1 standard, this body developed broadcast-quality coding of video and audio signals for transmission and storage by introducing MPEG-2, which is primarily used in DVD video and digital video broadcasting, along with the MPEG layer 3 audio coding (MP3). The MPEG-2 standard superseded the MPEG-1 standard used in video CDs (VCDs).

The MPEG-2 standard introduced different profiles and levels of coding as well as improvements in coding efficiency (a higher bit rate of 19.4 Mbps in comparison with the 1.5 Mbps of MPEG-1) and support for interlaced and progressive video.

The standards have since evolved to MPEG-4 and H.264, which improve the video quality at a lower bit rate in comparison to the earlier MPEG-2 standard.

H.264 (MPEG-4 Part 10, introduced in the next section) was designed to provide higher coding efficiency as well as improved support for reliable transmission in applications such as videoconferencing, video telephony, and IPTV (Richardson, 2003). The standard continues to evolve to support video streaming applications over wireless networks.

14.2.1    H.264 Video Encoder/Decoder

The components of an H.264 video encoder/decoder are illustrated in Figure 14.1. As shown, a group of pictures (GOP) is composed of an I-frame (spatially or intra-coded frame) followed by a number of predicted P-frames (temporally or inter-coded frames); there may also be B-frames (bidirectionally predicted frames). The input to the encoder is a raw video bit stream, usually in the YUV format. These streams undergo block-based compression, quantization, and encoding to produce a compressed bit stream that is then packaged into network abstraction layer (NAL) packets ready for transmission over digital data networks.


FIGURE 14.1
H.264 CODEC components.

The I-prediction block takes advantage of the similar pixels within a frame (image), which exhibit high spatial correlation. Compression is achieved by first de-correlating the image so that there are minimal interdependencies between the pixels. This step is referred to as "intra-coding" in H.264 and involves compressing the image using transform coding, quantization, and entropy coding. The goal of transform coding is to de-correlate the data so that most of the energy of the image is concentrated in a small number of coefficients. Examples of this type of transform are the wavelet transform and the discrete cosine transform (DCT); the latter is used in the H.264 CODEC (Richardson, 2003).

Typically, the video information is concentrated in a few low spatial frequency components, which allows a more compact representation of the video frame. Once the DCT is applied to the image and the residual information, the resulting transform coefficients are quantized.
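To make the energy-compaction property concrete, the following minimal Python sketch applies a textbook orthonormal 2-D DCT-II to a synthetic 8 × 8 block and reports how much of the block's energy lands in the low-frequency corner. This illustrates the transform-coding principle only, not the H.264 integer transform, and all values are made up.

```python
import numpy as np

def dct_1d(x):
    """Orthonormal 1-D DCT-II (textbook definition)."""
    N = len(x)
    n = np.arange(N)
    X = np.array([np.sum(x * np.cos(np.pi * (2 * n + 1) * k / (2 * N)))
                  for k in range(N)])
    scale = np.full(N, np.sqrt(2.0 / N))
    scale[0] = np.sqrt(1.0 / N)
    return scale * X

def dct_2d(block):
    """Separable 2-D DCT: 1-D DCT down the columns, then along the rows."""
    return np.apply_along_axis(dct_1d, 1,
                               np.apply_along_axis(dct_1d, 0, block))

# A smooth synthetic luma block: high spatial correlation, as in a real image.
i, j = np.meshgrid(np.arange(8), np.arange(8), indexing="ij")
block = 128 + 20 * np.cos(np.pi * i / 8) + 10 * np.cos(np.pi * j / 8)

coeffs = dct_2d(block)
energy = coeffs ** 2
print(f"energy in the 2x2 low-frequency corner: "
      f"{energy[:2, :2].sum() / energy.sum():.3f}")
```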

The quantization step size is important: a larger step size results in more highly compressed video information (fewer bits) at the cost of a loss in quality. Unlike the DCT, this step is not lossless, and the data at this stage cannot be fully recovered during reconstruction. For constant bit rate (CBR) video, the quantization step varies and adjusts to maintain the overall bit rate at a prescribed level; in variable bit rate (VBR) encoding, the quantization step remains constant throughout the encoding operation.

Once the data have been quantized, entropy encoding is used to represent the video sequence as a compressed bit stream. The bit code contains markers and headers describing the synchronization period and the image/sequence headers, respectively. The result of this step is the bit stream that can be transmitted or stored (Richardson, 2003).

14.2.2    H.264 Concealment and Error Correction Techniques

As video data are transported over the Internet and wireless networks, losses in video data and signal deterioration are expected. Error detection and concealment schemes attempt to minimize the distortion effect in video quality and prevent image artifacts during the decoding of the transmitted video. The error concealment in H.264 is classified into (1) intra-frame/intra-slice interpolation and (2) inter-frame/inter-slice concealment or interpolation.

In intra-slice interpolation, also referred to as spatial error concealment, the values of the missing pixels are estimated from the surrounding pixels of the same slice. The weighted pixel interpolation method is used to estimate each pixel from the pixel boundaries of the four adjacent healthy or concealed macroblocks (MBs) (Xu and Zhou, 2004; Xiang et al., 2009).

In inter-slice concealment, also referred to as temporal error concealment, the missing data are recovered using the available motion vectors (MVs). The damaged MB is replaced from the reference frame that was referenced by the MV during the encoding process. If the MVs are lost, the damaged MB can be replaced by the MB in the reference frame located at the same spatial position (i.e., the damaged MB is replaced by the collocated MB in the reference frame). Another alternative when the MVs are unavailable is to predict the lost data by taking the median of the surrounding MVs and replacing the MB from the reference frame using the predicted MV. Other prediction methods are the boundary matching algorithm (BMA) and the absolute sum algorithm for MV prediction, which are further discussed in Agrafiotis et al. (2006), Kumar et al. (2006), and Xu and Zhou (2004).
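As an illustration of the median-MV alternative just described, the sketch below predicts a lost MB's motion vector as the component-wise median of the available neighboring MVs and copies the motion-compensated block from the reference frame. This is a simplified sketch under our own assumptions (function names and data layout are hypothetical), not the JM reference decoder logic.

```python
import numpy as np

def conceal_mb(ref_frame, mb_row, mb_col, neighbor_mvs, mb_size=16):
    """Conceal a lost MB via median-MV prediction (simplified sketch).

    ref_frame:    2-D array holding the previous decoded luma frame.
    neighbor_mvs: list of (dy, dx) MVs from surrounding MBs; if empty,
                  fall back to the collocated MB (zero motion vector).
    """
    if neighbor_mvs:
        dy = int(np.median([mv[0] for mv in neighbor_mvs]))
        dx = int(np.median([mv[1] for mv in neighbor_mvs]))
    else:
        dy, dx = 0, 0  # collocated replacement
    top = mb_row * mb_size + dy
    left = mb_col * mb_size + dx
    # Clamp so the predicted block stays inside the reference frame.
    top = max(0, min(top, ref_frame.shape[0] - mb_size))
    left = max(0, min(left, ref_frame.shape[1] - mb_size))
    return ref_frame[top:top + mb_size, left:left + mb_size].copy()

# Example: conceal MB (2, 3) of a QCIF frame using three neighbor MVs.
ref = np.random.randint(0, 256, size=(144, 176), dtype=np.uint8)
patch = conceal_mb(ref, 2, 3, [(1, -2), (0, -1), (2, -2)])
print(patch.shape)  # (16, 16)
```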

14.2.3    Flexible Macroblock Ordering

FMO is an error-resilience tool in H.264 that allows the assignment of MBs to different slice groups, dividing the raw video frame into two or more slices to increase the probability of accurate concealment at the decoder (Stockhammer et al., 2003). As a result, encoding is only performed within a slice, and the slice data are encoded independently of data in other slices of the same frame (Dhondt et al., 2006). The objective of FMO is to scatter possible errors over the entire frame to avoid error accumulation in a certain region. The MBs are assigned to slice groups in different patterns. In this work, we have chosen the dispersed (checkerboard) arrangement shown in Figure 14.2.

In this arrangement, each black-colored MB is assigned to slice group 1 and each white-colored MB to slice group 2. The slices are then divided into DPs that are packaged into NAL unit (NALU) packets, and each slice group is transmitted independently. If one slice is lost in transmission, each of its MBs can be reconstructed (concealed) from its four neighboring MBs. In the case of temporal concealment when the MV is lost, the decoder utilizes temporal replacement or the BMA for concealment (Agrafiotis et al., 2006; Chen et al., 2006a,b). Simulation results in Wenger (2003) have demonstrated that, with a 10% channel loss probability and FMO in use, the visual distortion is unnoticeable and can only be spotted with a trained eye.

Image

FIGURE 14.2
Dispersed FMO of a QCIF (176 × 144) frame. (From Jahaniaval, A., Video quality enhancement through end-to-end distortion optimization and enriched video traces, M.Sc. thesis, University of Guelph, Guelph, Ontario, Canada, 2010.)
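The dispersed mapping of Figure 14.2 reduces to a parity rule on the MB coordinates. The short sketch below (our own illustration; the function name is not part of any H.264 API) prints the two-group checkerboard for a QCIF frame, making it visible that every MB of one group is bordered by MBs of the other.

```python
def dispersed_slice_group(mb_row, mb_col, num_groups=2):
    """Checkerboard FMO: slice group alternates with MB coordinate parity."""
    return (mb_row + mb_col) % num_groups

# QCIF is 176 x 144 pixels, i.e., 11 x 9 macroblocks of 16 x 16 pixels.
MB_COLS, MB_ROWS = 176 // 16, 144 // 16
for r in range(MB_ROWS):
    print("".join(str(dispersed_slice_group(r, c)) for c in range(MB_COLS)))
# If slice group 0 is lost, each of its MBs still has its four
# neighbors in slice group 1 available for concealment.
```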

14.2.4    Data Partitioning

DP classifies the encoded video data into three partitions of unequal importance, thereby providing unequal error protection during transmission. In the usual case of encoding with H.264 (without DP), each MB in a slice is encoded into a single bit stream. With DP, the encoded data are divided into three partitions per slice (instead of one) based on the importance of the encoded data (Figure 14.2). The video header, MVs, and quantization parameters (QPs) are encoded and classified as data partition A (DPA). The intra-coefficients and inter-coefficients are classified as DPB and DPC, respectively. DPA has the highest priority, since DPB and DPC become useless if DPA is lost. When both DPB and DPC are lost, the MVs and the header can still conceal the lost MB. For other loss scenarios, Table 14.1 outlines the concealment strategies adopted in H.264; a toy illustration of the partitioning is sketched below.
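The following sketch shows how a slice's syntax elements might be routed to the three partitions; the element names and container are our own illustration, not the actual H.264 syntax-element taxonomy.

```python
from dataclasses import dataclass, field

@dataclass
class PartitionedSlice:
    """Toy container mirroring the DPA/DPB/DPC split (illustrative only)."""
    dpa: list = field(default_factory=list)  # headers, MVs, QPs
    dpb: list = field(default_factory=list)  # intra-coded residuals
    dpc: list = field(default_factory=list)  # inter-coded residuals

def partition_slice(elements):
    s = PartitionedSlice()
    for kind, payload in elements:
        if kind in ("header", "mv", "qp"):
            s.dpa.append(payload)        # highest priority: DPB and DPC
        elif kind == "intra_residual":   # are useless without DPA
            s.dpb.append(payload)
        elif kind == "inter_residual":
            s.dpc.append(payload)
    return s

elems = [("header", b"\x01"), ("mv", (1, -2)), ("qp", 20),
         ("intra_residual", b"\x7f"), ("inter_residual", b"\x05")]
print(partition_slice(elems))
```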

14.2.5    Current Related Research

The emergence of fourth-generation wireless networks and digital TV poses great challenges in video transport. Two prominent challenges are (1) providing and maintaining visually acceptable video quality given variations in network conditions and (2) the accurate, objective measurement of this quality by networking researchers, who use the data in video trace files, which closely mimic the video encoding statistics, to measure the quality after lossy transmission (Ke et al., 2007). Wireless networks complicate this task through their time-varying and location-dependent channel characteristics, which increase the probability of packet loss and hence affect the received video quality (Seeling and Reisslein, 2005a,b). The most commonly used objective metric in video traces is the peak signal-to-noise ratio (PSNR). There have been extensive research efforts to improve both the subjective and the objective quality at the receiver by utilizing techniques such as scalable video coding (SVC) (Seeling and Reisslein, 2005a,b; Ke et al., 2007; Seeling et al., 2007; Wang et al., 2007). In SVC, the video is encoded as a base layer and one or more enhancement layer(s). Decoding only the base layer yields the minimum visually acceptable quality; adding the enhancement layers improves the quality further. One of the most widely used forms of SVC in wireless transmission is fine grain scalability (FGS) coding. In FGS coding, the base layer is encoded at a coarser quality in comparison with the original video, and the enhancement layer is encoded as bit-planes, which can be discarded at bit-plane granularity. Further improvement in quality is possible with the advent of the H.264 video codec, which achieves efficient encoding at lower bit rates (Stockhammer et al., 2003; Ksentini et al., 2006). This chapter focuses on encoding mechanisms applied to the base layer only.

TABLE 14.1

Decoder Concealment Based on the Arrival of Data Partitions

Availability | Concealment
DPA, DPB, and DPC | Full recovery of video data
DPA | Concealment using the header and MVs from surrounding MBs
DPA and DPB | Intra-concealment, using the MVs from DPA and the intra-residuals from DPB
DPA and DPC | Inter-concealment, using the MVs from DPA and the inter-residuals from DPC

Source:  Data adapted from Kumar, S. et al., Error resiliency schemes in H.264/AVC standard, J. Visual Commun. Image Represent., 17(2):425–450, 2006; Jahaniaval, A. and Fayek, D., Combined data partitioning and FMO in distortion modeling for video trace generation with lossy network parameters, in IEEE International Symposium on Signal Processing and Information Technology, ISSPIT’07, pp. 972–976, 2007.

There has been extensive research in the area of rate-distortion (R-D) optimization and rate control, including but not limited to the work in Dai et al. (2006), Kim and Kim (2002), Kim et al. (2003), Maani et al. (2008), Mansour et al. (2011), Xiong et al. (2005), and Zhang et al. (2006).

In Zhang et al. (2006), the authors proposed an R-D model to predict the overall decoder distortion of the video frames. The idea of their model is to optimally select the encoding mode so as to reduce the distortion at the decoder. The optimal mode selection takes into account the decoder distortion, including the propagation distortion when multiple prediction sources are selected. The encoder then changes the encoding mode per MB to ensure lower distortion at the decoder. Their work included neither multiple streams nor error-resilience features. In addition, it uses flow control and packet retransmission mechanisms that are bandwidth-expensive for multiple concurrent streams.

In Maani et al. (2008), the channel model is combined with an R-D model based on slice-level encoding. The channel information is utilized by the scheduler to prioritize the packets for transmission. Their proposed method is based on the packetization of each encoded slice, without consideration of the FMO or DP error-resilience features. Their GOP distance is set to 16 frames, which reduces the error propagation effect (Maani et al., 2008), and this is the GOP size used in our work.

14.3    E2E Distortion Model

Video traces characterize video encoding in a simple text format by including information such as the video frame number, the frame size, the time, and the YUV-PSNR values of individual video frames. Traces are widely used in networking research since video file size, copyright issues, and video encoding equipment are the typical obstacles to experimenting with real video data (Seeling and Reisslein, 2005a,b). However, experimentation with actual video bit streams is necessary to obtain more accurate objective and subjective quality ratings after lossy transmission (Seeling et al., 2004). In a video trace, if part or all of a video frame is lost, the entire frame is considered dropped and the PSNR of that frame is therefore considered null (Seeling et al., 2007). This method accounts for neither partial frame data losses nor the concealment functions at the decoder and hence portrays high variation in video quality assessment. In addition, current traces do not provide enough parameters for the experimental fine-tuning of network protocols during the design and simulation phases.

In order to enable a more precise assessment, we proposed adding extra information based on the distortion model that we developed (Jahaniaval and Fayek, 2007). In this distortion model, we combined FMO with DP so that each slice group is partitioned into DPA, DPB, and DPC as shown in Figure 14.2. The additional parameters reflect the expected distortion as a function of the current network performance, namely, the current E2E packet loss probability modeled in Section 14.3.1. These parameters are then included in the trace file to equip network designers with more information when designing and assessing priority-based protocols that aim to minimize the overall distortion of received video. In this work, we classify the distortion for intra- and inter-mode coding (in Sections 14.3.2 and 14.3.3, respectively), and we measure the E2E distortion in terms of the mean square error (MSE) between the original MB and the reconstructed MB. Section 14.3.4 presents the overall distortion introduced in a video stream by the encoding and the network conditions. The distortion parameters identified there are then used in our proposed enriched video trace format presented in Section 14.3.5.

14.3.1    E2E Loss Probability

Let $P_{b_{A,i}}$ denote the E2E mean loss probability for packets belonging to DPA of slice i; we evaluate it using the steady-state distribution of an extended Gilbert model (Liang and Liang, 2007). We characterize the state transition diagram depicted in Figure 14.3 at the receiver's end. The primary state, state-(0), represents a successful packet reception. State-(1) denotes no reception (one packet loss) at the following time unit. State-(2) is reached if still no DP packet has been received two time units after the last successful reception, and so on up to state-(m). We define the time unit to be the amount of time needed for the reception of at most one packet. Let state-(q) denote a typical state, where $1 \le q \le m$, at which q consecutive time slots have elapsed since the last successful reception.


FIGURE 14.3
Extended Gilbert state diagram for data partition transmission. (From Jahaniaval, A. and Fayek, D., Combined data partitioning and FMO in distortion modeling for video trace generation with lossy network parameters, in IEEE International Symposium on Signal Processing and Information Technology, ISSPIT’07, pp. 972–976, 2007. With permission.)

Let:

X: random variable describing the number of lost packets

$\bar{s}(n)$: probability distribution vector at time step n

$s_q$: steady-state probability of state q

$P_{(i)(j)}$: transition probability from state j at time step (n − 1) to state i at time step (n)

The transition from state $\bar{s}(n-1)$ to $\bar{s}(n)$ at discrete time step n occurs according to Equation 14.1:

$\bar{s}(n) = A\,\bar{s}(n-1)$

(14.1)

where A is the state transition matrix,

$$A = \begin{bmatrix} P_{00} & P_{01} & P_{02} & \cdots & P_{0(m-1)} & 1 \\ P_{10} & 0 & 0 & \cdots & 0 & 0 \\ 0 & P_{21} & 0 & \cdots & 0 & 0 \\ 0 & 0 & P_{32} & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & P_{m(m-1)} & 0 \end{bmatrix}$$

(14.2)

For the state transition matrix A in Equation 14.2, we adopted the notation used in Gebali (2008), where the probability transition matrix is the transpose of the one defined in most literature. Since A is a column-stochastic matrix, the steady state solution of Equation 14.1 will be provided by the convergent state vector s¯(n) as n (Gebali, 2008):

$\bar{s} = A\,\bar{s}$

(14.3)

Since we are interested in the E2E mean loss probability $P_{b_{A,i}}$ for packets belonging to DPA of slice i, we evaluate it using the steady-state distribution as follows:

$P_{b_{A,i}} = \sum_{q=1}^{m} s_q = 1 - s_0$

(14.4)

In a similar way, we can compute the respective mean loss probabilities $P_{b_{B,i}}$ and $P_{b_{C,i}}$ of DPB and DPC. However, we would like to model the latter quantities conditionally on the successful reception of the DPA packets during a time period T that is less than the play out deadline, above which the DPA is considered completely lost ($P_{b_{A,i}} = 1$). This poses an upper bound on m: m = playout/T. Otherwise, for an intra-coded slice (I-slice), the DPA packets are received with $P_{b_{A,i}} = 0$.
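A small numerical sketch of Equations 14.1 through 14.4 follows: the steady state of the column-stochastic matrix A is found by power iteration, and the mean loss probability is read off as 1 − s0. The transition probabilities are made-up values for an extended Gilbert chain with m = 3.

```python
import numpy as np

def steady_state(A, tol=1e-12, max_iter=100_000):
    """Iterate s(n) = A s(n-1) until convergence (Equations 14.1 and 14.3)."""
    s = np.full(A.shape[0], 1.0 / A.shape[0])
    for _ in range(max_iter):
        s_next = A @ s
        if np.max(np.abs(s_next - s)) < tol:
            return s_next
        s = s_next
    return s

# Column q holds the outgoing probabilities of state q (each column sums to 1).
A = np.array([
    [0.95, 0.60, 0.50, 1.00],  # return to state 0: successful reception
    [0.05, 0.00, 0.00, 0.00],  # 0 -> 1: first consecutive loss
    [0.00, 0.40, 0.00, 0.00],  # 1 -> 2: second consecutive loss
    [0.00, 0.00, 0.50, 0.00],  # 2 -> 3: third consecutive loss
])
s = steady_state(A)
p_loss = 1.0 - s[0]            # Equation 14.4
print(f"steady state: {np.round(s, 4)}, mean loss probability: {p_loss:.4f}")
```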

14.3.2    Intra-Mode E2E Distortion Cases

The proposed E2E distortion model comprises the H.264 encoder characteristics as well as the channel loss probability and delay information, in order to select the optimal order of packet deployment from the queue of the ingress router. The model combines FMO and DP (Zhang et al., 2006; Jahaniaval and Fayek, 2007) so that each slice group is divided into DPs and further packetized for transmission. It mathematically describes the distortion of the slices at the decoder based on the availability and reception of the packets carrying the slice information. The model also incorporates the loss probability (presented in Section 14.3.1) as additional parameters so as to quantify the distortion as a function of the current network performance, as we show in the following discussion.

As an example, if an I-slice (which is usually packetized and transmitted as DPA and DPB packets) is available at the decoder, then the slice information is fully recoverable, with the exception of the information lost during quantization. However, if, due to an increased loss probability or a delay beyond the decoder's play out deadline, the packet containing DPB is lost, the video header information in the DPA packet is sufficient to conceal the slice. On the other hand, if an inter-coded slice, which consists of DPA and DPC packets, lacks the DPC information at the decoder, then the motion vectors and quantization information carried in the DPA packet are sufficient to locate the lost information in the reference frame and replace the missing MB.

In this model, the distortion is classified based on inter- and intra-mode encoding, and the E2E distortion is measured in terms of the MSE between the original MB from the raw (YUV) video and the reconstructed MB at the decoder. Hence, the encoding quantization distortion is also included in the model.

Let $X_{i,j}$ be the MB belonging to slice i and frame j, where i = 1, 2 indicates the slice number and $j = 1, 2, \ldots, J$ indicates the frame number in the GOP. Let $X^Q_{i,j}$ and $X^{Conc}_{i,j}$ represent the reconstructed MB at the encoder and at the decoder, respectively. The reconstructed MBs at the decoder ($X^{Conc}_{i,j}$) contain both the quantization distortion and the distortion resulting from MB concealment due to the possible loss of video data during transmission. As FMO is integrated into our model, $(P_{b_{A,i}})(1 - P_{b_{A,k}})$ denotes the probability of a lost or corrupted MB that is reconstructed from the information of its four neighboring MBs in slice k, where $k \ne i$.

The intra-mode distortion model describes the possible distortion scenarios related to the intra-coded MBs. As the availability of both DPA and DPB contributes to the full reconstruction of the intra-MBs, the presence of these DPs at the receiver is essential for decoding a visually acceptable video quality (Maani et al., 2008). In the first two cases, the DPA must be received on time at the decoder, so we set $P_{b_{A,i}} = 0$ in Equations 14.5 and 14.6.

Case 1: If DPA and DPB are successfully received within the play out deadline at the decoder and are not corrupted, then the distortion is only due to quantization:

$D(i,j) = E[(X_{i,j} - X^Q_{i,j})^2](1 - P_{b_{A,i}})(1 - P_{b_{B,i}})$

(14.5)

The successful reception of DPA and DPB is highly important, not only for ensuring acceptable visual quality for the decoded MB but also because the MB serves as an accurate reference for the reconstruction of other inter-coded MBs.

Case 2: If DPA is received correctly but DPB is corrupted or lost, the spatial concealment that takes place results in the following distortion:

$D(i,j) = E[(X_{i,j} - X^{Conc}_{i,j})^2](1 - P_{b_{A,i}})(P_{b_{B,i}})$

(14.6)

The total distortion for the intra-coded MBs in slice i and frame j is obtained by adding the two previous equations, resulting in the following:

$D_{Intra}(i,j) = E[(X_{i,j} - X^Q_{i,j})^2](1 - P_{b_{B,i}}) + E[(X_{i,j} - X^{Conc}_{i,j})^2](P_{b_{B,i}})$

(14.7)

14.3.3    Inter-Mode E2E Distortion Cases

The inter-mode distortion cases cover the possible scenarios arising from the loss and availability of inter-coded information. The inter-coded information resides in DPA and DPC, which contain the motion vectors and the inter-coded residuals, respectively. Following the same notation as in Section 14.3.2, we define the following inter-coding distortion cases:

Case 1: DPA and DPC are received at the decoder and have been neither corrupted nor delayed:

$D(i,j) = E\{[X_{i,j} - (res^Q_{i,j} + X^Q_{i,j-1})]^2\}(1 - P_{b_{A,i}})(1 - P_{b_{C,i}})$

(14.8)

In Equation 14.8, the residual $res^Q_{i,j}$ contained in DPC is added at the location of the reference MB in the previous frame, $X^Q_{i,j-1}$, and the result is used to calculate the MSE against the MB in the raw video, $X_{i,j}$.

Case 2: DPA is received successfully at the decoder, but DPC is lost:

$D(i,j) = E[(X_{i,j} - X^Q_{i,j-1})^2](1 - P_{b_{A,i}})(P_{b_{C,i}})$

(14.9)

In this scenario, the decoder uses the MV in DPA to identify the MB in the reference frame (the previous frame, in which the motion prediction was made). We note here that our model assumes that motion prediction and compensation use a reference at a distance of one frame, hence the notation $X^Q_{i,j-1}$ and not $X^Q_{i,j-2}$. Equation 14.9 represents the quantization distortion of the MB located in the previous frame, as the decoder uses the available motion vector in DPA to replace the current lost MB with the collocated one from the previous frame.

Case 3: Both DPA and DPC are lost or corrupted:

$D(i,j) = E[(X_{i,j} - X^{Conc}_{i,j})^2](P_{b_{A,i}})(P_{b_{C,i}})(1 - P_{b_{A,k}}), \quad k \ne i$

(14.10)

This scenario is made possible by the FMO arrangement: a lost MB can be concealed from its neighboring MBs belonging to a different slice. As discussed earlier, the BMA can be used to estimate the lost MB. Alternatively, and more commonly in practice, the lost slice is estimated and concealed using the MV information from the surrounding MBs belonging to the second FMO slice.

In Equation 14.10, the MB is concealed from the collocated MB in the previous frame or from the surrounding MBs, provided that the corresponding MB of the second slice is not lost, with probability $(1 - P_{b_{A,k}})$.

Again, for these scenarios to hold, $P_{b_{A,i}} = 0$ in Equations 14.8 and 14.9, whereas $P_{b_{A,i}} = 1$ in Equation 14.10. As a result, the total inter-mode distortion of slice i in frame j is the sum of the quantization and concealment distortions given in Equations 14.8 through 14.10:

$D_{Inter}(i,j) = E\{[X_{i,j} - (res^Q_{i,j} + X^Q_{i,j-1})]^2\}(1 - P_{b_{C,i}}) + E[(X_{i,j} - X^Q_{i,j-1})^2](P_{b_{C,i}}) + E[(X_{i,j} - X^{Conc}_{i,j})^2](P_{b_{C,i}})(1 - P_{b_{A,k}}), \quad k \ne i$

(14.11)

14.3.4    Parametric Distortion Model

The total distortion $D_{total}(i,j)$ of slice i in frame j is therefore given by the sum of Equations 14.7 and 14.11:

$D_{total}(i,j) = D_{Intra}(i,j) + D_{Inter}(i,j)$

By examining Equations 14.7 and 14.11, we can formulate the total distortion as a function of the E2E loss probabilities (Equation 14.4) as follows:

$D_{total}(i,j) = K_1(i,j)(1 - P_{b_{B,i}}) + K_2(i,j)(P_{b_{B,i}}) + K_3(i,j)(1 - P_{b_{C,i}}) + K_4(i,j)(P_{b_{C,i}}) + K_5(i,j)(P_{b_{C,i}})(1 - P_{b_{A,k}})$

(14.12)

where $k \ne i$ and the distortion coefficients (the K parameters) are as follows:

$K_1(i,j) = E[(X_{i,j} - X^Q_{i,j})^2]$
$K_2(i,j) = E[(X_{i,j} - X^{Conc}_{i,j})^2]$
$K_3(i,j) = E\{[X_{i,j} - (res^Q_{i,j} + X^Q_{i,j-1})]^2\}$
$K_4(i,j) = E[(X_{i,j} - X^Q_{i,j-1})^2]$
$K_5(i,j) = E[(X_{i,j} - X^{Conc}_{i,j})^2]$

(14.13)
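To show how the distortion coefficients and the measured loss probabilities combine at run time, here is a direct transcription of Equation 14.12 in Python; all numbers are placeholders, not values from our traces.

```python
def slice_distortion(K, p_b, p_c, p_a_other):
    """Total E2E distortion of one slice (Equation 14.12).

    K:         the five distortion coefficients K1..K5 (Equation 14.13)
    p_b, p_c:  E2E loss probabilities of the slice's DPB and DPC packets
    p_a_other: DPA loss probability of the *other* FMO slice (k != i)
    """
    return (K["K1"] * (1 - p_b)
            + K["K2"] * p_b
            + K["K3"] * (1 - p_c)
            + K["K4"] * p_c
            + K["K5"] * p_c * (1 - p_a_other))

K = {"K1": 2.1, "K2": 40.5, "K3": 3.4, "K4": 18.2, "K5": 25.0}  # placeholders
print(slice_distortion(K, p_b=0.02, p_c=0.03, p_a_other=0.01))
```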

TABLE 14.2

Standard Verbose FGS Base Layer Trace File


Source:  Data from Seeling, P. et al., Video Traces for Network Performance Evaluation, Springer, Heidelberg, Germany, 2007; Jahaniaval, A., Video quality enhancement through end-to-end distortion optimization and enriched video traces, M.Sc. thesis, University of Guelph, Guelph, Ontario, Canada, pp. 511–516, 2010.

14.3.5    Enriched Trace File Format

Video traces are an abstraction of the actual encoded video, consisting of information such as video frame numbers, data sizes, and PSNR values. The standard verbose trace format for the FGS encoder would not portray an accurate quality measurement when used in conjunction with the optimizer described in Section 14.4. In fact, when a standard verbose file is used, if a frame is partially or completely lost, the entire frame is considered lost. As a result, the quality of the decoder concealment of a lost MB is not taken into consideration. The commonly used verbose trace format for the FGS base layer is illustrated in Table 14.2 (Seeling et al., 2007; Seeling and Reisslein, 2010).

As previously described, the data in DPA take precedence over the data in DPB and DPC, and hence the network assigns a higher priority to DPA packets. Each slice within a frame will then contain three priority packet types. The proposed trace file format offers a higher level of granularity by including details at the DP level per slice. Table 14.3 shows the enriched trace file format we used in our work.

We embed the K parameters (Equation 14.13 and Table 14.3) in each packet header. These parameters are used by the optimizer to estimate the overall distortion experienced by the active flows, as explained in the following section.
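Because the printed table body is not reproduced here, the sketch below assumes a plausible whitespace-separated record layout for the enriched trace (frame number, frame type, slice, DP type, size, PSNR, then K1 through K5); the exact field order in the actual trace files may differ.

```python
from dataclasses import dataclass

@dataclass
class EnrichedTraceRecord:
    frame: int
    frame_type: str   # 'I' or 'P'
    slice_id: int     # FMO slice within the frame
    dp: str           # 'A', 'B', or 'C'
    size_bytes: int
    psnr_db: float
    K: tuple          # (K1, ..., K5) from Equation 14.13

def parse_line(line):
    f = line.split()
    return EnrichedTraceRecord(int(f[0]), f[1], int(f[2]), f[3],
                               int(f[4]), float(f[5]),
                               tuple(float(x) for x in f[6:11]))

print(parse_line("12 P 1 A 512 38.4 2.1 40.5 3.4 18.2 25.0"))
```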

14.4    Optimization Model

In this section, we describe our distortion optimization formulation, which utilizes the level of granularity described in the previous section. In network simulations, a priori knowledge of the distortion model in Equation 14.12 and its distortion coefficients (Equation 14.13) enables the design of network protocols that make informed resource allocation decisions so as to maximize the quality of the decoded video at the receivers.

TABLE 14.3

Proposed Verbose Trace Format for FGS Base Layer


Source:  Data from Jahaniaval, A. and Fayek, D., Combined data partitioning and FMO in distortion modeling for video trace generation with lossy network parameters, in IEEE International Symposium on Signal Processing and Information Technology, ISSPIT'07, pp. 972–976, 2007; Jahaniaval, A., Video quality enhancement through end-to-end distortion optimization and enriched video traces, M.Sc. thesis, University of Guelph, Guelph, Ontario, Canada, pp. 511–516, 2010.

According to Equation 14.12, the overall distortion of all the streams is calculated from the distortion parameters and the loss probability of the packets belonging to each stream. The order of transmission plays an important role, since the chance of distortion increases as more data are sent. The optimization outcome is a reordering of the streams such that the overall distortion of all active streams is minimized. In other words, the optimizer selects the stream whose packets have the highest chance of loss or delay among all existing streams to be transmitted first, followed by the stream with the second highest chance, then the third highest, and so on. Without optimization, the stream with the highest distortion is anticipated to experience a lower received quality that degrades further over time if no corrective measure is taken.

Our optimization model is described mathematically as follows. Let $d_s$ denote the distortion for stream s; then, for all $s \in S$,

$d_s = \sum_{j \in J} \sum_{i \in N} \big( K_1(i,j,s)(1 - P_{b_{B,i,s}}) + K_2(i,j,s)(P_{b_{B,i,s}}) + K_3(i,j,s)(1 - P_{b_{C,i,s}}) + K_4(i,j,s)(P_{b_{C,i,s}}) + K_5(i,j,s)(P_{b_{C,i,s}})(1 - P_{b_{A,k,s}}) \big), \quad \forall s \in S$

(14.14)

where

$k \ne i$

N is the number of slices per frame (N = 2 in our implementation)

J is the number of frames per traffic stream

$K_n(i,j,s)$ is the distortion coefficient associated with the ith slice in the jth frame of the sth stream, as given in Equations 14.12 and 14.13

We define an ordering factor (ODF) $\zeta_s$ for stream s. The increased chance of distortion is represented by multiplying the stream distortion value by $\zeta_s$, and we select the ODF $\zeta_s$ to increase exponentially as a function of the order in which stream s is sent. In a sense, $\zeta_s$ represents the added distortion factor caused by ordering the streams.

Now, let us define the total distortion $D_{Total}$ incurred by all streams, taking into account their transmission ordering, as follows:

$D_{Total} = \begin{bmatrix} d_1 & \cdots & d_S \end{bmatrix} \begin{bmatrix} b_{11} & \cdots & b_{1S} \\ \vdots & \ddots & \vdots \\ b_{S1} & \cdots & b_{SS} \end{bmatrix} \begin{bmatrix} \zeta_1 \\ \vdots \\ \zeta_S \end{bmatrix}$

(14.15)

Expanding the matrix notation in Equation 14.15, we have

$D_{Total} = \zeta_1(d_1 b_{11} + d_2 b_{21} + \cdots + d_S b_{S1}) + \zeta_2(d_1 b_{12} + d_2 b_{22} + \cdots + d_S b_{S2}) + \cdots + \zeta_S(d_1 b_{1S} + d_2 b_{2S} + \cdots + d_S b_{SS})$

(14.15a)

The binary coefficient matrix B is utilized to solve the minimization of the total distortion in Equation 14.15, where $d_s$, $s \in S$, is defined by Equation 14.14 and the binary values $b_{rc}$, $r, c \in S$, are constrained such that the sum of each row and each column in matrix B equals 1. We now define our optimization problem as

Minimize $D_{Total}$

Subject to:

1. $b_{rc} \in \{0, 1\} \quad \forall r, c \in S$
2. $\sum_{r \in S} b_{rc} = 1 \quad \forall c \in S$
3. $\sum_{c \in S} b_{rc} = 1 \quad \forall r \in S$
4. $\zeta_s = (s + 1)^s$

(14.16)

Constraint 4 shows that the ODF for stream s increases exponentially with its transmission order.
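Because B is constrained to be a permutation matrix, Equation 14.16 is a linear assignment of streams to transmission positions and can be solved exactly, for example with the Hungarian method, rather than by general integer programming. The sketch below uses SciPy and made-up per-stream distortions; the zero-based position index for the ODF is our own assumption.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def order_streams(d):
    """Solve Equation 14.16: assign streams to transmission positions.

    Sending stream r in position c costs d[r] * zeta[c], where the ODF
    zeta[c] = (c + 1) ** c grows with the position (Constraint 4; the
    zero-based index is an assumption of this sketch).
    """
    S = len(d)
    zeta = np.array([(c + 1) ** c for c in range(S)], dtype=float)
    cost = np.outer(d, zeta)                 # cost[r, c] = d_r * zeta_c
    rows, cols = linear_sum_assignment(cost)
    order = [int(r) for r in rows[np.argsort(cols)]]  # stream per position
    return order, cost[rows, cols].sum()

# Per-stream distortions d_s from Equation 14.14 (placeholder values).
order, total = order_streams([4.0, 9.5, 1.2, 6.3])
print("transmission order (stream indices):", order, "D_Total:", total)
```

As expected, the stream with the highest distortion is assigned the smallest ODF, that is, it is transmitted first.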

For each stream s, the optimization algorithm extracts the information carried by all the packets belonging to the stream and, using the K parameters of each DP belonging to stream s, evaluates the distortion incurred by the stream under the current network conditions in terms of E2E loss probability. The values of the loss probabilities for each stream are thus continuously measured during simulation and fed back to the optimizer in the ingress router.

During the simulation, the loss probability for each stream is calculated at its corresponding receiver using the weighted moving average (WMA) method (Ross, 2009). Hence, for the nth measurement with window size equal to m (where m = [playout/T])

$WMA_n = \dfrac{n\,LP_m + (n-1)\,LP_{m-1} + (n-2)\,LP_{m-2} + \cdots + 2\,LP_{m-n+2} + LP_{m-n+1}}{n + (n-1) + (n-2) + \cdots + 2 + 1}$

(14.17)
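A direct transcription of Equation 14.17 follows, assuming LP is the list of per-window loss-probability measurements with the most recent sample last (the variable names are ours).

```python
def weighted_moving_average(lp, n):
    """Equation 14.17: weight the n most recent samples by n, n-1, ..., 1."""
    recent = lp[-n:]                     # LP_{m-n+1}, ..., LP_m
    weights = range(1, len(recent) + 1)  # oldest weight 1, newest weight n
    return sum(w * x for w, x in zip(weights, recent)) / sum(weights)

samples = [0.010, 0.015, 0.030, 0.025, 0.040]  # placeholder measurements
print(f"WMA over last 4 samples: {weighted_moving_average(samples, 4):.4f}")
```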

Recalling that the ODF $\zeta_s$ multiplies the distortion value of each stream and that the optimization binary variables are restricted to 0 or 1, solving the system yields the reordering of the current live streams that achieves the minimum overall distortion $D_{Total}$ (Equation 14.15).

14.5    Implementation, Analysis, and Results

In this section, we describe the experimental and simulation environment we developed to validate the mathematical modeling presented in the previous sections. In Section 14.5.1, we explain how the video traces were generated before feeding the encoded bit streams into a network topology explained in Section 14.5.2. In Section 14.5.3, we present the analytical results of our work.

14.5.1    Video Traffic and Trace Generation

To produce the enhanced video trace files (Table 14.3), the H.264/AVC JM 12.0 encoder (Sühring, 2008) was modified to encode four video sequences: Akiyo, Carphone, Foreman, and Silent (Seeling and Reisslein, 2010). Table 14.4 shows the properties of these four video sequences. The selected videos are commonly used in video research as they portray a high degree of temporal and spatial variation, high and low motion sequences (Ekmekci et al., 2006), and different texture and motion characteristics (Ruolin, 2008).

To calculate the K parameters for each slice at the encoder, the MSEs in Equation 14.13 are evaluated by computing the difference between the collocated pixels of the raw YUV FMO slice and its corresponding encoded and decoded slice. To evaluate K5, the DPA packets belonging to slice 1 in frame j are removed, and frame j is then decoded using the DPA packets of its other slice, slice 2. The same process is repeated when calculating K5 for slice 2, but with the DPA packets of slice 1 used instead.

TABLE 14.4

Video Sequences Used in Simulation

Video Sequence | Akiyo | Foreman | Carphone | Silent
Frame dimension | QCIFa | QCIF | QCIF | QCIF
Number of frames | 298 | 396 | 381 | 298
Sampling | 4:2:0 | 4:2:0 | 4:2:0 | 4:2:0
Avg. packet size (kB) | 5.13 | 12.9 | 10.9 | 8.74
Max packet size (kB) | 45 | 42 | 51 | 64
Avg. rate (kbps) | 307.55 | 774.54 | 654.2 | 524.8
Max rate (kbps) | 2693 | 2530 | 3083 | 3821

Source:  Data adapted from Jahaniaval, A., Video quality enhancement through end-to-end distortion optimization and enriched video traces, M.Sc. thesis, University of Guelph, Guelph, Ontario, Canada, 2010.

a 176 × 144.

To reduce error propagation, a 16-frame GOP with an IPPP structure was used. Bidirectionally predicted frames (B-frames) were not used (Seeling et al., 2007), as they would increase the chance of error propagation. Each video sequence was encoded at a rate of 30 frames per second with a constant QP; hence, the output of the encoder is a stream of VBR rather than CBR video data. The QP was set to 20, within the range between 0 (lossless) and 51 (heavily distorted as a result of the reduced bit rate). The average PSNR value was approximately 65 dB. This PSNR produces a high quality video so that any loss in video information can be detected by visual comparison, due to the high contrast between the high quality (received FMO slices) and low quality (concealed) areas when the video is decoded, which we observed experimentally during the trace generation phase. The averages for each video sequence were calculated and are summarized in Table 14.4.
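For reference, the PSNR reported in the traces is derived from the MSE between the original and the reconstructed 8-bit luma frames; a minimal sketch with synthetic frames:

```python
import numpy as np

def psnr(original, received, max_val=255.0):
    """PSNR in dB between two 8-bit luma frames."""
    mse = np.mean((original.astype(np.float64)
                   - received.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(max_val ** 2 / mse)

orig = np.random.randint(0, 256, (144, 176), dtype=np.uint8)  # QCIF luma
noisy = np.clip(orig + np.random.normal(0, 2, orig.shape), 0, 255)
print(f"PSNR: {psnr(orig, noisy):.2f} dB")
```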

14.5.2    Network Simulation

In this work, the OMNeT++ network simulator (OMNeT++, 2010) was used to validate the optimization model. OMNeT++ is a discrete-event simulator that provides an object-oriented, open-architecture network simulation environment. The optimization algorithm was implemented using CPLEX™ (CPLEX, 2010), an optimization library originally developed by ILOG and now part of the IBM™ product suite. The C++ version of CPLEX was used within the network simulator.

Figure 14.4 displays the network topology we designed and used to carry the video traffic. Five traffic sources stream video and non-video data to five destinations through the lossy Internet cloud represented by routers 1 through 7 in Figure 14.4. The ingress and egress Edge routers interface the sources and destinations to the Internet cloud, respectively. The traffic used in our simulations from source to destination is as follows:


FIGURE 14.4
Network topology. (From Jahaniaval, A., Video quality enhancement through end-to-end distortion optimization and enriched video traces, M.Sc. thesis, University of Guelph, Guelph, Ontario, Canada, 2010.)

Source | Receiver | Video Sequence
Server 1 | Client 1 | Stream 1: Akiyo
Server 2 | Client 2 | Stream 2: Carphone
Server 3 | Client 3 | Stream 3: Foreman
Host 1 | Client 4 | Stream 4: Silent
Host 2 | Client 5 | Stream 5: Non-video network traffic

The implementation of the optimization model resides in the ingress router. Based on the information provided to the optimizer, namely the K distortion values (Equation 14.13) and the calculated loss probabilities, the optimizer schedules the deployment of the packets so as to minimize the predicted distortion at the receivers for all the streams. Figure 14.5 shows the edge router architecture used in our simulations, in which four video traffic sources are incident on an ingress router that has one queue for each DP type.

The rearrangement of the packets is applied first to the DPA queue, then to the DPB and DPC queues, respectively. This rearrangement gives priority to the packets of the streams experiencing higher deterioration in the current evaluation period, ensuring that the decoded quality is not jeopardized by the overall latency and loss probability. As demonstrated in the following section, the optimizer is triggered under different conditions and at different GOP intervals.

After reordering, the packet sequence within each stream is preserved. It has been shown (Maani et al., 2008) that, in the event of a packet loss, concealment is more effective if the packets are ordered according to their original sequence. Figure 14.6 demonstrates the reordering in the DPA queue.
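This order-preserving rearrangement can be expressed as a stable sort keyed by the optimizer's stream ranking; Python's sort is stable, so packets of the same stream keep their original relative order. The packet tuples below are hypothetical.

```python
def reorder_queue(queue, stream_order):
    """Reorder a DP queue by optimizer-assigned stream priority.

    queue:        list of (stream_id, seq_no, payload) in arrival order
    stream_order: stream ids from highest to lowest transmission priority
    """
    rank = {s: r for r, s in enumerate(stream_order)}
    return sorted(queue, key=lambda pkt: rank[pkt[0]])  # stable sort

dpa_queue = [(1, 0, "p10"), (2, 0, "p20"), (1, 1, "p11"),
             (3, 0, "p30"), (2, 1, "p21")]
print(reorder_queue(dpa_queue, stream_order=[2, 3, 1]))
# Stream 2's packets come first and stay in order, then stream 3, then 1.
```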


FIGURE 14.5
DP queues in the ingress router. (From Jahaniaval, A., Video quality enhancement through end-to-end distortion optimization and enriched video traces, M.Sc. thesis, University of Guelph, Guelph, Ontario, Canada, pp. 511–516, 2010.)


FIGURE 14.6
Order of video packets in DPA. (a) Before reordering and (b) after reordering. (Adapted from Jahaniaval, A., Video quality enhancement through end-to-end distortion optimization and enriched video traces, M.Sc. thesis, University of Guelph, Guelph, Ontario, Canada, 2010.)

The proposed model does not take the effect of error propagation into consideration, as it assumes that the GOP is no more than 16 frames; the error is therefore confined to the 16 frames until the next I-slice is received at the beginning of the following GOP. The scheduler at the ingress router compares the current loss probability and delay to the decoder's loss probability and delay thresholds for each stream. If the current delay and loss probability exceed their respective thresholds, the optimizer is activated. In our implementation, we did not account for any lag between the measurement of the average loss probability and delay per stream, which is calculated at each end receiver, and the availability of those measurements to the ingress router's optimization algorithm before each invocation.

With regard to the transport of the video packets, the RTP/UDP/IP protocols are used to transport the packets in a connectionless fashion in which no acknowledgment is required, owing to the stringent delay constraints and the need to reduce control overhead. The K distortion parameters are packetized along with the DPs and are available for the optimizer's processing and analysis. The overall functionality is outlined in Figure 14.7.

The maximum thresholds proposed in Pinson et al. (2007) are 2% for the E2E loss probability and 1 ms for the maximum E2E delay. Since we use a combination of the FMO and DP error-resilience features, we adopted a higher loss probability threshold of 2.5%. This value was obtained after we experimentally observed that dropping packets beyond a 2.5% loss probability reveals noticeable visual artifacts. Similarly, observable deterioration occurs at a delay of 1.18 ms with the encoding setup we used.
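The resulting trigger logic at the ingress router can be sketched as follows; the two threshold constants come from the text, while the per-stream measurement format is a hypothetical stand-in for the receiver feedback.

```python
LOSS_THRESHOLD = 0.025     # 2.5% E2E loss probability (from the text)
DELAY_THRESHOLD = 1.18e-3  # 1.18 ms E2E delay (from the text)

def should_invoke_optimizer(stream_stats):
    """Return True when any stream violates its loss or delay threshold.

    stream_stats maps stream id -> (wma_loss, mean_delay_seconds),
    as fed back from each receiver (hypothetical format).
    """
    return any(loss > LOSS_THRESHOLD or delay > DELAY_THRESHOLD
               for loss, delay in stream_stats.values())

stats = {1: (0.012, 0.9e-3), 2: (0.031, 1.0e-3), 3: (0.020, 1.3e-3)}
print(should_invoke_optimizer(stats))  # True: streams 2 and 3 violate
```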


FIGURE 14.7
Operation sequence in the ingress router. (Modified and adapted from Jahaniaval, A., Video quality enhancement through end-to-end distortion optimization and enriched video traces, M.Sc. thesis, University of Guelph, Guelph, Ontario, Canada, 2010.)

We experimented with the optimizer invocation under different conditions. Four different invocation patterns are summarized in Table 14.5.

In the baseline case, packets are served in FIFO order from the queues of the ingress router. The processing delay added by the optimizer (due to its quadratic complexity) has no adverse effect on the E2E delay of the four video streams, but we suspect that it would introduce an unacceptable delay for a large number of streams; our ongoing work focuses on this area. In addition to the rearrangement of the packets, the receivers record the arrival of all the packets in a raw trace format similar to the original trace file, but with the addition of the arrival time-stamp information. This trace file is further processed to calculate the PSNR values of the received video slices for each stream.

TABLE 14.5

Optimizer Invocation Patterns

Baseline | The optimizer is not invoked.
Optimizer #1 | The optimizer is invoked when the delay and/or loss probability threshold (1.18 ms and 2.5%, respectively) is exceeded for any of the streams. After the rearrangement of the packets in the queues, the DPA packets are deployed until no packets remain in that queue, followed by the deployment of all the packets in the DPB queue, and finally all the packets in the DPC queue.
Optimizer #2 | Similar to Optimizer #1, except that the reordering outcome is applied only to the DPA queue, whose packets are deployed first until none remain, followed by the packets in the DPB queue, and finally the packets in the DPC queue.
Optimizer #3 | Similar to Optimizer #2, the optimization occurs when the thresholds are exceeded, but with an allowable activation frequency of once every 5 GOPs.
Optimizer #4 | Similar to Optimizer #3, but with an activation frequency of once every 10 GOPs.

Source:  Data adapted from Jahaniaval, A., Video quality enhancement through end-to-end distortion optimization and enriched video traces, M.Sc. thesis, University of Guelph, Guelph, Ontario, Canada, 2010.

14.5.3    Results

In this section, we present the results obtained using the enriched traces based on our E2E distortion model presented in Section 14.3. During the simulation, the E2E loss probability and delay were monitored for each video stream, and the optimizer based on our model in Section 14.4 was invoked accordingly.

In Figure 14.8, we present the results of our comparative simulation runs for Stream 1. When no optimization is performed (Baseline in Figure 14.8a), we observe undesirably high fluctuations in PSNR values, which are reset to a high value at the beginning of each GOP (with the I-frame) and degrade as the transmission of the GOP progresses. The other graphs show the respective results for Optimizers #1 through #4. With each I-frame transmission, the network metrics, namely the current E2E loss probability and delay, are input to the optimizer, which then produces the reordering of the live streams that minimizes the total distortion (Equation 14.15). As expected, the optimizer introduced a significant amount of control overhead into the network, which favors Optimizer #2, whose frequency of invocation is based on threshold violation; this can be observed in Figure 14.8b. Optimizers #3 and #4 still perform better than the baseline (used as a reference run), but at the expense of a smaller improvement in the PSNR values than that achieved by #2. Even though Optimizer #1 has the same frequency of invocation as #2, its queue management causes the DPA packets arriving during optimization to be delayed until all pre-optimization packets have been deployed from the three queues in the ingress router. This handling of the queues has a minor negative impact on the PSNR values, as shown in Figure 14.8b.


FIGURE 14.8
Received PSNR for Stream 1. (a) Received PSNR for Stream 1 (Akiyo). (b) Reordered with ascending PSNR value. (From Jahaniaval, A., Video quality enhancement through end-to-end distortion optimization and enriched video traces, M.Sc. thesis, University of Guelph, Guelph, Ontario, Canada, pp. 511–516, 2010.)

In comparison with the baseline simulation, in the optimized cases we observe a smoothing effect on the PSNR values that eliminates much of the fluctuation, with an overall higher value throughout the total simulation time. To provide a better visualization, we sorted the results in ascending order of PSNR value rather than along the simulation time line.

The discussion just presented for Stream 1 applies equally to the other three video streams. Figure 14.9 shows the sorted PSNR results for video streams 2, 3, and 4. Overall, the variation in the PSNR values is greatly reduced and kept within an acceptable range when the optimization is activated in the ingress router. Table 14.6 summarizes the ranges of PSNR values averaged over the video streams; the main conclusion is that the optimization improves the PSNR values at the receivers.


FIGURE 14.9
Received PSNR values in ascending order for (a) Stream 2 (Carphone), (b) Stream 3 (Foreman), (c) Stream 4 (Silent). (From Jahaniaval, A., Video quality enhancement through end-to-end distortion optimization and enriched video traces, M.Sc. thesis, University of Guelph, Guelph, Ontario, Canada, 2010.)

TABLE 14.6

PSNR Comparison: Optimizers versus Baseline


Source:  Data adapted from Jahaniaval, A., Video quality enhancement through end-to-end distortion optimization and enriched video traces, M.Sc. thesis, University of Guelph, Guelph, Ontario, Canada, 2010.

TABLE 14.7

GOP Comparison Count for Optimizers versus Baseline

GOPs vs. Baseline | #1 | #2 | #3 | #4
Higher | 49 | 64 | 52 | 54
Lower | 31 | 16 | 28 | 26
Same | 9 | 9 | 9 | 9

Source:  Data adapted from Jahaniaval, A., Video quality enhancement through end-to-end distortion optimization and enriched video traces, M.Sc. thesis, University of Guelph, Guelph, Ontario, Canada, 2010.

In Table 14.7, we report the number of GOPs that averaged higher, lower, and the same PSNR values as the baseline run, across all four video streams.

Expanding on the averages reported in Table 14.6, Figure 14.10 gives a detailed view of the Gaussian (normal) distribution and the cumulative distribution of the PSNR values for each of the optimizers and for the baseline. From Figure 14.10, we can observe the following:

•  The baseline (no optimization) has the worst performance among all simulations.

•  In all optimization cases, we obtain significantly higher video quality than in the no optimization case.

•  The best performance ranking goes to Optimizer #2, followed by #1, then #4, and finally #3. This is seen in the narrower (lower standard deviation) and higher-mean Gaussian distributions of these curves.

•  The cumulative distribution curves also support this analysis.

Image

FIGURE 14.10
Probability distribution of PSNR values. (Modified and adapted from Jahaniaval, A., Video quality enhancement through end-to-end distortion optimization and enriched video traces, M.Sc. thesis, University of Guelph, Guelph, Ontario, Canada, 2010.)

Image

FIGURE 14.11
Optimization effect on (a) loss probability and (b) delay. (From Jahaniaval, A., Video quality enhancement through end-to-end distortion optimization and enriched video traces, M.Sc. thesis, University of Guelph, Guelph, Ontario, Canada, pp. 511–516, 2010.)

In general, we found that we obtain better results when the optimization is invoked on a threshold violation basis than when it is invoked on a constant GOP interval basis.

Moreover, our results so far indicate a natural trade-off between the frequency of optimizer invocation and the overhead placed on the network's control plane.

Figure 14.11 displays the effect of each of the optimizers on the video streams in terms of (a) loss probability and (b) delay.

14.6    Conclusion

As the demand for multimedia services has increased in recent years, the challenge remains to deliver high quality multimedia over the Internet competitively with conventional technologies such as cable and satellite TV. The work presented in this chapter bridges aspects of the video compression domain and video delivery over lossy networks.

In this work, we presented a new mathematical model for the E2E distortion of video streams that takes into account QoS-specific parameters, namely, the E2E loss probability (Jahaniaval and Fayek, 2007). Two error-resilience features of H.264 (DP and FMO) were used to accurately model the expected slice distortion of the base layer as a function of the E2E packet loss probability. We then proposed an augmentation of the standard verbose trace file, creating representative video quality parameters associated with the loss probability, to enable a more accurate assessment of distortion at run time.

Based on this model, we devised a linear optimization model (Jahaniaval, 2010; Jahaniaval et al., 2010) that makes use of the enhanced trace file format and incorporates the loss probability measured at simulation run time to quantify the distortion of each video stream and take corrective action against the degradation in video quality. The objective of the optimization algorithm is to minimize the overall distortion of the live video streams at the moment the optimizer is invoked. The algorithm is implemented in the ingress routers.

To validate our modeling, we ran several simulation tests with and without optimization. Several modes of optimizer invocation were used to reach a trade-off in which the optimization overhead does not negatively impact video quality but rather improves it for the live streams. Our experiments indicated that invoking the optimization at constant intervals is outperformed by the threshold-violation mode of invocation in terms of overall PSNR improvement and network overhead. The optimization ordering outcome was applied to the DP packets.

Our findings so far indicate that the optimization significantly improves the video quality at the receivers at an acceptably small cost of increased loss probability and delay due to the overhead introduced in the network's control plane. The statistical analysis conducted on the data samples from the simulation runs also demonstrates an improvement in the received quality in comparison to the baseline run.

We have thus shown that it is beneficial to use our optimization model to improve the video quality at the receivers when only the per-slice PSNR is used. However, we would also like to explore objective video quality metrics beyond PSNR, such as the structural similarity index metric (SSIM) (Wang et al., 2004), the hybrid image quality metric (Engelke et al., 2006), and others; an illustrative per-frame SSIM computation is sketched below. Moreover, we will investigate our approach when applied to both the base and enhancement layers in FGS coding.
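
For instance, a per-frame SSIM computation could be added to the evaluation pipeline as in the sketch below; scikit-image is an assumed tooling choice, and the frame dimensions and simulated impairment are arbitrary.

import numpy as np
from skimage.metrics import structural_similarity as ssim

reference = np.random.randint(0, 256, (288, 352), dtype=np.uint8)  # e.g., CIF luma
degraded = reference.copy()
degraded[::8, :] = 0  # crudely simulate the loss of every eighth row

score = ssim(reference, degraded, data_range=255)
print(f"SSIM: {score:.3f}")  # 1.0 would mean identical frames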

References

Agrafiotis D., D. R. Bull, and C. Nishan. Enhanced error concealment with mode selection. IEEE Transactions on Circuits and Systems for Video Technology, 16(8):960–973, 2006.

Belda A., J. C. Guerri, and A. Pajares. Adaptive error resilience tools for improving the quality of MPEG-4 video streams over wireless channels. In IEEE 32nd EUROMICRO Conference on Software Engineering and Advanced Applications, SEAA’06, Cavtat, Dubrovnik, pp. 424–429, 2006.

Blake S., D. Black, M. Carlson, E. Davies, Z. Wang, and W. Weiss. An architecture for differentiated services. Technical report, RFC 2475, 1998.

Chen Y., O. Au, C. Ho, and J. Zhou. Spatio-temporal boundary matching algorithm for temporal error concealment. In IEEE International Symposium on Circuits and Systems, ISCAS’06, Island of Kos, Greece, vol. 29, p. 4, 2006.

Chen Y., K. Xie, F. Zhang, P. Pandit, and J. Boyce. Frame loss error concealment for SVC. Journal of Zhejiang University - Science A, 7(5):677–683, 2006.

CPLEX, licensed to D. Fayek, SoE, University of Guelph, IBM ILOG CPLEX optimizer. http://www-01.ibm.com/software/integration/optimization/cplex-optimizer/, accessed from January 2008–August 2010.

Dai M., D. Loguinov, and H. M. Radha. Rate-distortion analysis and quality control in scalable Internet streaming. IEEE Transactions on Multimedia, 8(6):1135–1146, 2006.

Dhondt Y., P. Lambert, and R. Van de Walle. A flexible macroblock scheme for unequal error protection. In IEEE International Conference on Image Processing, Atlanta, GA, vol. 8, pp. 829–832, 2006.

Ekmekci S., T. Sikora, and P. Frossard. Unbalanced quantized multi-state video coding. EURASIP Journal on Applied Signal Processing, 2006:1–10, 2006.

Engelke U., T.-M. Kusuma, and H.-J. Zepernick. Perceptual quality assessment of wireless video applications. In International ITG-Conference on Source and Channel Coding, Munich, Germany, pp. 1–6, 2006.

Gebali F. Analysis of Computer and Communication Networks. Springer, Heidelberg, Germany, 2008.

Jahaniaval A. Video quality enhancement through end-to-end distortion optimization and enriched video traces, M.Sc. thesis, University of Guelph, Guelph, Ontario, Canada, 2010.

Jahaniaval A. and D. Fayek. Combined data partitioning and FMO in distortion modeling for video trace generation with lossy network parameters. In IEEE International Symposium on Signal Processing and Information Technology, ISSPIT’07, Cairo, Egypt, pp. 972–976, 2007.

Jahaniaval A., D. Fayek, and R. J. Brown. Distortion optimization in enriched video traces for end-to-end video quality enhancement. In IEEE/IFIP 6th International Conference on Networks and Services Management, CNSM’10, Niagara Falls, Canada, pp. 511–516, 2010.

Ke C.-H., C.-K. Shieh, W.-S. Hwang, and A. Ziviani. Improving video transmission on the internet. IEEE Potentials, 26(1):16–19, 2007.

Kim I.-M. and H.-M. Kim. A new resource allocation scheme based on a PSNR criterion for wireless video transmission to stationary receivers over Gaussian channels. IEEE Transactions on Wireless Communications, 1(3):393–401, 2002.

Kim I.-M. and H.-M. Kim. An optimum power management scheme for wireless video service in CDMA systems. IEEE Transactions on Wireless Communication, 2(1):81–91, 2003.

Ksentini A., M. Naimi, and A. Gueroui. Toward an improvement of H.264 video transmission over IEEE 802.11e through a cross-layer architecture. IEEE Communications Magazine, 44(1):107–114, 2006.

Kumar S., L. Xu, M. K. Mandal, and S. Panchanathan. Error resiliency schemes in H.264/AVC standard. Journal of Visual Communication and Image Representation, 17(2):425–450, 2006.

Kuo W.-H., W. Liao, and T. Liu. Adaptive resource allocation for layer-encoded IPTV multicasting in IEEE 802.16 WiMAX wireless networks. IEEE Transactions on Multimedia, 13(1):116–124, 2011.

Liang G. and B. Liang. Balancing interruption frequency and buffering penalties in VBR video streaming. In 26th IEEE International Conference on Computer Communications INFOCOM, Anchorage, AK, pp. 1406–1414, 2007.

Maani E., P.V. Pahalawatta, R. Berry, T.N. Pappas, and A. K. Katsaggelos. Resource allocation for downlink multiuser video transmission over wireless lossy networks. IEEE Transactions on Image Processing, 17(9):1663–1671, 2008.

Mansour H., P. Nasiopoulos, and V. Krishnamurthy. Rate and distortion modeling of CGS coded scalable video content. IEEE Transactions on Multimedia, 13(2):165–180, April 2011.

OMNeT++. OMNeT++ network simulation framework, http://www.omnetpp.org/, accessed August 2010.

Piamrat K., C. Viho, J.-M. Bonnin, and A. Ksentini. Quality of Experience measurements for video streaming over wireless networks. In Third International Conference on Information Technology: New Generations, pp. 1184–1189, 2009.

Pinson M. H., S. Wolf, and R. B. Stafford. Video performance requirements for tactical video applications. In IEEE Conference on Technologies for Homeland Security, Woburn, MA, pp. 85–90, 2007.

Richardson I. E. G. H.264 and MPEG-4 Video Compression. John Wiley & Sons, New York, 2003.

Ross S. Probability and Statistics for Engineers and Scientists, 4th edn. Elsevier Academic Press, Amsterdam, the Netherlands, pp. 567–570, 2009.

Ruolin R. A novel intra refreshment algorithm for ROI. In International Conference on Multimedia and Information Technology. IEEE Computer Society, Washington, DC, pp. 62–65, 2008.

Seeling P., F. Fitzek, and M. Reisslein. Video Traces for Network Performance Evaluation. Springer, Heidelberg, Germany, 2007.

Seeling P. and M. Reisslein. Evaluating multimedia networking mechanisms using video traces. IEEE Potentials, 24(4):21–25, 2005a.

Seeling P. and M. Reisslein. Video coding with multiple descriptors and spatial scalability for devices diversity in wireless multi-hop networks. In IEEE Consumer Communications and Networking Conference, CCNC’05, Las Vegas, NV, pp. 278–283, 2005b.

Seeling P. and M. Reisslein. Video trace library. Arizona State University, http://trace.eas.asu.edu/yuv/index.html, accessed in 2010.

Seeling P., M. Reisslein, and B. Kulapala. Network performance evaluation using frame size and quality traces of single-layer and two-layer video: A tutorial. IEEE Communications Surveys and Tutorials, 6(3):58–78, 2004.

Shao S.-C. and J.-H. Chen. A novel error concealment approach based on general regression neural network. In International Conference on Consumer Electronics, Communications and Networks, CEC-Net 2011, Xianning, China, pp. 4679–4682, 2011.

Stockhammer T., M. M. Hannuksela, and T. Wiegand. H.264/AVC in wireless environments. IEEE Transactions on Circuits and Systems for Video Technology, 13(7):657–673, 2003.

Sühring K. H.264/AVC software JM 12.0, http://iphome.hhi.de/suehring/tml/, accessed in 2008.

Wang Y.-K., M. M. Hannuksela, S. Pateux, A. Eleftheriadis, and S. Wenger. System and transport interface of SVC. IEEE Transactions on Circuits and Systems for Video Technology, 17(9):1149–1163, 2007.

Wang Z., L. Lu, and A. C. Bovik. Video quality assessment based on structural distortion measurement. Elsevier Signal Processing: Image Communication, 19(2):121–132, 2004.

Wenger S. H.264/AVC over IP. IEEE Transactions on Circuits and Systems for Video Technology, 13(7):645–656, 2003.

Wiegand T., L. Noblet, and F. Rovati. Scalable video coding for IPTV services. IEEE Transactions on Broadcasting, 55(2):527–538, 2009.

Xiang X., Y. Zhang, D. Zhao, S. Ma, and W. Gao. A high efficient error concealment scheme based on auto-regressive model for video coding. In 27th Conference on Picture Coding Symposium, Chicago, IL, pp. 305–308, 2009.

Xiong H., J. Sun, S. Yu, J. Zhou, and C. Chen. Rate control for real-time video network transmission on end-to-end rate-distortion and application-oriented QoS. IEEE Transactions on Broadcasting, 51(1):122–132, 2005.

Xu Y. and Y. Zhou. H.264 video communication based refined error concealment schemes. IEEE Transactions on Consumer Electronics, 50(4):1135–1141, 2004.

Zapater M.-N. and G. Bressan. A proposed approach for quality of experience assurance for IPTV. In First International Conference on the Digital Society, Guadeloupe, French Caribbean, pp. 25–30, 2007.

Zhang Y., W. Gao, and D. Zhao. Joint data partition and rate-distortion optimized mode selection for H.264 error-resilient coding. In IEEE International Workshop on Multimedia Signal Processing, MMSP’06, Victoria, British Columbia, Canada, pp. 248–251, 2006.

*  So far, we have focused only on enriching the FGS base-layer trace file.
