1. Network-Aware Error-Resilient Video Coding

With the growing demand for universal accessibility to video content, more and more different networks are being used to deploy video services. However, in order to make these video services efficiently available with an acceptable quality in error-prone environments, such as mobile networks, appropriate error resilience techniques are necessary. Since these error-prone environments can have very different characteristics, which can also vary over time, it is important that the considered error resilience techniques are network-aware and can adapt to the varying characteristics of the networks used.

In order to extend the useful lifetime of a video coding standard, standardization bodies usually specify the minimum set of tools that are essential for guaranteeing interoperability between devices or applications of different manufacturers. With this strategy, the standard may evolve continuously through the development and improvement of its nonnormative parts. Error resilience is an example of a video coding tool that is not completely specified in a normative way, in any of the currently available and emerging video coding standards. The reason for this is that it is simply not necessary for interoperability and, therefore, it is one of the main degrees of freedom to improve the performance of standard-based systems, even after the standard has been finalized. Nevertheless, recognizing the paramount importance of this type of tool, standardization initiatives always include a minimum set of error-resilient hooks (e.g., in the form of bitstream syntax elements) in order to facilitate the development of effective error resilience techniques, as needed for the particular application envisaged.

Error resilience techniques are usually seen as playing a role at the decoder side of the communication chain. However, by using preventive error resilience techniques at the encoder side, which involve the intelligent design of the encoder, it is also possible to make the task of the decoder much easier in terms of dealing with errors. In fact, the performance of the decoder can vary greatly depending on the amount of error resilience help provided in the bitstream generated by the encoder. At the encoder, the challenge is therefore to develop techniques that make video bitstreams more resilient to errors, so that the decoder can recover better when errors occur; these are the preventive error resilience techniques. At the decoder, the challenge is to develop techniques that take all the available received data (correct and possibly corrupted) and decode it with the best possible video quality, thus minimizing the negative subjective impact of the errors on the video quality offered to the user; these techniques may be called corrective error resilience techniques.

Video communication systems, in order to be globally more error-resilient to channel errors, typically include both preventive and corrective error-resilient techniques. An important class of preventive techniques is error-resilient source coding, which consists of providing redundancy at the source coding level in order to prevent error propagation and consequently reduce the distortion caused by data corruption/loss. Error-resilient source coding techniques include data partitioning, resynchronization and reversible variable length codes [1,2], redundant coding schemes, such as sending the same information predicted from different references [3], scalable video coding [4,5,6], or multiple description coding [7,8]. Besides source coding redundancy, channel coding redundancy can also be used, where a good example is the case of forward error correction [9]. In terms of corrective error-resilient techniques, error concealment techniques correspond to one of the most important classes, but other important techniques also exist, such as error detection and error localization techniques [10]. Error concealment techniques consist essentially of postprocessing methods aiming at recovering missing or corrupted data from neighboring data (either spatially or temporally) [11], but for these techniques to be truly effective, an error detection technique should be first used to detect if an error has indeed occurred, followed by an error localization technique to determine where the error occurred and which parts of the video content were affected [10]. For a good review of the many different preventive and corrective error-resilient video coding techniques that have been proposed in the literature, the reader can refer to Refs. [12,13].

This chapter addresses the problem of error-resilient encoding, in particular of how to efficiently improve the resilience of compressed video bitstreams, while adaptively considering the network characteristics in terms of information loss.

Video coding systems that rely on predictive (inter) coding to remove temporal redundancy, such as those based on the H.264/AVC standard [14], are strongly affected by transmission errors/information loss due to the error propagation caused by the prediction mechanisms. Therefore, typical approaches to make bitstreams generated by the encoder more error-resilient rely on the adaptation of the video coding mode decisions, at various levels (e.g., picture, slice, or macroblock level), to the underlying network characteristics, trying to establish an adequate trade-off between predictive and non-predictive encoding modes. This is done because nonpredictive modes are less efficient in terms of compression but can provide higher error resilience. In this context, controlling the amount of nonpredictive versus predictive encoded data is an efficient and highly scalable error resilience tool.

The intracoding refresh schemes available in the literature [2,15,16,17,18,19,20,21,22] are a typical example of efficient error resilience techniques that improve the video quality over error-prone environments without requiring changes to the bitstream syntax, thus allowing the performance of standard video codecs to be continuously improved without compromising interoperability. However, a permanently open issue related to these techniques is how to achieve the best trade-off between error resilience and coding efficiency.

Since these schemes work by selectively coding in intra mode different parts of the video content at different time instants, they are able to avoid long-term propagation of transmission or storage errors that could make the decoded quality decay very rapidly. This way, these intracoding refresh schemes are able to significantly improve the error resilience of the coded bitstreams and the overall subjective quality of the decoded video. While some schemes do not require any specific knowledge of what is being done at the decoder in terms of error concealment [16,17,18], other approaches try to estimate the distortion experienced at the decoder given a certain probability of data corruption/loss and the concealment techniques adopted [2,22].

The problem with most video coding mode decision approaches, including typical intracoding refresh schemes, is that they can significantly decrease the coding efficiency if they make their decisions without taking into account the rate-distortion (RD) cost of such decisions. This problem can be dealt with by combining the error-resilient coding mode decisions with the video encoder rate control module [23], where the usual coding mode decisions are taken [24,25]. This way, coding-efficient error robustness can be achieved. In the specific case of intracoding refresh schemes, a clever solution for this combination is to compare the RD cost of coding macroblocks (MBs) in intra and inter modes; if the cost of intracoding is only slightly larger than the cost of intercoding, then the coding mode can be changed to intra, providing error robustness almost for free. This strategy is able to reduce error propagation and, thus, to increase error robustness when transmission errors occur, at a very limited RD cost increase and without the huge complexity of estimating the expected distortion experienced at the decoder.

Nevertheless, in order for these error-resilient video coding mode decision schemes to be really useful in an adaptive way, the current error characteristics of the underlying network being used for transmission should be taken into account. For example, in the case of intracoding refresh schemes, this will allow the bit rate resources allocated to intracoding refresh to be adequately adapted to the error characteristics of the network [26]. After all, networks with small amounts of channel errors only need small amounts of intracoding refresh and vice versa. Thus, efficient bit rate allocation in an error-resilient way has to depend on the feedback received from the network about its current error characteristics, which define the error robustness needed.

Therefore, network awareness makes it possible to dynamically vary the amount of error resilience resources to better suit the current state of the network and, therefore, further improve the decoded video quality without reducing the error robustness [26,27]. This problem is nowadays more relevant than ever, since more and more audiovisual content is accessed over error-prone networks, such as mobile networks, and these networks can have extremely varying error characteristics (over time).

As an illustrative example, this chapter presents a fully automatic network-aware MB intracoding refresh technique for error-resilient H.264/AVC video coding, which also dynamically adjusts the amount of cyclically intra refreshed MBs according to the network conditions, guaranteeing that endless error propagation is avoided.

The rest of the chapter is organized as follows. Section 1.2 describes the general video coding framework that was used for implementing the considered error-resilient network-aware MB intracoding refresh scheme. Section 1.3 introduces the concept of efficient intracoding refresh, which will later be needed in Section 1.4, where the considered network-aware intracoding refresh scheme itself is described. Section 1.5 presents some relevant performance results for the considered scheme in typical mobile network conditions and, finally, Section 1.6 concludes the chapter.

1.2 Video Coding Framework

The network-aware error-resilient scheme described in this chapter relies on the rate control scheme proposed by Li et al. [24,28], as well as on the RD optimization (RDO) framework and the random intra refresh technique included in the H.264/AVC reference software [25]. Since the main contributions and novelty of the network-aware error-resilient scheme described in this chapter regard the latter two techniques, it is useful to first briefly review the RDO and the random intra refresh techniques included in the H.264/AVC reference software in order for the reader to better understand the described solutions.

1.2.1 Rate-Distortion Optimization

The H.264/AVC video coding standard owes its major performance gains, relative to previous standards, essentially to the many different intra and inter MB coding modes supported by the video coding syntax. Although not all modes are allowed in every H.264/AVC profile [14], even for the simplest profiles, such as the Baseline Profile, the encoder has a plethora of possibilities to encode each MB, which makes it difficult to accomplish optimal MB coding mode decisions with low (encoding) complexity. Besides the MB coding mode decision, for motion-compensated inter coded MBs, finding the optimal motion vectors and MB partitions is also not a straightforward task. In this context, RDO becomes a powerful tool, allowing the encoder to select the best MB coding modes and motion vectors (if applicable) [28,29].

In the H.264/AVC reference software [25], the best MB mode decision is accomplished through the RDO technique, where the best MB mode is selected by minimizing the following Lagrangian cost function:

$J_{MODE} = D(MODE, QP) + \lambda_{MODE} \times R(MODE, QP)$

(1.1)

where

MODE is one of the allowable MB coding modes (e.g., SKIP, INTER 16 × 16, INTER 16 × 8, INTER 8 × 16, INTER 8 × 8, INTRA 4 × 4, INTRA 16 × 16)

QP is the quantization parameter

D(MODE, QP) and R(MODE,QP) are, respectively, the distortion (between the original and the reconstructed MB) and the number of bits that will be achieved by applying the corresponding MODE and QP

In Ref. [28], it is recommended that, for intra (I) and inter predicted (P) slices, λ_MODE be computed as follows:

$\lambda_{MODE} = 0.85 \times 2^{(QP - 12) / 3}$

(1.2)

Motion estimation can also be accomplished through the same framework. In this case, the best motion vector and reference frame can be selected by minimizing the following Lagrangian cost function:

$J_{MOTION} = D(mv(REF)) + \lambda_{MOTION} \times R(mv(REF))$

(1.3)

where

mv(REF) is the motion vector for the frame reference REF

D(mv(REF)) is the residual error measure, such as the sum of absolute differences (SAD) between the original and the reference

R(mv(REF)) is the number of bits necessary to encode the corresponding motion vector (i.e., the motion vector difference between the selected motion vector and its prediction) and to signal the selected reference frame

In a similar way, Ref. [28] also recommends that, for P-slices, λ_MOTION be computed as

$\lambda_{MOTION} = \sqrt{\lambda_{MODE}}$

(1.4)

when the SAD measure is used.
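As a minimal sketch of how Equations 1.1 through 1.4 fit together, the following Python fragment computes the two Lagrangian multipliers from the quantization parameter and evaluates the generic cost J = D + λ × R. The function names are hypothetical and chosen only for illustration; this is not code from the H.264/AVC reference software.

```python
import math


def lambda_mode(qp: int) -> float:
    """Equation 1.2: mode-decision multiplier for I and P slices."""
    return 0.85 * 2.0 ** ((qp - 12) / 3.0)


def lambda_motion(qp: int) -> float:
    """Equation 1.4: motion-estimation multiplier when SAD is the distortion."""
    return math.sqrt(lambda_mode(qp))


def rd_cost(distortion: float, rate_bits: float, lam: float) -> float:
    """Equations 1.1 and 1.3: Lagrangian cost J = D + lambda * R."""
    return distortion + lam * rate_bits
```

Because λ_MODE doubles every time QP increases by 3, the rate term weighs more heavily at coarse quantization, which pushes the decision toward cheaper modes at low bit rates.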

Since the quantization parameter is required for computing the Lagrangian multipliers λ_MODE and λ_MOTION, as well as for computing the number of bits to encode the residue for a given MB, a rate control mechanism must be used that can efficiently compute for each MB (or set of MBs, such as a slice) an adequate quantization parameter in order to maximize the decoded video quality for a given bit rate budget. In this case, the method proposed by Li et al. [24,28] has been used since it is the one implemented in the H.264/AVC reference software [25].

1.2.2 Random Intra Refresh

As mentioned earlier, the H.264/AVC reference software [25] includes a (nonnormative) technique for intra refreshing MBs. Although this technique is called random intra refresh (RIR), it is not really a purely random refresh technique. This technique is basically a cyclic intra refresh (CIR) technique for which the refresh order is not simply the raster scan order. The refresh order is randomly defined once before encoding, but afterward intra refresh proceeds cyclically, following the determined order, with n MBs for each time instant. An example of a randomly determined intra refresh order, for QCIF spatial resolution, may be seen in Figure 1.1.

FIGURE 1.1
Example of random intra refresh order for QCIF spatial resolution. (From Nunes, P. et al., Error resilient macroblock rate control for H.264/AVC video coding, Proceedings of the IEEE International Conference on Image Processing, San Diego, CA, p. 2133, October 2008. With permission. © 2008 IEEE.)

Since the RIR technique used in the H.264/AVC reference software and also considered here is basically a CIR technique, in the remainder of this chapter, the acronyms RIR and CIR will be used interchangeably.

One of the main advantages of this technique is that, being cyclic, it guarantees that all MBs will be refreshed, at least, once in each cycle, thus guaranteeing that there are no MBs where errors can propagate indefinitely. However, this technique also has disadvantages, one of which is the fact that all MBs are refreshed exactly the same number of times. This basically means that it is not possible to refresh more often MBs that are more likely to be lost or are harder to conceal at the decoder if an error does occur.
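The cyclic behavior described above can be sketched in a few lines of Python. The refresh order is shuffled once before encoding and then walked cyclically, n MBs per frame. The function names, the seed, and the choice of n are assumptions made for illustration and do not reproduce the reference software implementation; the figure of 99 MBs corresponds to QCIF (176 × 144 pixels, i.e., 11 × 9 MBs).

```python
import random


def make_refresh_order(num_mbs: int, seed: int = 0) -> list:
    """Draw the refresh order once, before encoding starts."""
    order = list(range(num_mbs))
    random.Random(seed).shuffle(order)
    return order


def mbs_to_refresh(order: list, frame_index: int, n: int) -> list:
    """Return the n MB indices forced to intra mode in the given frame."""
    if n <= 0:
        raise ValueError("n must be positive")
    start = (frame_index * n) % len(order)
    # Walk the fixed order cyclically, wrapping around at the end.
    return [order[(start + k) % len(order)] for k in range(n)]


# QCIF example: 99 MBs, 9 refreshed per frame, so one cycle lasts 11 frames.
qcif_order = make_refresh_order(99)
first_frame_refresh = mbs_to_refresh(qcif_order, 0, 9)
```

With these assumptions, every MB is visited exactly once per cycle, which is the property that prevents indefinite error propagation, and also the reason why no MB can be refreshed more often than another.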

Another important aspect of this technique is that MBs are refreshed according to the predetermined order, without taking into account the eventual RD cost of intra refreshing a given MB, as opposed to letting the rate control module decide which encoding mode is best in terms of RD cost. This is exactly where there is room for improvement: Intra refresh should be performed by taking into account the RD cost of a given MB.

1.3 Efficient Intracoding Refresh

When deciding the best MB coding mode, notably between inter- and intracoding modes, the RDO framework, as briefly described in Section 1.2.1, simply selects the mode that has lower RD cost, given by Equation 1.1. This RDO framework, as implemented in the H.264/AVC reference software, does not take into account other dimensions, besides rate and distortion optimization, such as the robustness of the bitstream in error-prone environments. Therefore, some MBs are simply inter coded because their best inter mode RD cost is slightly lower than the best intra mode RD cost. For these cases, selecting the intra mode, although not optimal in a strict RD sense, can prove to be a much better decision when the bitstream becomes corrupted by errors (e.g., due to packet losses in packet networks), and the intra coded MBs can be used to stop error propagation due to the (temporal) predictive coding modes. Moreover, if additional error robustness is introduced through an intra refresh technique, for example, as the one described in Section 1.2.2, some MBs can be highly penalized in a RD sense, since they can be blindly forced to be encoded in an intra mode, without taking into account the RD cost of that decision.

1.3.1 Error-Resilient RDO-Driven Intra Refresh

The main idea of a network-aware error-resilient scheme is to perform RDO in a resilient manner, using the relative RD cost of the best intra mode and the best inter mode for each MB. Therefore, whenever coding a given MB in intra mode does not cost significantly more than the best intercoding mode, the given MB is gracefully forced to be encoded in its best intra mode.

This error-resilient RDO provides an efficient intra refresh scheme, thus guaranteeing that the generated bitstream will be more robust to channel errors, without having to spend a lot of bits on intra coded MBs, which typically reduces the decoded video quality when there are no errors in the channel. This scheme can be described through the MB-level mode decision architecture depicted in Figure 1.2.

FIGURE 1.2
Architecture of the error-resilient MB intra/inter mode decision scheme. (From Nunes, P. et al., Error resilient macroblock rate control for H.264/AVC video coding, Proceedings of the IEEE International Conference on Image Processing, San Diego, CA, p. 2134, October 2008. With permission. © 2008 IEEE.)

1.3.1.1 RDO Intra and Inter Mode Decision

Before deciding the best mode to encode a given MB, the best inter mode RD cost, J_INTER, is computed from the set of all possible inter modes, and the best intra mode RD cost, J_INTRA, is computed from the set of all possible intra modes through RDO, i.e., Equations 1.1 and 1.3, where

$J_{INTER} = \min_{MODE \in S_{INTER}} (J_{MODE})$

(1.5)

and

$J_{INTRA} = \min_{MODE \in S_{INTRA}} (J_{MODE})$

(1.6)

where

S_INTER is the set of allowed inter modes (i.e., SKIP, INTER 16 × 16, INTER 16 × 8, INTER 8 × 16, INTER 8 × 8, INTER 8 × 4, INTER 4 × 8, and INTER 4 × 4)

S_INTRA is the set of allowed intra modes (i.e., INTRA 4 × 4, INTRA 16 × 16, INTRA PCM, or INTRA 8 × 8)

The best intra and inter modes are the ones with the lowest intra and inter RD costs, respectively.

1.3.1.2 Error-Resilient Intra/Inter Mode Decision

To control the amount of MBs that will be gracefully forced to be encoded in intra mode, a control parameter, α_RD (which basically specifies the tolerable RD cost increase for replacing an inter by an intra MB) is used in such a way that

$\text{if } (J_{INTRA} / J_{INTER} \leq \alpha_{RD}) \text{ Intra mode is selected}$

(1.7)

$\text{if } (J_{INTRA} / J_{INTER} > \alpha_{RD}) \text{ Inter mode is selected}$

(1.8)

Notice that, for α_RD = 1, no particular mode is favored in an RD sense, while for α_RD > 1, the intra modes are favored relative to the inter modes (see Figure 1.3). Therefore, the number of gracefully forced intra encoded MBs can be controlled by the α_RD parameter. The MBs that end up being forced to intra mode are the MBs for which the RD costs of intra and inter modes are similar, which typically correspond to MBs that have high inter RD cost and, therefore, would be difficult to conceal at the decoder if lost.
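The decision rule can be sketched as follows, reading it as: select the best intra mode when J_INTRA / J_INTER ≤ α_RD and the best inter mode otherwise. The per-mode costs are assumed to have been computed already through Equation 1.1; the function name, the dictionary layout, and the cost values in the example are hypothetical.

```python
def select_mode(inter_costs: dict, intra_costs: dict, alpha_rd: float) -> str:
    """Pick the MB coding mode using the alpha_RD-controlled intra/inter rule."""
    # Equations 1.5 and 1.6: best mode within each family.
    best_inter = min(inter_costs, key=inter_costs.get)
    best_intra = min(intra_costs, key=intra_costs.get)
    j_inter = inter_costs[best_inter]
    j_intra = intra_costs[best_intra]
    # J_INTRA / J_INTER <= alpha_RD, written without a division so that a
    # zero inter cost does not raise an error (equivalent for J_INTER > 0).
    if j_intra <= alpha_rd * j_inter:
        return best_intra
    return best_inter


# Hypothetical RD costs for one MB.
inter = {"SKIP": 120.0, "INTER_16x16": 100.0}
intra = {"INTRA_4x4": 130.0, "INTRA_16x16": 150.0}
plain_rdo_choice = select_mode(inter, intra, alpha_rd=1.0)
resilient_choice = select_mode(inter, intra, alpha_rd=1.5)
```

In this example, plain RDO (α_RD = 1.0) keeps the inter mode, whereas α_RD = 1.5 tolerates the 30% cost increase and switches the MB to intra.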

1.3.2 Random Intra Refresh

Notice that the previous scheme does not guarantee that all MBs are periodically refreshed, which, if not properly handled, could lead to an endless propagation of errors along time for some MBs in the video sequence. To handle this issue, an RIR can also be concurrently applied, but with a lower number of refreshed MBs per frame when compared with solely applying the RIR technique, in order not to compromise dramatically the RD efficiency.

FIGURE 1.3
MBs with an intra/inter RD cost ratio below the line will be gracefully forced to intra mode. (From Nunes, P. et al., Error resilient macroblock rate control for H.264/AVC video coding, Proceedings of the IEEE International Conference on Image Processing, San Diego, CA, p. 2134, October 2008. With permission. © 2008 IEEE.)

1.4 Network-Aware Error-Resilient Video Coding Method

The main limitation of the MB coding mode decision method described in Section 1.3 is that the control parameter, α_RD, is not dynamically adapted to the actual network error conditions. However, when feedback about the network error conditions is available, it would be possible to use this information to adjust the α_RD control parameter in order to maximize the decoded video quality while dynamically providing adequate error resilience.

1.4.1 Intra/Inter Mode Decision with Constant α_RD

When a constant α_RD value is used without considering the current network error conditions in terms of packet loss rate (PLR), the benefits of the technique described in Section 1.3 (and proposed in Ref. [23]) are not fully exploited. This is clear from Figure 1.4, where the Foreman sequence has been encoded with the Baseline Profile of H.264/AVC with different α_RD values, including α_RD = 1. In Figure 1.4, as well as in the remainder of Section 1.4, CIR is not used in order to avoid biasing the behavior associated with the α_RD parameter. Notice, however, that the use of CIR is typically recommended, as mentioned in Section 1.2.2. As can be seen, in these conditions, the optimal α_RD (i.e., the one that leads to the highest PSNR) is highly dependent on the network PLR.

FIGURE 1.4
PSNR versus PLR for a constant α_RD parameter for the Foreman sequence. (From Soares, L. D. et al., Efficient network-aware macroblock mode decision for error resilient H.264/AVC video coding, Proceedings of the SPIE Conference on Applications of Digital Image Processing, vol. 7073, San Diego, CA, August 2008.)

As expected, when there are no errors (PLR = 0%), the highest decoding quality is achieved when no intra MBs are forced (i.e., α_RD = 1.0). However, for this α_RD value, the decoded video quality decays very rapidly as the PLR increases. On the other hand, if only a small number of intra MBs are forced (i.e., α_RD = 1.8), the decoded video quality is slightly improved for the higher PLR values, when compared to the case with no forced intra MBs, but is slightly penalized for error-free transmission. This effect becomes more evident as the α_RD value increases, that is, as more and more intra MBs are gracefully forced. For example, for α_RD = 3.8 and for a PLR of 10%, the decoded video quality is highly improved relative to the situation with no forced intra MBs (i.e., by 6.36 dB), because the error propagation is significantly reduced. However, for lower PLRs, still for α_RD = 3.8, the decoded video quality is penalized due to the excessive use of intracoding (i.e., by 7.19 dB for PLR = 0% and 1.50 dB for PLR = 1%).

Therefore, from what has been presented earlier, it is possible to conclude that the optimal amount of intra coded MBs is highly dependent on the error characteristics of the underlying network and, thus, the error resilience control parameter α_RD should be dynamically adjusted to the channel error conditions to maximize the decoded quality.

FIGURE 1.5
PSNR versus α_RD (alpha in the x-axis label) parameter for various PLRs for the Mother and Daughter sequence. (From Soares, L.D. et al., Efficient network-aware macroblock mode decision for error resilient H.264/AVC video coding, Proceedings of the SPIE Conference on Applications of Digital Image Processing, vol. 7073, San Diego, CA, August 2008.)

In order to illustrate the influence of the α_RD parameter on the decoded PSNR, Figure 1.5 shows the decoded video quality, in terms of PSNR, versus the α_RD parameter for several PLRs for the Mother and Daughter sequence (QCIF, 10 Hz) encoded at 64 kbit/s. Clearly, for each PLR condition, there is an α_RD value that maximizes the decoded video quality. For example, for a PLR of 10%, the maximum PSNR value is achieved for α_RD = 2.2. To further illustrate the importance of a proper selection of the α_RD parameter and how it can significantly improve the overall decoded video quality under severe error conditions, it should be noted that, for a PLR of 10%, the PSNR difference between having α_RD = 2.2 and α_RD = 1.1 is 5.47 dB.

1.4.2 Intra/Inter Mode Decision with Network-Aware α_RD Selection

A possible approach to address the problem of adapting the α_RD parameter to the channel error conditions is to use the information in the receiver reports (RR) of the real-time transport protocol (RTP) control protocol (RTCP) [30] to provide the encoder with the actual error characteristics of the underlying network. This makes it possible to adaptively and efficiently select the amount of intra coded MBs to be inserted in each frame by taking into account this feedback information about the rate of lost packets, as shown in Figure 1.6.

FIGURE 1.6
Network-aware video encoding architecture. (From Soares, L.D. et al., Efficient network-aware macroblock mode decision for error resilient H.264/AVC video coding, Proceedings of the SPIE Conference on Applications of Digital Image Processing, vol. 7073, San Diego, CA, August 2008.)

In the method presented here, the intra/inter mode decision is still based on the α_RD parameter, but this time α_RD may depend on several aspects, such as the content type, the content spatial and temporal resolutions, the coding bit rate, and the PLR of the network.

This way, by considering a mapping function f_NMD, it will be possible to dynamically determine the α_RD parameter from the following expression:

$\alpha_{RD} = f_{NMD}(S, PLR)$

(1.9)

where

PLR is the packet loss rate

S can be an n-dimensional vector characterizing the encoding scenario, for example, in terms of the content motion activity and the texture coding complexity, the content spatial and temporal resolutions, and the coding bit rate

In this work, however, as will be shown later in Section 1.4.3, the encoding scenario can be characterized solely by the encoded bit rate with a good approximation. The f_NMD function basically maps the encoding scenario and the network PLR into a “good” α_RD parameter that dynamically maximizes the average decoded video quality. Notice that, although it is not easy to obtain a general function, it can be defined for several classes of content and a discrete limited set of encoding parameters and PLRs. In this chapter, it will be shown that, by carefully designing the f_NMD function, significant gains can be obtained in terms of video quality relative to the reference method described in Section 1.4.4.

Therefore, the network-aware MB mode decision (NMD) method can be briefly described through the following steps in terms of encoder operation:

1. Obtain the packet loss rate through network feedback.

2. Compute the α_RD parameter through the mapping function given by Equation 1.9 (and detailed in the following).

3. Perform intra/inter mode decision using the α_RD parameter, computed in Step 2, for the next MB to be encoded, and encode the MB.

4. Check if a new network feedback report has arrived; if yes, go back to Step 1; if not, go back to Step 3.
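The four steps above can be sketched as a simple encoder loop. All callables passed to the function are hypothetical placeholders: poll_feedback stands for whatever mechanism delivers the PLR from the network reports (returning None when no new report has arrived), alpha_from_plr stands for the mapping function of Equation 1.9 with the encoding scenario already fixed, and encode_mb stands for the intra/inter decision and encoding of one MB.

```python
from typing import Any, Callable, Iterable, Optional


def nmd_encode(
    macroblocks: Iterable[Any],
    poll_feedback: Callable[[], Optional[float]],
    alpha_from_plr: Callable[[float], float],
    encode_mb: Callable[[Any, float], None],
    initial_plr: float = 0.0,
) -> None:
    """Sketch of the network-aware MB mode decision (NMD) loop."""
    plr = initial_plr                   # Step 1: (initial) packet loss rate
    alpha_rd = alpha_from_plr(plr)      # Step 2: map PLR to alpha_RD
    for mb in macroblocks:
        encode_mb(mb, alpha_rd)         # Step 3: mode decision and encoding
        report = poll_feedback()        # Step 4: has a new report arrived?
        if report is not None:
            plr = report                # back to Step 1
            alpha_rd = alpha_from_plr(plr)  # and Step 2
```

The α_RD value is recomputed only when a report arrives, so between reports every MB is encoded with the same control parameter.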

Notice that it is out of the scope of this chapter to define when the network reports are issued, since this will depend on how the network protocols are configured and the varying characteristics of the network itself [30]. Nevertheless, in real application scenarios, it is important to design appropriate interfacing mechanisms between the codec and the underlying network, in order that both encoder and decoder can adaptively adjust their operations according to the network conditions [12].

Through Equation 1.9, the encoder is able to adjust the amount of intra refresh according to the network error conditions and the available bit rate. This intra refresh method typically increases the intra refresh for the more complex MBs, which are typically those that are more difficult to conceal. The main problem of this approach is that it does not guarantee that all MBs in the scene are refreshed. This is clearly illustrated in Figure 1.7 for the Foreman sequence, where the right image represents the relative amount of MB intra refresh along the sequence (lighter blocks mean more intra refresh). As can be seen, with this intra refresh scheme some MBs are never refreshed, which can lead to errors propagating indefinitely over time in these MB positions (dark blocks in Figure 1.7).

1.4.3 Model for the f_NMD Mapping Function

In order to devise a model for the mapping function f_NMD defined in Equation 1.9, it is first important to see how the optimal α_RD parameter varies with PLR. This is plotted in Figure 1.8 for three different sequences (i.e., Mother and Daughter, Foreman, and Mobile and Calendar) encoded at different bit rates, and resolutions, for illustrative purposes. Each curve in Figure 1.8 corresponds to a different encoding scenario S, in terms of the content motion activity and the texture coding complexity, the content spatial and temporal resolutions, and the coding bit rate (see Equation 1.9). As shall be detailed later in Section 1.5, these three sequences have also been encoded at many other bit rates, and the kind of curves obtained was always similar.

FIGURE 1.7
Relative amount of intra refresh (b) for the MBs of the Foreman sequence (a) (QCIF, 15 Hz, 128 kbit/s, and α_RD = 1.1). (From Nunes, P. et al., Automatic and adaptive network-aware macroblock intra refresh for error-resilient H.264/AVC video coding, Proceedings of the IEEE International Conference on Image Processing, Cairo, Egypt, p. 3074, November 2009. With permission. © 2009 IEEE.)

FIGURE 1.8
Example of optimal α_RD versus PLR for various sequences and bit rates. (From Soares, L.D. et al., Efficient network-aware macroblock mode decision for error resilient H.264/AVC video coding, Proceedings of the SPIE Conference on Applications of Digital Image Processing, vol. 7073, San Diego, CA, August 2008.)

As can be seen from the plots in Figure 1.8, the behavior of the optimal α_RD parameter versus the PLR is similar to that of a charging capacitor [31] (but starting at α_RD = 1.0). Therefore, for a given sequence and for a given bit rate (i.e., a given encoding scenario S), it should be possible to model the behavior of the α_RD parameter with respect to the PLR with the following expression:

$\alpha_{RD} = 1 + K_{1} \times (1 - e^{-K_{2} \times \mathrm{PLR}})$

(1.10)

where PLR represents the packet loss rate, while K₁ and K₂ represent constants that are specific to the considered encoding scenario, notably the sequence characteristics and bit rate. However, the main problem in using Equation 1.10 to compute α_RD is that, for a given sequence, a different set of K₁ and K₂ would be needed for each of the considered bit rates, which would be highly impractical. In order to address this issue, it is important to understand how the optimal α_RD parameter varies when both the PLR and the bit rate vary. This variation is illustrated in Figure 1.9 for the Mobile and Calendar sequence.
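
As a quick check on the roles of the two constants, Equation 1.10 can be evaluated at three characteristic points. This is a worked evaluation added for illustration, assuming only that K₁ and K₂ are positive; it is not taken from the original derivation.

$\alpha_{RD}\big|_{\mathrm{PLR}=0} = 1 + K_{1} \times (1 - e^{0}) = 1$

$\alpha_{RD}\big|_{\mathrm{PLR}=1/K_{2}} = 1 + K_{1} \times (1 - e^{-1}) \approx 1 + 0.632 \times K_{1}$

$\lim_{\mathrm{PLR} \to \infty} \alpha_{RD} = 1 + K_{1}$

In other words, K₁ sets the asymptotic value 1 + K₁, while 1/K₂ plays the role of the time constant in the charging-capacitor analogy: it is the PLR at which about 63% of the total increase has been reached.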

FIGURE 1.9
Optimal α_RD versus PLR and bit rate for the Mobile and Calendar sequence. (From Soares, L.D. et al., Efficient network-aware macroblock mode decision for error resilient H.264/AVC video coding, Proceedings of the SPIE Conference on Applications of Digital Image Processing, vol. 7073, San Diego, CA, August 2008.)

After close inspection of Figure 1.9, it can be seen that the K₁ value, which basically dictates the value of α_RD toward which the curve asymptotically converges, depends linearly on the used bit rate and, therefore, it can be modeled by the following expression:

$K_{1}(r_{b}) = a \times r_{b} + b$

(1.11)

where r_b is the bit rate, while a and b are the parameters that need to be estimated for a given sequence.

As for the K₂ value, which dictates the growth rate of the considered exponential, it appears, after exhaustive testing, to not depend on the used bit rate. Therefore, as a first approach, it can be considered to be constant, as in

$K_{2} = c$

(1.12)

This behavior was observed for the three different video sequences mentioned earlier and, therefore, makes it possible to establish a final expression which allows the video encoder to automatically select, for a given sequence, an adequate α_RD parameter when the PLR and the bit rate r_b are known:

$\alpha_{RD} = f_{NMD}(r_{b}, \mathrm{PLR}) = 1 + (a \times r_{b} + b) \times (1 - e^{-c \times \mathrm{PLR}})$

(1.13)

where a, b, and c are the model parameters that need to be estimated (see Ref. [26]). After extensive experimentation, it was found that the parameters a, b, and c can be considered more or less independent of the sequence, which means that a single set of parameters could be used for three different video sequences with a low fitting error. This basically means that the encoding scenario S, defined in Section 1.4.2, can be well represented only by the bit rate r_b.

As explained in Ref. [26], the parameters a, b, and c could be obtained by considering four packet loss rates and two different bit rates for three different sequences, corresponding to a total of 24 (r_b, PLR) pairs, with the iterative Levenberg–Marquardt method [32,33]. By following this approach, the estimated parameters are a = 0.83 × 10⁻⁶, b = 0.97, and c = 0.90.
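
To make the model concrete, the following is a minimal Python sketch of Equation 1.13 using the parameter values quoted above. The function name `alpha_rd` and its argument names are hypothetical. The units are also an assumption, since the text does not state them here: the sketch takes r_b in bit/s and PLR in percent (e.g., 5 for 5%), which gives values in a range consistent with the α_RD values mentioned in this chapter.

```python
import math

# Parameter values reported in the text for Equation 1.13.
A = 0.83e-6
B = 0.97
C = 0.90


def alpha_rd(bit_rate_bps, plr_percent, a=A, b=B, c=C):
    """Evaluate Equation 1.13.

    Assumed units (not stated in the text): bit rate in bit/s,
    PLR in percent.
    """
    if bit_rate_bps < 0 or plr_percent < 0:
        raise ValueError("bit rate and PLR must be non-negative")
    k1 = a * bit_rate_bps + b   # Equation 1.11: asymptote offset
    k2 = c                      # Equation 1.12: growth rate
    return 1.0 + k1 * (1.0 - math.exp(-k2 * plr_percent))


if __name__ == "__main__":
    for plr in (0.0, 1.0, 5.0, 10.0):
        print(plr, round(alpha_rd(64000, plr), 3))
```

With no packet loss the function returns exactly 1, and as the PLR grows it saturates toward 1 + a × r_b + b, as Equation 1.10 predicts.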

1.4.4 Network-Aware Cyclic Intra Refresh

The approach presented in Section 1.4.2 can also be followed to simply adjust the number of cyclically intra refreshed MBs per frame, based on the feedback received about the network PLR, without any RD cost considerations. This is shown in Figure 1.10, where it is clear that for each PLR condition there is a number of cyclic intra refresh MBs that maximizes the decoded video quality. However, when comparing the best PSNR results of Figures 1.5 and 1.10 (both obtained for the Mother and Daughter sequence encoded with the same spatial and temporal resolutions and the same bit rate), for a given PLR, the PSNR values obtained by varying α_RD are always higher. For example, for a PLR of 5%, a maximum average PSNR of 37.03 dB is achieved for α_RD = 1.9 (see Figure 1.5), while a maximum PSNR of only 34.94 dB is achieved for 33 cyclically intra refreshed MBs in each frame (see Figure 1.10), a difference of approximately 2 dB. This shows that by adequately choosing the α_RD parameter it should be possible to achieve a higher quality than when using the optimal number of CIR MBs. The main reason is that, when some MBs in a given frame are simply intra refreshed cyclically, the additional RD cost of that decision can be extremely high and penalize the overall video quality, since no attempt is made to find the "cheap" intra MBs, as is done in the efficient intracoding refresh solution based on the α_RD parameter.

FIGURE 1.10
PSNR versus number of CIR MBs for various PLRs for the Mother and Daughter sequence. (From Soares, L.D. et al., Efficient network-aware macroblock mode decision for error resilient H.264/AVC video coding, Proceedings of the SPIE Conference on Applications of Digital Image Processing, vol. 7073, San Diego, CA, August 2008.)

1.4.5 Intra Refresh with Network-Aware α_RD and CIR Selection

The main drawback of the scheme described in Section 1.4.3, namely that it cannot guarantee that all MBs are periodically refreshed, can be alleviated by introducing some additional CIR MBs per frame to guarantee that all MB positions are refreshed with a minimum periodicity. This requirement raises the question of how to adaptively select an amount of CIR MBs that is sufficiently high to avoid long-term error propagation without penalizing the encoder RD performance too much.

A possible approach to tackle this problem is to decide the adequate α_RD value and the number of CIR MBs per frame separately, using a different model for each of these two error resilience parameters. For the α_RD selection, the model in Equation 1.9 is used. As for the selection of the number of CIR MBs, it was verified after exhaustive testing [27] that the optimal amount of CIR MBs tends to increase linearly with the bit rate r_b, for a given PLR, but tends to increase exponentially with the PLR, for a given bit rate. Based on these observations, the following model was considered for the selection of the amount of CIR MBs per frame:

$\mathrm{CIR} = f_{CIR}(\mathrm{PLR}, r_{b}) = (a_{1} \cdot r_{b} + b_{1}) \cdot e^{c_{1} \cdot \mathrm{PLR}}$

(1.14)

where a₁, b₁, and c₁ are the model parameters that need to be estimated. In Ref. [27], these parameters have been determined by nonlinear curve fitting (the Levenberg–Marquardt method) of the optimal amount of CIR MBs per frame, experimentally determined for a set of representative test sequences, encoding bit rate ranges and packet loss rates. The estimated parameters were a₁ = 12.97 × 10⁻⁶, b₁ = −0.13, and c₁ = 0.24; these parameter values will also be considered here.
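
A minimal Python sketch of Equation 1.14 with the quoted parameter values is given below. The function name `cir_mbs_per_frame` is hypothetical, and the units are again an assumption (r_b in bit/s, PLR in percent). The rounding follows Step 2 of the encoder operation described later in this section; clamping the result to the range from zero to the number of MBs in a frame is an assumption of this sketch, added because b₁ is negative and the exponential is unbounded.

```python
import math

# Parameter values reported in the text for Equation 1.14.
A1 = 12.97e-6
B1 = -0.13
C1 = 0.24


def cir_mbs_per_frame(plr_percent, bit_rate_bps, total_mbs,
                      a1=A1, b1=B1, c1=C1):
    """Evaluate Equation 1.14 and round to the nearest integer.

    Assumed units (not stated in the text): bit rate in bit/s,
    PLR in percent. The clamp to [0, total_mbs] is an assumption.
    """
    raw = (a1 * bit_rate_bps + b1) * math.exp(c1 * plr_percent)
    nearest = int(math.floor(raw + 0.5))
    return max(0, min(total_mbs, nearest))


if __name__ == "__main__":
    # QCIF has 11 x 9 = 99 MBs per frame; CIF has 22 x 18 = 396.
    print(cir_mbs_per_frame(5.0, 64000, 99))
    print(cir_mbs_per_frame(10.0, 1152000, 396))
```

The linear factor scales the amount of CIR MBs with the bit rate, while the exponential factor makes it grow quickly with the PLR, matching the two trends described above.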

Figure 1.11 shows the proposed model as well as the experimental data for the Mobile and Calendar test sequence. As can be seen, a simple linear model would not have represented the experimental data well.

FIGURE 1.11
Optimal amount of CIR MBs per frame versus PLR and bit rate for the Mobile and Calendar sequence. (From Nunes, P. et al., Automatic and adaptive network-aware macroblock intra refresh for error-resilient H.264/AVC video coding, Proceedings of the IEEE International Conference on Image Processing, Cairo, Egypt, p. 3075, November 2009. With permission. © 2009 IEEE.)

The CIR order is randomly defined once before encoding, as described in Section 1.2.2 (and in Ref. [25]), to avoid the subjectively disturbing effect of performing sequential (e.g., raster scan) refresh. The determined order is then cyclically followed with the computed number of MBs being refreshed in each frame.
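
The random-order cyclic refresh just described can be sketched as follows. This is an illustration only, not the reference software implementation: the helper names `make_cir_order` and `cir_positions_for_frame` are hypothetical, and MBs are identified here by their raster-scan index.

```python
import random


def make_cir_order(total_mbs, seed=0):
    """Draw a random permutation of MB indices once, before encoding."""
    order = list(range(total_mbs))
    random.Random(seed).shuffle(order)
    return order


def cir_positions_for_frame(order, start, count):
    """Return the MBs to force to intra in this frame and the next start.

    The fixed random order is followed cyclically, so every MB position
    is refreshed once per pass through the order.
    """
    total = len(order)
    count = min(count, total)
    forced = {order[(start + i) % total] for i in range(count)}
    return forced, (start + count) % total


if __name__ == "__main__":
    order = make_cir_order(99, seed=1)   # QCIF: 99 MBs per frame
    start = 0
    for frame in range(3):
        forced, start = cir_positions_for_frame(order, start, 11)
        print(frame, sorted(forced))
```

With n CIR MBs per frame and N MBs in total, every position is refreshed at least once every ⌈N/n⌉ frames, which is the guarantee that the α_RD-based decision alone cannot give.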

Therefore, the complete network-aware MB intracoding refresh (NIR) scheme (which was initially proposed in Ref. [27]) can be briefly described by the following steps in terms of encoder operation:

Step 1. Obtain the PLR value through network feedback.

Step 2. Compute the number of CIR MBs to be used per frame, by using the proposed f_CIR function defined by Equation 1.14 and rounding it to the nearest integer.

Step 3. Compute the α_RD value by using the f_NMD function defined by Equation 1.9 in Section 1.4.2.

Step 4. For each MB in a frame, check if it should be forced to intra mode according to the CIR order and the determined number of CIR MBs per frame; if not, perform intra/inter mode decision using the α_RD value computed in Step 3; encode the MB with selected mode.

Step 5. At the end of the frame, check if a new network feedback report has arrived; if yes, go back to Step 1; if not, go back to Step 4.
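
The five steps above can be sketched as a simple control loop. This is a simplified illustration under several assumptions, not the encoder of Ref. [27]: `run_nir`, `poll_feedback`, and `decide_mode` are hypothetical names; the intra/inter mode decision itself is left to a caller-supplied function that receives the current α_RD value; the units (bit/s, PLR in percent), the clamping of the CIR count, and the starting PLR of zero before the first report are assumptions of the sketch.

```python
import math
import random


def alpha_rd(bit_rate_bps, plr_percent, a=0.83e-6, b=0.97, c=0.90):
    """Equation 1.13 with the parameter values reported in the text."""
    return 1.0 + (a * bit_rate_bps + b) * (1.0 - math.exp(-c * plr_percent))


def cir_count(bit_rate_bps, plr_percent, total_mbs,
              a1=12.97e-6, b1=-0.13, c1=0.24):
    """Equation 1.14 rounded to the nearest integer (clamp is assumed)."""
    raw = (a1 * bit_rate_bps + b1) * math.exp(c1 * plr_percent)
    return max(0, min(total_mbs, int(math.floor(raw + 0.5))))


def run_nir(num_frames, total_mbs, bit_rate_bps, poll_feedback, decide_mode,
            seed=0):
    """Return, for each frame, the list of coding modes chosen per MB."""
    # Random CIR order, defined once before encoding.
    order = list(range(total_mbs))
    random.Random(seed).shuffle(order)
    cursor = 0

    plr = 0.0  # assumed value until the first feedback report arrives
    n_cir = cir_count(bit_rate_bps, plr, total_mbs)
    alpha = alpha_rd(bit_rate_bps, plr)

    modes_per_frame = []
    for frame in range(num_frames):
        report = poll_feedback(frame)      # Steps 1 and 5: new report?
        if report is not None:
            plr = report
            n_cir = cir_count(bit_rate_bps, plr, total_mbs)  # Step 2
            alpha = alpha_rd(bit_rate_bps, plr)              # Step 3

        # Step 4: force the next CIR MBs to intra, decide the others.
        forced = {order[(cursor + i) % total_mbs] for i in range(n_cir)}
        cursor = (cursor + n_cir) % total_mbs
        modes_per_frame.append([
            "intra" if mb in forced else decide_mode(frame, mb, alpha)
            for mb in range(total_mbs)
        ])
    return modes_per_frame


if __name__ == "__main__":
    reports = {0: 5.0}  # a single report of 5% PLR before the first frame
    frames = run_nir(3, 99, 64000, reports.get,
                     lambda frame, mb, alpha: "inter")
    print([row.count("intra") for row in frames])
```

Keeping the two models as separate functions mirrors the independent selection of α_RD and the number of CIR MBs; either one can be replaced without touching the loop.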

The definition of when the network reports are issued depends on how the network protocols are configured and the varying characteristics of the network itself [34].

Notice that independently selecting the α_RD value and the amount of CIR MBs, while they are likely interdependent, can lead to chosen values that do not correspond to the optimal (α_RD, CIR) pair. However, it has been verified after extensive experimentation that the considered independent selection process is still robust in the sense that the chosen values are typically close enough to the optimal pair and, therefore, the overall performance is not dramatically penalized.

1.5 Performance Evaluation

To evaluate the performance of the complete NIR scheme described in this chapter, it has been compared, under similar conditions, to a reference intra refresh scheme. This reference basically corresponds to the network-aware version of the cyclic intra refresh scheme of the H.264/AVC reference software [25], as described in Section 1.4.4. This solution was adopted because, at the time of writing, no other network-aware intra refresh techniques that adaptively take into account the current network conditions were known.

In the reference scheme, the optimal number of CIR MBs per frame is selected manually for the considered network conditions, while in the considered NIR solution, the selection of the amount of CIR MBs per frame and the α_RD parameter is done fully automatically. For the complete NIR and reference schemes, the Mother and Daughter, the Foreman, and the Mobile and Calendar video sequences have been encoded using the H.264/AVC Baseline Profile [25]. The used test conditions, which are representative of those currently used for personal communications over mobile networks, are summarized in Table 1.1. For QCIF, each frame was divided into three slices, while for CIF each frame was divided into six slices. In both cases, each slice consists of three MB rows. After encoding, each slice was mapped to an RTP packet for network transmission [34].

TABLE 1.1

Test Conditions

Video Test Sequence	Mother and Daughter	Foreman	Mobile and Calendar
Spatial resolution	QCIF	QCIF	CIF
Frame rate (Hz)	10	10	15
Bit rate (kbit/s)	24–64	48–128	384–1152

Source: Nunes, P., Soares, D., and Pereira, F., Error resilient macroblock rate control for H.264/AVC video coding, Proceedings of the IEEE International Conference on Image Processing, San Diego, CA, p. 2134, October 2008. With permission. Copyright 2008 IEEE.

For the reference scheme, the number of cyclically intra refreshed MBs per frame was chosen for each PLR and bit rate, such that the decoded video quality would be the best possible. This was done manually by performing an exhaustive set of tests using many different amounts of CIR MBs per frame and then choosing the one that led to the highest decoded average PSNR value, obtained by averaging over 50 different error patterns. For the QCIF video sequences, the possible values for the number of cyclically intra refreshed MBs were chosen from the representative set {0, 5, 11, 22, 33, …, 99}, while for the CIF video sequences the representative set consisted of {0, 22, 44, 66, …, 396}.

To simulate the network conditions, three different PLRs were considered: 1%, 5%, and 10%. Since each slice is mapped to one RTP packet, each lost packet will correspond to a lost video slice. Packet losses are considered independent and identically distributed. For each one of the studied PLRs, each coded bitstream has been corrupted and then decoded 50 times (i.e., corresponding to 50 different error patterns or runs), while applying the default error concealment technique implemented in the H.264/AVC reference software [25,28]. The presented results correspond to PSNR averages of these 50 different runs for the luminance component (PSNR Y).
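
The loss simulation described above can be sketched as follows. This is an illustration of independent and identically distributed slice losses and of averaging over 50 runs, not the actual test harness. The function names are hypothetical, and `decode_psnr` is a placeholder for the real chain of bitstream corruption, decoding with error concealment, and PSNR Y measurement, which is not reproduced here.

```python
import random


def simulate_slice_losses(num_frames, slices_per_frame, plr_percent, seed):
    """Return one error pattern: True marks a lost slice (RTP packet).

    Each packet is lost independently with probability PLR / 100.
    """
    rng = random.Random(seed)
    p = plr_percent / 100.0
    return [[rng.random() < p for _ in range(slices_per_frame)]
            for _ in range(num_frames)]


def average_over_runs(decode_psnr, num_frames, slices_per_frame,
                      plr_percent, runs=50):
    """Average a per-run PSNR value over `runs` different error patterns.

    `decode_psnr` is a caller-supplied placeholder that maps an error
    pattern to the PSNR Y of the decoded sequence.
    """
    total = 0.0
    for run in range(runs):
        pattern = simulate_slice_losses(num_frames, slices_per_frame,
                                        plr_percent, seed=run)
        total += decode_psnr(pattern)
    return total / runs


if __name__ == "__main__":
    # QCIF in the tests above: three slices per frame.
    pattern = simulate_slice_losses(100, 3, 5.0, seed=0)
    lost = sum(sum(frame) for frame in pattern)
    print("lost slices:", lost, "of", 100 * 3)
```

Using the run index as the seed keeps each of the 50 error patterns reproducible, so that different encoder configurations can be corrupted with the same set of patterns.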

For the conditions mentioned earlier, PSNR Y results are shown in Tables 1.2 through 1.4 for the Mother and Daughter, Foreman, and Mobile and Calendar video sequences, respectively. In these tables, NIR refers to the complete network-aware intracoding refresh scheme described in this chapter, and JM refers to the reference technique (winning cases appear in bold). In addition, OPT corresponds to the manual selection of the best (α_RD, CIR) pair.

TABLE 1.2

PSNR Results for the Mother and Daughter Sequence

Source: From Nunes, P., Soares, D., and Pereira, F., Automatic and adaptive network-aware macroblock intra refresh for error-resilient H.264/AVC video coding, Proceedings of the IEEE International Conference on Image Processing, Cairo, Egypt, p. 3076, November 2009. With permission. Copyright 2009 IEEE.

TABLE 1.3

PSNR Results for the Foreman Sequence

TABLE 1.4

PSNR Results for the Mobile and Calendar Sequence

No visual results are given here, because the direct comparison of peer frames (encoded with different coding mode selection schemes) is rather meaningless in this case; only the comparison of the total video quality for several error patterns makes sense. This is due to the fact that the generated streams for the proposed and the reference techniques are different and, even if the same error pattern is used to corrupt them, the errors will affect different parts of the data at a given time instant, causing very different artifacts.

To help the reader better appreciate the gains obtained with the proposed technique, the results obtained for the Mother and Daughter sequence are also plotted in Figure 1.12, for both JM and NIR. For the Foreman and the Mobile and Calendar sequences, the trends are similar.

FIGURE 1.12
PSNR results for the Mother and Daughter sequence. (From Nunes, P., Soares, D., and Pereira, F., Automatic and adaptive network-aware macroblock intra refresh for error-resilient H.264/AVC video coding, Proceedings of the IEEE International Conference on Image Processing, Cairo, Egypt, p. 3076, November 2009. With permission. Copyright 2009 IEEE.)

The presented results show that, when the fully automatic NIR scheme is used, the decoded video quality is significantly improved for the vast majority of tested conditions when compared to the reference method with a manually selected amount of CIR MBs (JM). Improvements of the NIR method can be as high as 1.90 dB for the Mother and Daughter sequence encoded at 64 kbit/s and a PLR of 5%. The most significant exception is for the PLR of 10% and higher bit rates (see Tables 1.3 and 1.4). This exception is due to the fact that, for these PLR and bit rate values, the number of CIR MBs chosen with the proposed f_CIR is slightly different from the optimal values.

When comparing the NIR scheme to the one proposed in Ref. [26], which does not use CIR, the NIR PSNR Y values are, most of the time, higher than or equal to those achieved in Ref. [26]. The highest gains occur for the Foreman sequence encoded at 128 kbit/s and a PLR of 10% (0.90 dB), and for the Mobile and Calendar sequence encoded at 768 kbit/s and a PLR of 10% (0.60 dB). For the cases where the NIR scheme leads to lower PSNR Y values, the losses are never more than 0.49 dB, which happens for the Mobile and Calendar sequence encoded at 896 kbit/s and a PLR of 5%.

Notice, however, that the scheme in Ref. [26] cannot guarantee that all MBs will eventually be refreshed, which is a major drawback for real usage in error-prone environments, such as mobile networks. On the other hand, the scheme described in this chapter not only overcomes this drawback but also does so fully automatically, without any user intervention.

1.6 Final Remarks

This chapter describes a method to efficiently and fully automatically perform intracoding refresh, while taking into account the PLR of the underlying network and the encoded bit rate. The described method can be used to efficiently generate error-resilient H.264/AVC bitstreams that are perfectly adapted to the channel error characteristics. This is extremely important because it can mean that error-resilient video transmission will be possible in environments with varying error characteristics with an improved quality, notably, when compared to the case where the MB intracoding decisions are taken without considering the error characteristics of the network.

Acknowledgments

The authors would like to acknowledge that the work described in this chapter was developed at Instituto de Telecomunicações (Lisboa, Portugal) and was supported by FCT project PEst-OE/EEI/LA0008/2011.

References

1. A. H. Li, S. Kittitornkun, Y.-H. Hu, D.-S. Park, J. Villasenor, Data partitioning and reversible variable length codes for robust video communications, Proceedings of the IEEE Data Compression Conference, Snowbird, UT, pp. 460–469, March 2000.

2. G. Cote, S. Shirani, F. Kossentini, Optimal mode selection and synchronization for robust video communications over error-prone networks, IEEE Journal on Selected Areas in Communications, 18(6), 952–965, June 2000.

3. S. Wenger, G. D. Knorr, J. Ott, F. Kossentini, Error resilience support in H.263+, IEEE Transactions on Circuits and Systems for Video Technology, 8(7), 867–877, November 1998.

4. L. P. Kondi, F. Ishtiaq, A. K. Katsaggelos, Joint source-channel coding for motion-compensated DCT-based SNR scalable video, IEEE Transactions on Image Processing, 11(9), 1043–1052, September 2002.

5. H. M. Radha, M. van der Schaar, Y. Chen, The MPEG-4 fine-grained scalable video coding method for multimedia streaming over IP, IEEE Transactions on Multimedia, 3(1), 53–68, March 2001.

6. T. Schierl, T. Stockhammer, T. Wiegand, Mobile video transmission using scalable video coding, IEEE Transactions on Circuits and Systems for Video Technology, 17(9), 1204–1217, September 2007.

7. R. Puri, K. Ramchandran, Multiple description source coding through forward error correction codes, Proceedings of the Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, vol. 1, pp. 342–346, October 1999.

8. V. K. Goyal, Multiple description coding: Compression meets the network, IEEE Signal Processing Magazine, 18(5), 74–93, September 2001.

9. K. Stuhlmüller, N. Färber, M. Link, B. Girod, Analysis of video transmission over lossy channels, IEEE Journal on Selected Areas in Communications, 18(6), 1012–1032, June 2000.

10. L. D. Soares, F. Pereira, Error resilience and concealment performance for MPEG-4 frame-based video coding, Signal Processing: Image Communication, 14(6–8), 447–472, May 1999.

11. A. K. Katsaggelos, F. Ishtiaq, L.P. Kondi, M.-C. Hong, M. Banham, J. Brailean, Error resilience and concealment in video coding, Proceedings of the European Signal Processing Conference, Rhodes, Greece, pp. 221–228, September 1998.

12. Y. Wang, S. Wenger, J. Wen, A. Katsaggelos, Error resilient video coding techniques, IEEE Signal Processing Magazine, 17(4), 61–82, July 2000.

13. F. Zhai, A. Katsaggelos, Joint Source-Channel Video Transmission, Morgan & Claypool Publishers, San Rafael, CA, 2007.

14. ISO/IEC 14496-10, Information Technology—Coding of Audio-Visual Objects—Part 10: Advanced Video Coding, 2005.

15. ISO/IEC 14496-2, Information Technology—Coding of Audio-Visual Objects—Part 2: Visual (2nd Edn.), 2001.

16. P. Haskell, D. Messerschmitt, Resynchronization of motion compensated video affected by ATM cell loss, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, San Francisco, CA, vol. 3, pp. 545–548, March 1992.

17. G. Côté, F. Kossentini, Optimal intra coding of blocks for robust video communication over the Internet, Signal Processing: Image Communication, 15(1–2), 25–34, September 1999.

18. J. Y. Liao, J.D. Villasenor, Adaptive intra block update for robust transmission of H.263, IEEE Transactions on Circuits and Systems for Video Technology, 10(1), 30–35, February 2000.

19. P. Frossard, O. Verscheure, AMISP: A complete content-based MPEG-2 error-resilient scheme, IEEE Transactions on Circuits and Systems for Video Technology, 11(9), 989–998, September 2001.

20. Z. He, J. Cai, C. Chen, Joint source channel rate-distortion analysis for adaptive mode selection and rate control in wireless video coding, IEEE Transactions on Circuits and Systems for Video Technology, 12(6), 511–523, June 2002.

21. H. Shu, L. Chau, Intra/Inter macroblock mode decision for error-resilient transcoding, IEEE Transactions on Multimedia, 10(1), 97–104, January 2008.

22. H-J. Ma, F. Zhou, R.-X. Jiang, Y.-W. Chen, A network-aware error-resilient method using prioritized intra refresh for wireless video communications, Journal of Zhejiang University - Science A, 10(8), 1169–1176, August 2009.

23. P. Nunes, L.D. Soares, F. Pereira, Error resilient macroblock rate control for H.264/AVC video coding, Proceedings of the IEEE International Conference on Image Processing, San Diego, CA, pp. 2132–2135, October 2008.

24. Z. Li, F. Pan, K. Lim, G. Feng, X. Lin, S. Rahardja, Adaptive basic unit layer rate control for JVT, Doc. JVT-G012, 7th MPEG Meeting, Pattaya, Thailand, March 2003.

25. ISO/MPEG & ITU-T, H.264/AVC Reference Software, Available: http://iphome.hhi.de/suehring/tml/download/

26. L.D. Soares, P. Nunes, F. Pereira, Efficient network-aware macroblock mode decision for error resilient H.264/AVC video coding, Proceedings of the SPIE Conference on Applications of Digital Image Processing, vol. 7073, San Diego, CA, pp. 1–12, August 2008.

27. P. Nunes, L.D. Soares, F. Pereira, Automatic and adaptive network-aware macroblock intra refresh for error-resilient H.264/AVC video coding, Proceedings of the IEEE International Conference on Image Processing, Cairo, Egypt, pp. 3073–3076, November 2009.

28. K.-P. Lim, G. Sullivan, T. Wiegand, Text description of joint model reference encoding methods and decoding concealment methods, Doc. JVT-X101, ITU-T VCEG Meeting, Geneva, Switzerland, June 2007.

29. T. Wiegand, H. Schwarz, A. Joch, F. Kossentini, G. Sullivan, Rate-constrained coder control and comparison of video coding standards, IEEE Transactions on Circuits and Systems for Video Technology, 13(7), 688–703, July 2003.

30. H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, RTP: A transport protocol for real-time applications, Internet Engineering Task Force, RFC 1889, January 1996.

31. R. C. Dorf, J.A. Svoboda, Introduction to Electric Circuits, 5th Edition, Wiley, New York, 2001.

32. K. Levenberg, A method for the solution of certain non-linear problems in least squares, Quarterly of Applied Mathematics, 2(2), 164–168, July 1944.

33. D. Marquardt, An algorithm for the least-squares estimation of nonlinear parameters, SIAM Journal of Applied Mathematics, 11(2), 431–441, June 1963.

34. S. Wenger, H.264/AVC over IP, IEEE Transactions on Circuits and Systems for Video Technology, 13(7), 645–656, July 2003.
