4

Macroblock Classification Method for Computation Control Video Coding and Other Video Applications Involving Motions

Weiyao Lin, Bing Zhou, Dong Jiang, and Chongyang Zhang

CONTENTS

4.1    Introduction and Related Work

4.2    The Class-Based MB Importance Measure

4.2.1    Analysis of Motion Estimation Statistics

4.2.2    The Predictive-MV-Accuracy-Based Classification Algorithm

4.2.3    The MB Importance Measure

4.3    The CCME Algorithm

4.3.1    Class-Level Computation Allocation

4.3.2    MB-Level Computation Allocation

4.3.3    Step-Level Computation Allocation

4.4    Experimental Results

4.4.1    Experimental Results for the CCME Algorithm

4.4.2    Comparison with Other Methods

4.5    Extending Our MB Classification Method for Other Video-Processing Applications

4.5.1    Shot Change Detection

4.5.2    Motion Discontinuity Detection

4.5.3    Outlier Rejection for Global Motion Estimation

4.5.4    Experimental Results

4.5.4.1    Experimental Results for Shot Change Detection

4.5.4.2    Experimental Results for Motion Discontinuity Detection

4.5.4.3    Experimental Results for Global Motion Estimation

4.6    Summary

Acknowledgments

References

4.1    Introduction and Related Work

Complexity-scalable video coding (CSVC) (also called computational-scalable or power-aware video coding) is of increasing importance to many applications (Burleson et al., 2001; Chen et al., 2006; He et al., 2005; Huang et al., 2005; Kim et al., 2006; Lin et al., 2008; Tai et al., 2003; Yang et al., 2005; Yi and Ling, 2005), such as video communication over mobile devices with a limited power budget, as well as real-time video systems that must encode video within a fixed number of processor computation cycles.

The target of the CSVC research is to find an efficient way to allocate the available computation budget for different video parts (e.g., group of pictures [GOPs], frames, and macroblocks [MBs]) and different coding modules (e.g., motion estimation [ME], discrete cosine transform [DCT], and entropy coding) so that the resulting video quality is kept as high as possible under the given computation budget. Since the available computation budget may vary, the CSVC algorithm should be able to perform video coding under different budget levels.

Normally, in a power-rich condition, a high-quality CSVC strategy is preferred despite its higher power consumption. Conversely, when power is scarce, more power-efficient CSVC strategies are needed to prolong the service time. In Figure 4.1, the battery discharging effects of three kinds of video coding strategies are shown (Chen et al., 2009). The power-aware CSVC strategy B can have more than twice the battery lifetime of the traditional encoder A. We can further extend the battery lifetime with a more power-efficient CSVC strategy C by gradually stepping down the power dissipation such that the battery capacity can be fully utilized with a lower loading.

Several approaches to CSVC have been proposed. He et al. (2005) analyzed the video encoding system and proposed a computation-bit rate-PSNR model for the entire video encoding system. However, since they aimed at modeling the whole system, their proposed CSVC algorithm for the ME module is very simple. Burleson et al. (2001) developed a computation-scalable algorithm for the DCT module. Vanum et al. (2007) as well as Vanum et al. (2009) tried to control the coding complexity by adaptively tuning the encoder parameters. However, these methods either fail to provide an efficient computation control strategy or do not address the computation control for the important ME part.

Image

FIGURE 4.1
Battery discharging effects of three types of CSVC encoders.

Since ME occupies the major portion of the overall coding complexity (Wiegand et al., 2003; Zhang and He, 2003), we focus in this chapter on computation allocation for the ME part (i.e., computation control motion estimation [CCME]). Furthermore, since computation can often be roughly measured by the number of search points (SPs) in ME, we use the terms SP and computation interchangeably. The CCME target for this chapter can therefore be described as follows: given a total ME computation budget (e.g., a number of SAD computations or SPs), find efficient ways to allocate it among the different video parts (GOPs, frames, and MBs) such that the video coding performance is kept as high as possible under the available computation, as shown in Figure 4.2.

Many algorithms have been proposed for CCME (Chen et al., 2006; He et al., 2005; Huang et al., 2005; Kim et al., 2006; Tai et al., 2003; Yang et al., 2005). They can be evaluated with respect to two key parts of CCME: (a) the computation allocation order and (b) the MB importance measure. These are described as follows:

1.  The computation allocation order. Two approaches can be used for allocating the computations: one-pass flow and multi-pass flow. Most previous CCME methods (Kim et al., 2006; Tai et al., 2003; Yang et al., 2005) allocate computation in a multi-pass flow, where MBs in one frame are processed in a step-by-step fashion based on a table that measures the MB importance. At each step, the computation is allocated to the MB that is measured as the most important among all the MBs in the whole frame. The table is updated after each step. Since the multi-pass methods use a table for all MBs in the frame, they can have a global view of the whole frame while allocating computation. However, they do not follow the regular coding order and require the ME process to jump between MBs, which is less desirable for hardware implementations. Furthermore, since the multi-pass methods do not follow the regular coding order, the neighboring MB information cannot be used for prediction to achieve better performance. Compared to the multi-pass flow approach, one-pass methods (Chen et al., 2006; Huang et al., 2005) allocate computation and perform ME in the regular video coding order. They are more favorable for hardware implementation and can also utilize the information from neighboring MBs. However, it is more difficult to develop a good one-pass method since (a) a one-pass method lacks a global view of the entire frame and may allocate unbalanced computations to different areas of the frame, and (b) it is more difficult to find a suitable method to measure the importance of MBs.

Image

FIGURE 4.2
The target for CCME: Allocate the available ME computation budget efficiently within the whole video sequence.

2.  The MB importance measure. In order to allocate computation efficiently to different MBs, it is important to measure the importance of the MBs for the coding performance, so that more computation will be allocated to the more important MBs (i.e., MBs with larger importance measure values). Tai et al. (2003) use the current sum of absolute difference (SAD) value for the MB importance measure. Their assumption is that MBs with large matching costs will have more room to improve, and thus more SPs will be allocated to these MBs. Chen et al. (2006) as well as Huang et al. (2005) use a similar measure in their one-pass method. However, the assumption that larger current SAD will lead to bigger SAD decrease is not always guaranteed, which makes the allocation less accurate. Yang et al. (2005) use the ratio between the SAD decrease and the number of SPs at the previous ME step to measure the MB importance. Kim et al. (2006) use a similar measure except that they use rate-distortion cost decrease instead of the SAD decrease. However, their methods can only be used in multi-pass methods where the allocation is performed in a step-by-step fashion and cannot be applied to one-pass methods.

In this chapter, a new one-pass CCME method is proposed. We first propose a class-based MB importance measure (CIM) method where MBs are classified into different classes based on their properties. The importance of each MB is measured by combining its class information as well as its initial matching cost value. Based on the CIM method, a complete CCME framework is then proposed, which first divides the total computation budget into independent sub-budgets for different MB classes and then allocates the computation from the class budget to each step of the ME process. The proposed method performs ME in a one-pass flow, which is more desirable for hardware implementation. In addition, we also propose algorithms using our macroblock classification method for different video processing applications, including shot change detection, motion discontinuity detection, and outlier rejection for global motion estimation (GME). Experimental results demonstrate that the proposed method can allocate computation more accurately than previous methods while maintaining good quality.

The rest of the chapter is organized as follows: Section 4.2 describes our proposed CIM method. Based on the CIM method, Section 4.3 describes the proposed CCME algorithm in detail. The experimental results are given in Section 4.4. Section 4.5 describes extending our proposed macroblock classification method for other video processing applications, and Section 4.6 concludes the chapter.

4.2    The Class-Based MB Importance Measure

In this section, we discuss some statistics of ME and describe our CIM method in detail. For convenience, we use COST (Lin et al., 2008) as the ME matching cost in the rest of the chapter. The COST (Lin et al., 2008) is defined as

COST = SAD + \lambda_{MOTION} \cdot R(MV)

(4.1)

where

SAD is the sum of absolute difference for the block matching error

R(MV) is the number of bits to code the motion vector (MV)

$\lambda_{MOTION}$ is the Lagrange multiplier (Wiegand et al., 2003)
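
As a concrete illustration, the following minimal Python sketch computes the matching cost of Equation 4.1 for one block. The bit-cost function mv_bits and the default Lagrange multiplier value are illustrative assumptions, not values prescribed by the chapter.

    import numpy as np

    def mv_bits(mv, pmv):
        # Assumed stand-in for R(MV): a crude estimate of the bits needed to
        # code the MV difference (larger differences cost more bits).
        return sum(2 * int(np.ceil(np.log2(abs(d) + 2)))
                   for d in (mv[0] - pmv[0], mv[1] - pmv[1]))

    def matching_cost(cur_block, ref_block, mv, pmv, lambda_motion=4.0):
        """COST = SAD + lambda_MOTION * R(MV)  (Equation 4.1)."""
        sad = int(np.abs(cur_block.astype(int) - ref_block.astype(int)).sum())
        return sad + lambda_motion * mv_bits(mv, pmv)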

In this chapter, the CIM method and the proposed CCME algorithm are described based on the simplified hexagon search (SHS) algorithm (Yi et al., 2005). However, our algorithms are general and can easily be extended to other ME algorithms (Li et al., 1994; Lin et al., 2008; Po and Ma, 1996; Zhou et al., 2004; Zhu and Ma, 2000).

The SHS is a recently developed ME algorithm, which can achieve performance close to full search (FS) with comparatively few SPs. The SHS process can be described as in Figure 4.3.

Before the ME process, the SHS algorithm first checks the init_COST, which is defined as

init\_COST = \min\left(COST(0,0),\; COST_{PMV}\right)

(4.2)

where

COST(0,0) is the COST of the (0,0) MV

$COST_{PMV}$ is the COST of the predictive MV (PMV) (Yi et al., 2005)

Image

FIGURE 4.3
The SHS process. (From Lin et al., A computation control motion estimate method for complexity-scalable video coding, IEEE Trans. Circuits Syst. Video Technol., 20(11), 1533, Fig. 1. With permission.)

If init_COST is smaller than a threshold th1, the SHS algorithm will stop after performing a small local search (search four points around the position of the init_COST), which we call the Upper Path. If init_COST is larger than the threshold, the SHS algorithm will proceed to the steps of Small Local Search, Cross Search, Multi Hexagon Search, Small Hexagon Search, and Small Diamond Search (Yi et al., 2005), which we call the Lower Path. Inside the lower path, another threshold th2 is used to decide whether or not to skip the steps of Cross Search and Multi Hexagon Search.
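
To make this control flow concrete, the short Python sketch below returns the ordered list of SHS steps for one MB, following Figure 4.3. The function name and the choice of which cost th2 is compared against are assumptions; the chapter only states that th2 decides whether the two large-area steps are skipped.

    def shs_steps(init_cost, cost_after_local, th1=1000, th2=5000):
        """Ordered list of SHS steps for one MB (sketch of Figure 4.3)."""
        if init_cost < th1:
            return ["small_local_search"]              # Upper Path
        steps = ["small_local_search"]                 # Lower Path
        if cost_after_local >= th2:                    # assumed comparison for th2
            steps += ["cross_search", "multi_hexagon_search"]
        steps += ["small_hexagon_search", "small_diamond_search"]
        return steps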

4.2.1    Analysis of Motion Estimation Statistics

In order to analyze the relationship between the COST value and the number of SPs, we define two more COSTs: COST_mid (the COST value right after the Small Local Search step in the Lower Path) and COST_final (the COST value after going through the entire ME process), as in Figure 4.3. Three MB classes are defined as

Class_{cur\_MB} =
\begin{cases}
1 & \text{if } init\_COST < th1 \\
2 & \text{if } init\_COST \ge th1 \text{ and } |COST\_mid - COST\_final| > c \\
3 & \text{if } init\_COST \ge th1 \text{ and } |COST\_mid - COST\_final| \le c
\end{cases}

(4.3)

where

cur_MB is the current MB

th1 is the threshold defined in the SHS algorithm (Yi et al., 2005) to decide whether the init_COST is large or small

c is another threshold to decide the significance of the cost improvement between COST_mid and COST_final

MBs in Class 1 are MBs with small current COST values. Class 2 represents MBs with large current COST values where additional searches can yield significant improvement. Class 3 represents MBs with large current COST values but where further searches do not produce significant improvement. If we can predict Class 3 MBs, we can save computation by skipping further searches for the Class 3 MBs. It should be noted that since we cannot get COST_final before actually going through the Lower Path, the classification method of Equation 4.3 is only used for statistical analysis. A practical classification method will be proposed later in this section. Furthermore, since MBs in Class 1 have small current COST value, their MB importance measure can be easily defined. Therefore, we will focus on the analysis of Class 2 and Class 3 MBs.

Table 4.1 lists the percentage of Class 1, Class 2, and Class 3 MBs over the total MBs for sequences of different resolutions and under different quantization parameter (QP) values where c of Equation 4.3 is set to be different values of 0, 2% of COST_mid, and 4% of COST_mid. It should be noted that 0 is the smallest possible value for c. We can see from Table 4.1 that the number of Class 3 MBs will become even larger if c is relaxed to larger values.

Figure 4.4 shows the COST value distribution of Class 2 MBs and Class 3 MBs where c of Equation 4.3 is set to be 0. We only show results for Foreman_Qcif with QP = 28 in Figure 4.4. Similar results can be observed for other sequences and other QP values. In Figure 4.4, 20 frames are coded. The experimental setting is the same as that described in Section 4.5. In order to have a complete observation, all the three COST values are displayed in Figure 4.4, where Figures 4.4a through c show the distributions of init_COST, COST_mid, and COST_final, respectively.

From Figure 4.4 and Table 4.1, we can observe that (a) a large portion of MBs with large current COST values can be classified as Class 3, where only a few SPs are needed and additional SPs do not produce significant improvement, and (b) the distributions of all three COSTs for Class 2 and Class 3 are quite similar. This implies that Class 2 and Class 3 cannot be differentiated based on their COST values alone.

Based on these observations, we can draw several conclusions for the computation allocation as follows:

1.  The number of SPs needed for keeping the performance for each MB is not always related to its current COST value. Therefore, using the COST value alone as the MB importance measure, which has been used by many previous methods (Chen et al., 2006; Huang et al., 2005; Yang et al., 2005), may not allocate SPs efficiently.

2.  Further experiments show that for Class 2 MBs, the number of SPs needed for keeping the performance is roughly proportional to their init_COST value (although it is not true if Class 2 and Class 3 MBs are put together).

These imply that we can have a better MB importance measure if we use the class and COST information together.

As mentioned earlier, since we cannot get COST_final before going through the Lower Path, Class 2 and Class 3 cannot be differentiated by their definition in Equation 4.3 in practice. Furthermore, since the COST distribution of Class 2 and Class 3 is similar, the current COST value cannot differentiate between these two classes. Therefore, before describing our MB Importance Measure method, we first propose a practical MB classification method, which we call the predictive-MV-accuracy-based classification (PAC) algorithm. The PAC algorithm will be described in the following section.

TABLE 4.1

Percentage of Class 1, Class 2, and Class 3 MBs over the Total MBs (100 Frames for Qcif and 50 Frames for Cif and SD)

Image

Source:  From Lin et al., A computation control motion estimate method for complexity-scalable video coding, IEEE Trans. Circuits Syst. Video Technol., 20(11), 1533, Table 1, With permission.

Image

FIGURE 4.4
COST value distribution for Class 2 and Class 3 MBs for Foreman_qcif sequence ((A) Class 2, (B) Class 3). (a) Init_COST distribution comparison; (b) COST_mid distribution comparison; (c) COST_final distribution comparison. (From Lin et al., A computation control motion estimate method for complexity-scalable video coding, IEEE Trans. Circuits Syst. Video Technol., 20(11), 1533, Fig. 2. With permission.)

4.2.2    The Predictive-MV-Accuracy-Based Classification Algorithm

The proposed PAC algorithm converts the definition of Class 2 and Class 3 from the COST value point of view to the predictive MV accuracy point of view. The basic idea of the PAC algorithm is described as follows:

1.  If the motion pattern of a MB can be predicted accurately (i.e., if PMV is accurate), then only a small local search is needed to find the final MV (i.e., the MV of COST_final). In this case, no matter how large the COST is, additional SPs after the small local search are not needed because the final MV has already been found by the small local search. This corresponds to Class 3 MBs.

2.  On the other hand, if the motion pattern of a MB cannot be accurately predicted, a small local search will not be able to find the final MV. In this case, a large area search (i.e., the Lower Path) after the small local search is needed to find the final MV with a lower COST value. This corresponds to Class 2 MBs.

Since the MV_final (MV for COST_final) cannot be obtained before going through the Lower Path, the final MV of the colocated MB in the previous frame is used instead to measure the accuracy of motion pattern prediction. Therefore, the proposed PAC algorithm can be described as

Class_{cur\_MB} =
\begin{cases}
1 & \text{if } init\_COST < th1 \\
2 & \text{if } init\_COST \ge th1 \text{ and } |PMV_{cur\_MB} - MV_{pre\_final}| > Th \\
3 & \text{if } init\_COST \ge th1 \text{ and } |PMV_{cur\_MB} - MV_{pre\_final}| \le Th
\end{cases}

(4.4)

where

$|PMV_{cur\_MB} - MV_{pre\_final}|$ is the measure of the motion pattern prediction accuracy

$PMV_{cur\_MB}$ is the PMV (Yi et al., 2005) of the current MB

$MV_{pre\_final}$ is the final MV of the colocated MB in the previous frame

Th is the threshold to check whether the PMV is accurate or not

Th can be defined based on different small local search patterns. In the case of SHS, Th can be set to 1 in integer-pixel resolution. According to Equation 4.4, Class 1 includes MBs that can find good matches from the previous frames. MBs with irregular or unpredictable motion patterns will be classified as Class 2. Class 3 MBs will include areas with complex textures but motion patterns similar to the previous frames.
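
A minimal Python sketch of the PAC rule in Equation 4.4 is shown below. The MV representation (integer-pixel tuples), the use of the component-wise maximum as the distance measure, and the default thresholds are assumptions used only for illustration.

    def pac_classify(init_cost, pmv, mv_pre_final, th1=1000, th_mv=1):
        """Classify one MB by predictive-MV accuracy (Equation 4.4).

        pmv, mv_pre_final: (x, y) motion vectors in integer-pixel units.
        """
        if init_cost < th1:
            return 1                                   # good match already found
        # Prediction error of the motion pattern, measured against the colocated
        # MB's final MV in the previous frame (distance measure assumed here).
        mv_err = max(abs(pmv[0] - mv_pre_final[0]),
                     abs(pmv[1] - mv_pre_final[1]))
        return 2 if mv_err > th_mv else 3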

TABLE 4.2

The Detection Rates of the PAC Algorithm

Sequence         Class 2 Detection Rate (%)    Class 3 Detection Rate (%)
Mobile_Qcif                  80                            82
Football_Cif                 71                            90
Foreman_Qcif                 75                            76

Source:  From Lin et al., A computation control motion estimate method for complexity-scalable video coding, IEEE Trans. Circuits Syst. Video Technol., 20(11), 1533, Table 2, With permission.

It should be noted that the classification using Equation 4.4 is very tight (in our case, any MV difference larger than 1 integer pixel will cause the MB to be classified as Class 2 and a large-area search to be performed). Furthermore, by including $MV_{pre\_final}$ in the classification, we also take advantage of the temporal motion smoothness information when measuring motion pattern prediction accuracy. Therefore, it is reasonable to use $MV_{pre\_final}$ in place of MV_final. This is demonstrated in Table 4.2 and Figure 4.5 and will be further demonstrated in the experimental results.

Table 4.2 shows the detection rates for Class 2 and Class 3 MBs with our PAC algorithm for some sequences, where the class definition in Equation 4.3 is used as the ground truth and c in Equation 4.3 is set to be 0. Table 4.2 shows that our PAC algorithm has high MB classification accuracy.

Figure 4.5 shows the distribution of MBs in each class for two example frames using our PAC algorithm. Figures 4.5a and e are the original frames. Blocks labeled gray in (b) and (f) are MBs belonging to Class 1. Blocks labeled black in (c) and (g) and blocks labeled white in (d) and (h) are MBs belonging to Class 2 and Class 3, respectively.

Figure 4.5 shows the reasonableness of the proposed PAC algorithm. From Figure 4.5, we can see that most Class 1 MBs include backgrounds or flat areas that can find good matches in the previous frames ((b) and (f)). Areas with irregular or unpredictable motion patterns are classified as Class 2 (e.g., the edge between the calendar and the background as well as the bottom circling ball in (c), and the running bus as well as the down-right logo in (g)). Most complex-texture areas are classified as Class 3, such as the complex background and calendar in (d) as well as the flower area in (h).

4.2.3    The MB Importance Measure

Based on the discussion in the previous sections and the definition of MB classes in Equation 4.4, we can describe our proposed CIM method as follows:

1.  MBs in Class 1 will always be allocated a fixed small number of SPs.

2.  MBs in Class 2 will have high importance. They will be allocated more SPs, and each Class 2 MB will have a guaranteed minimum number of SPs for coding-performance purposes. If two MBs both belong to Class 2, their comparative importance is proportional to their init_COST values and the SPs will be allocated accordingly.

Image

FIGURE 4.5
The original frames (a, e) and the distributions of Class 1 (b, f), Class 2 (c, g), and Class 3 (d, h) MBs for Mobile_Cif and Bus_Cif (From Lin et al., A computation control motion estimate method for complexity-scalable video coding, IEEE Trans. Circuits Syst. Video Technol., 20(11), 1533, Fig. 3. With permission.)

3.  MBs in Class 3 will have lower importance than MBs in Class 2. Similar to Class 2, we make the comparative importance of MBs within Class 3 also proportional to their init_COST value. By allowing some Class 3 MBs to have more SPs rather than fixing the SPs for each MB, the possible performance decrease due to the misclassification of MBs from Equation 4.4 can be avoided. This will be demonstrated in the experimental results.

With the CIM method, we can have a more accurate MB importance measure by differentiating MBs into classes and combining the class and the COST information. Based on the CIM method, we can develop a more efficient CCME algorithm. The proposed CCME algorithm will be described in detail in the following section.

4.3    The CCME Algorithm

The framework of the proposed CCME algorithm is described in Figure 4.6.

From Figure 4.6, the proposed CCME algorithm has four steps:

1.  Frame-level computation allocation (FLA). Given the available total computation budget for the whole video sequence, FLA allocates a computation budget to each frame.

2.  Class-level computation allocation (CLA). After one frame is allocated a computation budget, CLA further divides the computation into three independent sub-budgets (or class budgets), with one budget for each class defined in Equation 4.4.

Image

FIGURE 4.6
The framework for the proposed CCME algorithm. (From Lin et al., A computation control motion estimate method for complexity-scalable video coding, IEEE Trans. Circuits Syst. Video Technol., 20(11), 1533, Fig. 4. With permission.)

3.  MB-level computation allocation (MLA). When performing ME, each MB will first be classified into one of the three classes according to Equation 4.4. MLA then allocates the computation to the MB from its corresponding class budget.

4.  Step-level computation allocation (SLA). After an MB is allocated a computation budget, SLA allocates these computations into each ME step.

It should be noted that the CLA step and the MLA step are the key steps of the proposed CCME algorithm where our proposed CIM method is implemented. Furthermore, we also investigated two strategies for computation allocation for CLA and MLA steps: the tight strategy and the loose strategy. For the tight strategy, the actual computation used in the current frame must be lower than the computation allocated to this frame. Due to this property, the FLA step is sometimes not necessary for the tight strategy. In some applications, we can simply set the budget for all frames as a fixed number for performing the tight strategy. For the loose strategy, the actual computation used for some frames can exceed the computation allocated to these frames but the total computation used for the whole sequence must be lower than the budget. Since the loose strategy allows frames to borrow computation from others, the FLA step is needed to guarantee that the total computation used for the whole sequence will not exceed the available budget.

Since the performances of the loose-strategy algorithm and the tight-strategy algorithm are similar based on our experiments, we will only describe our algorithm based on the tight strategy in this chapter. It should be noted that since the basic ideas of the CLA and MLA processes are similar for both the tight and loose strategies, a loose-strategy algorithm can be easily derived from the description in this chapter. Furthermore, as mentioned, the FLA step is sometimes unnecessary for the tight strategy. In order to prevent the effect of FLA and to have a fair comparison with other methods, we also skip the FLA step by simply fixing the target computation budget for each frame in this chapter. In practice, various FLA methods (Chen et al., 2006; Kim et al., 2006; Tai et al., 2003; Yang et al., 2005) can be easily incorporated into our algorithm.

4.3.1    Class-Level Computation Allocation

The basic ideas of the CLA process can be summarized as follows:

1.  In the CLA step, the computation budget for the whole frame $C_F$ is divided into three independent class budgets (i.e., $C_{Class(1)}$, $C_{Class(2)}$, and $C_{Class(3)}$). MBs from different classes will be allocated computation from their corresponding class budget and will not affect each other.

2.  Since the CLA step is based on the tight strategy in this chapter, the basic layer $BL_{Class(i)}$ is first allocated to guarantee that each MB has a minimum number of SPs. The remaining SPs are then allocated to the additional layer $AL_{Class(i)}$. The total budget for each class consists of the basic layer plus the additional layer. Furthermore, since MBs in Class 1 only perform a local search, the budget for Class 1 only contains the basic layer (i.e., $C_{Class(1)} = BL_{Class(1)}$ and $AL_{Class(1)} = 0$).

3.  The actual computation used for each class in the previous frame ($CA^{pre}_{Class(i)}$) is used as the ratio parameter for class budget allocation in the additional layer.

Therefore, the CLA process can be described as in Equation 4.5 and Figure 4.7:

C_{Class(i)} = BL_{Class(i)} + AL_{Class(i)}, \quad i = 1, 2, 3

(4.5)

where

BL_{Class(i)} = BL_{MB\_Class(i)} \cdot NM^{pre}_{Class(i)}

AL_{Class(i)} =
\begin{cases}
0 & \text{if } i = 1 \\
\min\left(AL_F \cdot \dfrac{CA^{pre}_{Class(2)}}{CA^{pre}_{Class(2)} + CA^{pre}_{Class(3)}},\; AL_{MB\_max\_Class(2)} \cdot NM^{pre}_{Class(2)}\right) & \text{if } i = 2 \\
AL_F - AL_{Class(2)} & \text{if } i = 3
\end{cases}

BL_F = BL_{Class(1)} + BL_{Class(2)} + BL_{Class(3)}

AL_F = C_F - BL_F

$C_{Class(i)}$ is the computation allocated to Class i, and $BL_{Class(i)}$ and $AL_{Class(i)}$ represent the computation allocated to the Class i basic layer and additional layer, respectively. $C_F$ is the total computation budget for the whole frame, and $BL_F$ and $AL_F$ represent the basic-layer computation and the additional-layer computation for the whole frame, respectively. $NM^{pre}_{Class(i)}$ is the total number of MBs belonging to Class i in the previous frame, and $CA^{pre}_{Class(i)}$ is the computation actually used for Class i in the previous frame. $BL_{MB\_Class(i)}$ is the minimum number of computations guaranteed for each MB in the basic layer. In the case of SHS, we set $BL_{MB\_Class(1)} = BL_{MB\_Class(3)} = 6$ SPs for Class 1 and Class 3, and $BL_{MB\_Class(2)} = 25$ SPs for Class 2. As mentioned, since Class 2 MBs have higher importance in our CIM method, we guarantee them a higher minimum number of SPs. Furthermore, in order to avoid allocating too many useless SPs to Class 2 MBs, a maximum number of SPs ($AL_{MB\_max\_Class(2)}$) is set. SPs beyond $AL_{MB\_max\_Class(2)}$ are likely to be wasted and are therefore allocated to Class 3 MBs ($AL_F - AL_{Class(2)}$).

Image

FIGURE 4.7
The tight-strategy-based CLA process. (From Lin et al., A computation control motion estimate method for complexity-scalable video coding, IEEE Trans. Circuits Syst. Video Technol., 20(11), 1533, Fig. 5. With permission.) 

From Equation 4.5 and Figure 4.7, we can summarize several features of our CLA process as follows:

1.  Since the MB class is newly defined in this chapter, the CLA step is unique to our CCME method and is not included in the previous CCME algorithms (Chen et al., 2006; He et al., 2005; Huang et al., 2005; Kim et al., 2006; Tai et al., 2003; Yang et al., 2005).

2.  When performing CLA, the information from the previous frame ($NM^{pre}_{Class(i)}$ and $CA^{pre}_{Class(i)}$) is used. $NM^{pre}_{Class(i)}$ provides a global-view estimate of the MB class distribution for the current frame, and $CA^{pre}_{Class(i)}$ is used as a ratio parameter for class budget allocation in the additional layer.

3.  The CIM method is implemented in the CLA process in that (a) the computation allocated to Class 2 is normally larger than that of the other classes, and (b) Class 2 MBs have a larger guaranteed minimum number of SPs (i.e., $BL_{MB\_Class(2)}$ in the tight strategy). A sketch of the resulting allocation is given below.
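
The following Python sketch implements the tight-strategy CLA of Equation 4.5. The per-MB minimums match the SHS settings quoted above; the Class 2 per-MB cap of 225 additional SPs is an assumption derived from the 250-SP per-MB limit used in the experiments, and the dictionary-based interface is illustrative only.

    def class_level_allocation(c_frame, nm_pre, ca_pre, bl_mb=None,
                               al_mb_max_class2=225):
        """Split the frame budget c_frame into three class budgets (Equation 4.5).

        nm_pre[i]: number of Class-i MBs in the previous frame.
        ca_pre[i]: computation actually used by Class i in the previous frame.
        """
        bl_mb = bl_mb or {1: 6, 2: 25, 3: 6}

        # Basic layer: guarantee every expected MB its minimum number of SPs.
        bl = {i: bl_mb[i] * nm_pre[i] for i in (1, 2, 3)}
        al_frame = max(c_frame - sum(bl.values()), 0)

        al = {1: 0}                                   # Class 1: basic layer only
        ratio = ca_pre[2] / max(ca_pre[2] + ca_pre[3], 1)
        al[2] = min(al_frame * ratio, al_mb_max_class2 * nm_pre[2])
        al[3] = al_frame - al[2]                      # remainder goes to Class 3

        return {i: bl[i] + al[i] for i in (1, 2, 3)}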

4.3.2    MB-Level Computation Allocation

The MLA process is described in Equation 4.6. Similar to the CLA process, a basic layer ($BL_{MB}$) and an additional layer ($AL_{MB}$) are set. When allocating the additional-layer computation, the initial COST of the current MB ($COST^{init}_{cur\_MB}$) is used as a parameter to decide the number of computations allocated. The MLA process for Class 2 or Class 3 MBs is described in Figure 4.8:

C_{cur\_MB} = BL_{cur\_MB} + AL_{cur\_MB}

(4.6)

where

BL_{cur\_MB} =
\begin{cases}
BL_{MB\_Class(1)} & \text{if } Class_{cur\_MB} = 1 \\
BLC_{MB\_Class(2)} & \text{if } Class_{cur\_MB} = 2 \\
BL_{MB\_Class(3)} & \text{if } Class_{cur\_MB} = 3
\end{cases}

Image

FIGURE 4.8
The tight-MLA process for Class 2 and Class 3 MBs. (From Lin et al., A computation control motion estimate method for complexity-scalable video coding, IEEE Trans. Circuits Syst. Video Technol., 20(11), 1533, Fig.6. With permission.)

AL_{cur\_MB} =
\begin{cases}
0 & \text{if } Class_{cur\_MB} = 1 \\
\min\left(\max\left(\dfrac{COST^{init}_{cur\_MB}}{Avg\_COST^{init}_{Class(2)}} \cdot \dfrac{ab_{Class(2)}}{nm^{pre}_{Class(2)}},\; 0\right),\; AL_{MB\_max\_Class(2)}\right) & \text{if } Class_{cur\_MB} = 2 \\
\min\left(\max\left(\dfrac{COST^{init}_{cur\_MB}}{Avg\_COST^{init}_{Class(2)}} \cdot \dfrac{ab_{Class(3)}}{nm^{pre}_{Class(3)}},\; 0\right),\; AL_{MB\_max\_Class(3)}\right) & \text{if } Class_{cur\_MB} = 3
\end{cases}

$C_{cur\_MB}$ is the computation allocated to the current MB, $COST^{init}_{cur\_MB}$ is the initial COST of the current MB as in Equation 4.2, $Avg\_COST^{init}_{Class(i)}$ is the average initial COST of all the already-coded MBs belonging to Class i in the current frame, $ab_{Class(i)}$ is the computation budget available in the additional layer for Class i before coding the current MB, and $nm^{pre}_{Class(i)}$ is the estimated number of remaining uncoded MBs of Class i before coding the current MB. $BLC_{MB\_Class(2)}$ is equal to $BL_{MB\_Class(2)}$ if either $ab_{Class(2)} > 0$ or $nm_{Class(2)} > 1$, and equal to $BL_{MB\_Class(3)}$ otherwise. It should be noted that $BLC_{MB\_Class(2)}$ is defined to follow the tight strategy, where a larger ML-BL budget ($BL_{MB\_Class(2)}$) is used if the available budget is sufficient and a smaller ML-BL budget ($BL_{MB\_Class(3)}$) is used otherwise. $AL_{MB\_max\_Class(2)}$ and $AL_{MB\_max\_Class(3)}$ are the same as in Equation 4.5 and are set in order to avoid allocating too many useless SPs to the current MB. In the experiments of this chapter, we set $AL_{MB\_max\_Class(i)} + BL_{MB\_Class(i)} = 250$ for a search range of ±32 pixels. It should be noted that since we cannot know the exact number of remaining MBs of each class before coding the whole frame, $nm^{pre}_{Class(i)}$ is estimated from the parameters of the previous frame. $ab_{Class(i)}$ and $nm^{pre}_{Class(i)}$ are initialized to $AL_{Class(i)}$ and $NM^{pre}_{Class(i)}$, respectively, at the beginning of each frame and are updated before coding the current MB as in

\begin{cases}
ab_{Class(i)} = ab_{Class(i)} - (CA_{pre\_MB} - BL_{pre\_MB}) & \text{if } Class_{pre\_MB} = i \\
nm^{pre}_{Class(i)} = nm^{pre}_{Class(i)} - 1 & \text{if } Class_{pre\_MB} = i
\end{cases}

(4.7)

where

the definition of $NM^{pre}_{Class(i)}$ is the same as in Equation 4.5

$CA_{pre\_MB}$ and $BL_{pre\_MB}$ represent the computation actually consumed by and the basic-layer computation allocated to the MB coded immediately before the current MB, respectively

From Equations 4.5 through 4.7, we can see that the CLA and MLA steps are based on classification using our CIM method, where Class 1 MBs are always allocated a fixed small number of SPs, and Class 2 and Class 3 MBs are first separated into independent class budgets and then allocated based on their init_COST value within each class budget. Thus, the proposed CCME algorithm can combine the class information and COST information for a more precise computation allocation.
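
A minimal Python sketch of the MB-level allocation in Equations 4.6 and 4.7 follows. The running-state dictionaries, the default per-MB caps (derived from the 250-SP per-MB limit), and the normalization by the Class 2 average initial COST (as printed in Equation 4.6) are assumptions for illustration.

    def mb_level_allocation(mb_class, init_cost, avg_init_cost_class2, ab, nm,
                            bl_mb=None, al_mb_max=None):
        """Compute the SP budget of the current MB (Equation 4.6).

        ab[i], nm[i]: remaining additional-layer budget and estimated number of
        remaining MBs for class i (updated per Equation 4.7 after each MB).
        """
        bl_mb = bl_mb or {1: 6, 2: 25, 3: 6}
        al_mb_max = al_mb_max or {2: 225, 3: 244}

        if mb_class == 1:
            return bl_mb[1]

        # Basic layer: fall back to the smaller Class 3 minimum when the Class 2
        # budget is exhausted (the BLC_MB_Class(2) rule of the tight strategy).
        if mb_class == 2 and not (ab[2] > 0 or nm[2] > 1):
            bl = bl_mb[3]
        else:
            bl = bl_mb[mb_class]

        # Additional layer: proportional to the MB's init_COST within its class.
        share = ab[mb_class] / max(nm[mb_class], 1)
        scaled = (init_cost / max(avg_init_cost_class2, 1e-9)) * share
        al = min(max(scaled, 0), al_mb_max[mb_class])
        return bl + al

    def update_after_mb(mb_class, sp_used, bl_allocated, ab, nm):
        """Update the running per-class state as in Equation 4.7."""
        ab[mb_class] -= (sp_used - bl_allocated)
        nm[mb_class] -= 1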

4.3.3    Step-Level Computation Allocation

The SLA process will allocate the computation budget for an MB into each ME step. Since the SHS method is used to perform ME in this chapter, we will describe our SLA step based on the SHS algorithm. However, our SLA method can easily be applied to other ME algorithms (Li et al., 1994; Lin et al., 2008; Po and Ma, 1996; Zhou et al., 2004; Zhu and Ma, 2000).

The SLA process can be described as

\begin{cases}
C_{Small\_Local\_Search} = C_{Step\_min} \\
C_{Cross\_Search} = NS_{Cross\_Search} \cdot CS_{Cross\_Search} \\
C_{Multi\_Hex\_Search} = NS_{Multi\_Hex\_Search} \cdot CS_{Multi\_Hex\_Search} \\
C_{Small\_Hex\_Search} = \begin{cases} \text{Let it go} & \text{if } (NS_{Cross\_Search} + NS_{Multi\_Hex\_Search}) > 1 \\ 0 & \text{if } (NS_{Cross\_Search} + NS_{Multi\_Hex\_Search}) \le 1 \end{cases} \\
C_{Small\_Diamond\_Search} = \begin{cases} \text{Let it go} & \text{if } NS_{Cross\_Search} > 1 \\ 0 & \text{if } NS_{Cross\_Search} \le 1 \end{cases}
\end{cases}

(4.8)

where $C_{Small\_Local\_Search}$, $C_{Cross\_Search}$, $C_{Multi\_Hex\_Search}$, $C_{Small\_Hex\_Search}$, and $C_{Small\_Diamond\_Search}$ are the computations allocated to each ME step of the SHS algorithm. $C_{Step\_min}$ is the minimum guaranteed computation for the Small Local Search Step; in the case of the SHS method, $C_{Step\_min}$ is set to 4. $CS_{Cross\_Search}$ and $CS_{Multi\_Hex\_Search}$ are the numbers of SPs in each substep of the Cross Search Step and the Multi Hexagon Search Step, respectively. For the SHS method, $CS_{Cross\_Search}$ and $CS_{Multi\_Hex\_Search}$ are equal to 4 and 16, respectively (Yi et al., 2005). "Let it go" in Equation 4.8 means performing the regular motion search step. $NS_{Cross\_Search}$ and $NS_{Multi\_Hex\_Search}$ are the numbers of substeps in the Cross Search Step and the Multi Hexagon Search Step, respectively. They are calculated as

\begin{cases}
NS_{Cross\_Search} = \left\lfloor \dfrac{RT_{Cross\_Search} \cdot (C_{cur\_MB} - C_{Step\_min})}{CS_{Cross\_Search}} \right\rfloor \\
NS_{Multi\_Hex\_Search} = \left\lfloor \dfrac{RT_{Multi\_Hex\_Search} \cdot (C_{cur\_MB} - C_{Step\_min})}{CS_{Multi\_Hex\_Search}} \right\rfloor
\end{cases}

(4.9)

where $C_{cur\_MB}$ is the computation budget for the whole MB as in Equation 4.6. $RT_{Cross\_Search}$ and $RT_{Multi\_Hex\_Search}$ are the predefined ratios by which the MB's budget $C_{cur\_MB}$ is allocated to the Cross Search Step and the Multi Hexagon Search Step. In the case of the SHS method, we set $RT_{Cross\_Search}$ to 0.32 and $RT_{Multi\_Hex\_Search}$ to 0.64. This means that 32% of the MB's budget will be allocated to the Cross Search Step and 64% of the MB's budget will be allocated to the Multi Hexagon Search Step. We use the floor function $\lfloor \cdot \rfloor$ in order to ensure that an integer number of substeps of SPs is allocated.

From Equation 4.8, we can see that the SLA process will first allocate the minimum guaranteed computation to the Small Local Search Step. Then most of the available computation budget will be allocated to the Cross Search Step (32%) and the Multi Hexagon Search Step (64%). If there is still enough computation left after these two steps, the regular Small Hexagon Search and Small Diamond Search will be performed to refine the final MV. If there is not enough budget for the current MB, some motion search steps such as the Small Hexagon Search and Small Diamond Search will be skipped. In the extreme case, for example, if the MB's budget has only six SPs, then all the steps after the Small Local Search will be skipped, and the SLA process will end up performing only a Small Local Search. It should be noted that since the SLA is performed before the ME process, the computation will be allocated to the Cross Search and Multi Hexagon Search Steps regardless of whether these steps are later skipped in the ME process (i.e., skipped by th2 in Figure 4.3).
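
The Python sketch below implements the SLA of Equations 4.8 and 4.9 under the SHS constants quoted above; returning the per-step allocations and flags as a dictionary is an assumption made only for readability.

    import math

    def step_level_allocation(c_mb, c_step_min=4, cs_cross=4, cs_multi_hex=16,
                              rt_cross=0.32, rt_multi_hex=0.64):
        """Distribute an MB's SP budget over the SHS steps (Equations 4.8 and 4.9)."""
        remaining = max(c_mb - c_step_min, 0)
        ns_cross = math.floor(rt_cross * remaining / cs_cross)
        ns_multi_hex = math.floor(rt_multi_hex * remaining / cs_multi_hex)
        return {
            "small_local_search": c_step_min,
            "cross_search": ns_cross * cs_cross,
            "multi_hexagon_search": ns_multi_hex * cs_multi_hex,
            # The refinement steps run normally ("let it go") only if enough
            # budget was left for the large-area steps.
            "small_hexagon_search": (ns_cross + ns_multi_hex) > 1,
            "small_diamond_search": ns_cross > 1,
        }

For example, with a budget of only six SPs, remaining = 2 and both substep counts are zero, so every step after the Small Local Search is skipped, matching the extreme case described above.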

4.4    Experimental Results

We implemented our proposed CCME algorithm on the H.264/MPEG-4 AVC reference software, JM10.2 version (HHI, 2011). Motion search was based on SHS (Yi et al., 2005), where th1 and th2 in Figure 4.3 are set to 1000 and 5000, respectively. For each sequence, 100 frames were coded, and the picture coding structure was IPPP…. It should be noted that the first P frame was coded by the original SHS method (Yi et al., 2005) to obtain initial information for each class. In the experiments, only the 16 × 16 partition was used, with one reference frame for coding the P frames. The QP was set to 28, and the search range was ±32 pixels.

4.4.1    Experimental Results for the CCME Algorithm

In this section, we show experimental results for our proposed CCME algorithm. We fix the target computation (or SP) budget for each frame. The results are shown in Table 4.3 and Figure 4.9.

Table 4.3 shows the PSNR, bit rate, average number of search points actually used per frame (Actual SP), and average number of search points per MB (Actual SP/MB) for different sequences. The Budget columns in the table represent the target SP budget for performing ME, where 100% in the Scale column represents the original SHS (Yi et al., 2005). Since we fix the target SP budget for each frame, the values in the Scale column are measured in terms of the number of SPs per frame (e.g., 40% in the Scale column means the target SP budget for each frame is 40% of the average-SP-per-frame value of the original SHS [Yi et al., 2005]). Similarly, the values in the Budget SP column represent the corresponding number of SPs per frame for the budget scale levels indicated by the Scale column. Figure 4.9 shows the number of SPs used for each frame as well as the target SP budget for each frame under the 60% budget level for Football_Cif. Similar results can be found for other sequences.

TABLE 4.3

Experimental Results for the Tight Strategy When Fixing the Target Budget for Each Frame

Image

Source:  From Lin et al., A computation control motion estimate method for complexity-scalable video coding, IEEE Trans. Circuits Syst. Video Technol., 20(11), 1533, Table 3, With permission.

Note:  The Budget SP and the Actual SP columns are measured in terms of the number of SPs per frame.

Image

FIGURE 4.9
The number of SPs used for each frame vs. the target frame-level budget for the tight strategy for Football_Cif. (From Lin et al., A computation control motion estimate method for complexity-scalable video coding, IEEE Trans. Circuits Syst. Video Technol., 20(11), 1533, Fig. 7. With permission.)

Comparing the Actual SP column with the Budget SP column in Table 4.3, we can see that the number of SPs actually used is always smaller than the target SP budget for all target budget levels. This demonstrates that our CCME algorithm can efficiently perform computation allocation to meet the requirements of different target computation budgets. From Table 4.3, we can also see that our CCME algorithm has good performance even when the available budget is low (40% for Football and Mobile). This demonstrates the allocation efficiency of our algorithm. Furthermore, from Figure 4.9, we can see that since the CCME algorithm is based on the tight strategy, which does not allow computation borrowing from other frames, the number of SPs used in each frame is always smaller than the target frame-level budget. Thus, the average SPs per frame for the tight strategy is always guaranteed to be smaller than the target budget.

4.4.2    Comparison with Other Methods

In the previous sections, we have shown experimental results for our proposed CCME algorithm. In this section, we will compare our CCME methods with other methods.

Similar to the previous section, we fixed the target computation budget for each frame to prevent the effect of FLA. The following three methods are compared. It should be noted that all these three methods use our SLA method for a fair comparison:

TABLE 4.4

Performance Comparison for CCME Algorithms

Image

Source:  From Lin et al., A computation control motion estimate method for complexity-scalable video coding, IEEE Trans. Circuits Syst. Video Technol., 20(11), 1533, Table 4, With permission.

1.  Perform the proposed CCME algorithm with the tight strategy (Proposed in Table 4.4).

2.  Do not classify the MBs into classes and allocate computation only based on their Init_COST (Chen et al., 2006; Huang et al., 2005) (COST only in Table 4.4).

3.  First search the (0,0) points of all the MBs in the frame, and then allocate SPs based on the (0,0) SAD. This method is a variation of the strategy used by many multi-pass methods (Tai et al., 2003; Yang et al., 2005) ((0,0) SAD in Table 4.4).

Table 4.4 compares PSNR (in dB), bit rate (BR, in kbps), and the average number of search points per MB (SPs). The definition of the Budget Scale column of the table is the same as in Table 4.3. Figure 4.10 shows the BR Increase vs. Budget Level for these methods where the BR Increase is defined by the ratio between the current bit rate and its corresponding 100% Level bit rate.

From Table 4.4 and Figure 4.10, we can see that our proposed CCME method can allocate SPs more efficiently than the other methods at different computation budget levels. This demonstrates that our proposed method, which combines the class and the COST information of the MB, can provide a more accurate way to allocate SPs.

Image

FIGURE 4.10
Performance comparison for different CCME algorithms. (a) Bus_Cif, (b) Mobile_Cif, (c) Stefan_Cif, (d) Dancer_Cif, (e) Foreman_Cif, (f) Football_Cif. (From Lin et al., A computation control motion estimate method for complexity-scalable video coding, IEEE Trans. Circuits Syst. Video Technol., 20(11), 1533, Fig. 8. With permission.)

For a further analysis of the result, we can compare the bit-rate performance of the Mobile sequence (i.e., Figure 4.10b) with its MB classification result (i.e., Figures 4.5b through d). When the budget level is low, our proposed algorithm can efficiently extract and allocate more SPs to the more important Class 2 MBs (Figure 4.5c) while reducing the unnecessary SPs from Class 3 (Figure 4.5d). This keeps the performance of our method as high as possible. Furthermore, since the number of extracted Class 2 MBs is low (Figure 4.5c), our proposed algorithm can still keep high performance at very low budget levels (e.g., 5% budget level in Figure 4.10b). Compared to our method, the performances of the other methods will significantly decrease when the budget level becomes low.

However, the results in Table 4.4 and Figure 4.10 also show that for some sequences (e.g., Foreman and Football), the advantages of our CCME algorithm over the other methods are not so obvious. This is because:

1.  For some sequences such as Football, the portion of Class 2 MBs is large. In this case, the advantages of our CCME method from MB classification become less obvious. In extreme cases, if all MBs are classified into Class 2, our proposed CCME algorithm will be the same as the COST only algorithm.

2.  For some sequences such as Foreman, the performance will not decrease much even when very few points are searched for each MB (e.g., our experiments show that the performance for Foreman_Cif will not decrease much even if we only search six points for each MB). In this case, different computation allocation strategies will not make much difference.

Table 4.5 shows the results for sequences with different resolutions (Mobile_Qcif and Mobile_SD) or using different QPs (Bus with QP = 23 or 33). Table 4.5 shows the efficiency of our algorithm under different resolutions and different QPs. Furthermore, we can also see from Table 4.5 that the performance of our algorithm is very close to the other methods for Mobile_Qcif. The reason is similar to the case of Foreman_Cif (i.e., a local search for each MB can still get good performance and thus different computation allocation strategies will not make much difference).

TABLE 4.5

Experimental Results for Sequences with Different Resolutions or Using Different QPs

Image

Source:  From Lin et al., A computation control motion estimate method for complexity-scalable video coding, IEEE Trans. Circuits Syst. Video Technol., 20(11), 1533, Table 5. With permission.

From these discussions, we can summarize the advantages of our proposed CCME algorithm as follows:

1.  The proposed algorithm uses a more suitable way to measure MB importance by differentiating MBs into different classes. When the available budget is small, the proposed method can save unnecessary SPs from Class 3 MBs so that more SPs can be allocated to the more important Class 2 MBs, which keeps the performance as high as possible. When the available target budget is large, the method will have more spare SPs for Class 3 MBs, which can overcome the possible performance decrease from MB misclassification and further improve the coding performance.

2.  The proposed algorithm can reduce the impact of not having a global view of the whole frame for one-pass methods by (a) setting the basic and the additional layers, (b) using previous frame information as the global view estimation, (c) guaranteeing Class 2 MBs a higher minimum SPs, and (d) using three independent class budgets so that an unsuitable allocation in one class will not affect other classes.

Furthermore, we also believe that the framework of our CCME algorithm is general and can easily be extended. Some possible extensions of our algorithm can be described as follows:

1.  As mentioned, other FLA or SLA methods (Chen et al., 2006; He et al., 2005; Huang et al., 2005; Kim et al., 2006; Tai et al., 2003; Yang et al., 2005) can easily be implemented into our CCME algorithm. For example, in some time-varying motion sequences, an FLA algorithm may be very useful to allocate more computation to those high-motion frames and further improve the performance.

2.  In this chapter, we only perform experiments on the 16 × 16 partition size and the IPPP… picture type. Our algorithm can easily be extended to ME with multiple partition sizes and multiple reference frames such as in H.264/AVC (Wiegand et al., 2003) as well as other picture types.

3.  In this chapter, we define three MB classes and perform CCME based on these three classes. Our method can also be extended by defining more MB classes and developing different CLA and MLA steps for different classes.

4.5    Extending Our MB Classification Method for Other Video-Processing Applications

In this section, we discuss using our proposed MB classification method for other video processing applications. From Equation 4.4 and Figure 4.5, we can further draw the following observations: From Figures 4.5b and f, we can see that most Class 1 MBs include backgrounds or flat areas that can find good matches in the previous frames. From Figure 4.5c and g, we can see that our classification method can effectively detect irregular areas and classify them into Class 2 (e.g., the edge between the calendar and the background as well as the bottom circling ball in (c), and the running bus as well as the down-right logo in (g)). From Figures 4.5d and h, we can see that most complex-texture areas are classified as Class 3, such as the complex background and calendar in (d) as well as the flower area in (h).

Therefore, based on the proposed MB class information in Equation 4.4, we can also develop various algorithms for different applications. Note that since our proposed method is defined directly on information readily available from the ME process or from the compressed video bitstream, it has low computational complexity and is applicable to various video applications, especially those with low-delay and low-cost requirements. In the following, we will propose algorithms for three example applications: shot change detection, motion discontinuity (MD) detection, and outlier rejection for GME.

4.5.1    Shot Change Detection

In this chapter, we define a “shot” as a segment of continuous video frames captured by one camera action (i.e., a continuous operation of one camera), and a “shot change” as the boundary of two shots. Figure 4.11 shows an example of an abrupt shot change.

From the previous discussions, we can outline the ideas of applying our approach to shot change detection as follows: Since shot changes (including abrupt, gradual, fade-in or fade-out) (Lin et al., 2010) always happen between two uncorrelated video shots, the content correlation between frames at shot changes will be low. Therefore, we can use the information of Class 1 as the primary feature to detect shot changes. Furthermore, since the motion pattern will also change at shot changes, the information of Class 2 and Class 3 can be used as additional features for shot change detection.

Image

FIGURE 4.11
An example of an abrupt shot change.

Image

FIGURE 4.12
The MB distributions at the abrupt shot change frame from Bus_Cif to Mobile_Cif: (a) Original frame, (b) Class 1, (c) Class 2, (d) Class 3.

Figure 4.12 shows an example of the effectiveness in using our class information for shot change detection. More results will be shown in the experimental results. Figures 4.12b through d show the MB distributions of three classes at the abrupt shot change from Bus_Cif to Mobile_Cif. We can see that the information of Class 1 can effectively indicate the low content correlation between frames at the shot change (i.e., no MB is classified as Class 1 in Figure 4.12b). Furthermore, a large number of MBs are classified as Class 2. This indicates the rapid motion pattern change at the shot change.

Based on these discussions, we can propose a Class-Based Shot Change detection (CB-Shot) algorithm. It is described as in Equation 4.10:

Fg_{shot}(t) =
\begin{cases}
1 & \text{if } N_{Class\_1}(t) \le T1 \text{ and } N_{Intra\_MB}(t) - N_{IR}(t) \ge T4 \\
1 & \text{if } N_{Class\_1}(t) \le T2 \text{ and } N_{Intra\_MB}(t) - N_{IR}(t) \ge T4 \\
  & \quad \text{and } |N_{Class\_2}(t) - N_{Class\_2}(t-1)| + |N_{Class\_3}(t) - N_{Class\_3}(t-1)| \ge T3 \\
0 & \text{else}
\end{cases}

(4.10)

where

t is the frame number

Fgshot(t) is a flag indicating whether a shot change happens at the current frame t or not

$Fg_{shot}(t)$ equals 1 if there is a shot change and 0 otherwise. $N_{Intra\_MB}(t)$ is the number of intra-coded MBs in frame t; $N_{IR}(t)$ is the number of intra-refresh MBs in the current frame (i.e., forced intra-coded MBs [HHI, 2011]); $N_{Class\_1}(t)$, $N_{Class\_2}(t)$, and $N_{Class\_3}(t)$ are the total numbers of Class 1, Class 2, and Class 3 MBs in the current frame t, respectively. T1, T2, T3, and T4 are the thresholds for deciding the shot change. In this chapter, T1 through T4 are calculated by Equation 4.11:

T1 = \frac{N_{MB}(t) - N_{IR}(t)}{40}, \quad
T2 = \frac{N_{MB}(t) - N_{IR}(t)}{30}, \quad
T3 = \frac{N_{MB}(t) - N_{IR}(t)}{4}, \quad
T4 = T1

(4.11)

where NMB(t) is the total number of MBs of all classes in the current frame.

It should be noted that in Equation 4.10, the Class 1 information is the main feature for detecting shot changes (i.e., $N_{Class\_1}(t) \le T1$ and $N_{Class\_1}(t) \le T2$ in Equation 4.10). The intuition behind using the Class 1 information as the major feature is that it is a good indicator of the content correlation between frames. The Class 2 and Class 3 information is used to help detect frames at the beginning of some gradual shot changes, where a large change in motion pattern has been detected but the number of Class 1 MBs has not yet decreased to a small value. The intra-coded MB information helps discard possible false-alarm shot changes due to MB misclassification. From Equations 4.10 and 4.11, we can also see that when the intra-refresh functionality is enabled (i.e., when $N_{IR}(t) > 0$), our algorithm can be extended by simply excluding these intra-refreshed MBs and performing shot change detection based only on the remaining MBs.

Furthermore, note that Equation 4.10 is only one implementation of using our class information for shot change detection. We can easily extend Equation 4.10 by using more sophisticated methods such as cross-validation (Lin et al., 2008) to decide the threshold values automatically. Machine learning models can also be used to decide the shot detection rules in place of the rules in Equation 4.10. For example, we can train a support vector machine (SVM) or a hidden Markov model (HMM) on our class information to detect shot changes (Liu et al., 2004; SVMLight, 2011). In this way, we avoid the tedium of manually tuning multiple thresholds simultaneously. This point will be further discussed in the experimental results.
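
The following Python sketch implements the CB-Shot rule of Equations 4.10 and 4.11 as reconstructed above; the comparison operators and the per-frame counting interface are assumptions, since the chapter gives the thresholds but not an explicit listing.

    def cb_shot_detect(n_class, n_class_prev, n_intra, n_ir, n_mb):
        """Class-based shot change detection for one frame (Equations 4.10 and 4.11).

        n_class, n_class_prev: dicts {1: ..., 2: ..., 3: ...} of per-class MB counts
        for the current and previous frames; n_intra: intra-coded MBs; n_ir: forced
        intra-refresh MBs; n_mb: total MBs in the frame.
        """
        base = n_mb - n_ir                       # intra-refresh MBs are excluded
        t1, t2, t3 = base / 40.0, base / 30.0, base / 4.0
        t4 = t1

        enough_intra = (n_intra - n_ir) >= t4
        motion_change = (abs(n_class[2] - n_class_prev[2]) +
                         abs(n_class[3] - n_class_prev[3])) >= t3

        if n_class[1] <= t1 and enough_intra:
            return True                          # very low content correlation
        if n_class[1] <= t2 and enough_intra and motion_change:
            return True                          # start of a gradual shot change
        return False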

4.5.2    Motion Discontinuity Detection

We define motion discontinuity as the boundary between two smooth camera motions (SCMs), where an SCM is a segment of continuous video frames captured by one single camera motion (such as zooming, panning, or tilting) (Lin et al., 2008; Shu and Chau, 2005). For example, in Figure 4.13 the first several frames are captured while the camera has little or no motion; they therefore form the first SCM (SCM1). The next several frames form another SCM (SCM2) because they are captured by a single rapid rightward camera motion. An MD can then be defined between these two SCMs. It should be noted that the major difference between shots and SCMs is that a shot is normally composed of multiple SCMs.

Basically, motion discontinuity can be viewed as motion unsmoothness or the change of motion patterns. The detection of motion discontinuity can be very useful in video content analysis or video coding performance improvement (Boyce, 2004; Lee and Lee, 2001). Since our class information, especially Class 2 information, can efficiently reflect the irregular motion patterns, it can be easily used for MD detection.

Image

FIGURE 4.13
An example of motion discontinuity.

Image

FIGURE 4.14
The MB distributions at a motion discontinuity frame in Foreman_Cif: (a) Original frame, (b) Class 1, (c) Class 2, (d) Class 3.

The ideas of applying our MB class information into MD detection can be outlined as follows: Since MD happens between different camera motions, the motion smoothness will be disrupted at the places of MDs. Therefore, we can use the Class 2 information as the primary feature to detect MDs. Furthermore, since frames at MDs belong to the same camera action (i.e., the same shot), their content correlation will not decrease. Therefore, the information of Class 1 can be used to differentiate shot changes from MDs.

Figure 4.14 shows an example of the effectiveness of using our class information for MD detection. Figure 4.14b through d shows the MB distributions of an MD frame in Foreman_Cif when the camera starts to move rightward rapidly. The large number of Class 2 MBs indicates the motion unsmoothness due to the MD, while the large number of Class 1 MBs indicates high content correlation between frames, which implies that the frame is not a shot change.

Therefore, we propose a class-based MD detection (CB-MD) algorithm, described as

$$Fg_{MD}(t)=\begin{cases}1, & \text{if } N_{Class\_1}(t)\ge th_{MD1}\ \text{and}\ \sum_{i=0}^{k} I\big(N_{Class\_2}(t-i)\ge th_{MD3}\big)=k+1\\[4pt] 0, & \text{else}\end{cases}$$

(4.12)

where I(f) is an indicator function that equals 1 if f is true and 0 otherwise. Equation 4.12 means that an MD is detected only if the number of Class 2 MBs is larger than a threshold for k + 1 consecutive frames. This is based on the assumption that an obvious camera motion change affects several frames rather than a single one. By including the information of several frames, the false alarm rate can be reduced. Furthermore, similar to shot change detection, the decision rules in Equation 4.12 can also be extended to avoid manually setting the thresholds.
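As a concrete illustration, a minimal sketch of the CB-MD rule in Equation 4.12 is given below. It assumes the reading above (Class 1 count kept above th_MD1 to rule out shot changes, Class 2 count above th_MD3 for k + 1 consecutive frames); the function name and default threshold values are illustrative only.

def detect_md(n_class1, n_class2_history, th_md1=50, th_md3=100, k=4):
    """Class-based MD detection rule of Equation 4.12 (illustrative sketch).

    n_class1:          number of Class 1 MBs in the current frame t
    n_class2_history:  Class 2 MB counts for frames t-k .. t (at least k+1 values)
    """
    if len(n_class2_history) < k + 1:
        return 0  # not enough frames observed yet
    # Content correlation stays high (same shot) ...
    same_shot = n_class1 >= th_md1
    # ... and the motion unsmoothness persists for k+1 consecutive frames.
    sustained = all(n >= th_md3 for n in n_class2_history[-(k + 1):])
    return 1 if (same_shot and sustained) else 0

With the settings used later in Section 4.5.4.2 (k = 4 and thresholds of 50 and 100), a frame would be flagged only after five consecutive frames each containing more than 100 Class 2 MBs.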

4.5.3    Outlier Rejection for Global Motion Estimation

Global motion estimation (GME) is another useful application of our class information. Since a video frame often contains various objects with different motion patterns and directions, motion segmentation is needed to filter out these motion regions before estimating the global motion parameters of the background. Since our class information can efficiently describe the motion patterns of different MBs, it is very useful for filtering out the irregular-motion areas (outliers). For example, we can simply filter out the Class 2 (or Class 2 + Class 3) MBs and perform GME on the remaining MBs.

Based on the MB class information, the proposed GME algorithm can be described as follows:

Step 1: Use our class information to get a segmentation of the irregular motion MBs, as in

$$\text{Foreground}=\begin{cases}\text{all Class 2 and Class 3 MBs}, & \text{if } N_{Class\_2}(t)+N_{Class\_3}(t)<Th_F\\[4pt] \text{all Class 2 MBs}, & \text{else}\end{cases}$$

(4.13)

where

$N_{Class\_2}(t)$ and $N_{Class\_3}(t)$ are the numbers of Class 2 and Class 3 MBs in frame t

$Th_F$ is a threshold

Step 2: Estimate the global motion parameters based on the remaining background MVs. In this chapter, we use the six-parameter model as the global motion model, as described in

$$\begin{pmatrix}x'\\ y'\\ 1\end{pmatrix}=GMV_{6p}(x,y;a,b,c,d,e,f)=S(a,b,c,d,e,f)\times\begin{pmatrix}x\\ y\\ 1\end{pmatrix}$$

(4.14)

where

$S(a,b,c,d,e,f)=\begin{pmatrix}a & b & c\\ d & e & f\\ 0 & 0 & 1\end{pmatrix}$ is the six-parameter model

(x,y) and (x′,y′) represent the pixel’s original and global motion-compensated location, respectively

There are many ways to estimate S. In this chapter, we use the least-squares method (Soldatov et al., 2006), which searches for the parameters of S that minimize a mean-square-error cost function, as in

$$S(a,b,c,d,e,f)=\arg\min_{S(a,b,c,d,e,f)}\ \sum_{x,y}\big\|V(x,y)-GMV_{6p}(x,y;a,b,c,d,e,f)\big\|^2$$

(4.15)

where $V(x,y)=(V_x\ V_y\ 1)^T$, and $(V_x,V_y)$ are the coordinates of the terminal point of the motion vector at pixel (x, y).
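To make Steps 1 and 2 concrete, the following sketch rejects the class-based foreground of Equation 4.13 and fits the six-parameter model of Equations 4.14 and 4.15 with an ordinary NumPy least-squares solve. The block layout, the function names, and the use of block centers as the sample points (x, y) are illustrative assumptions rather than the reference implementation.

import numpy as np

def reject_foreground(classes, th_f):
    """Equation 4.13 (sketch): classes is a per-MB array of labels 1, 2, or 3.
    Returns a boolean mask that is True for MBs kept as background."""
    is23 = (classes == 2) | (classes == 3)
    if np.sum(is23) < th_f:
        return ~is23            # drop all Class 2 and Class 3 MBs
    return classes != 2         # otherwise drop only the Class 2 MBs

def fit_global_motion(points, mvs, background):
    """Equations 4.14 and 4.15 (sketch): least-squares fit of
    S = [[a, b, c], [d, e, f], [0, 0, 1]] to the background block MVs.

    points: (N, 2) sample coordinates (x, y); mvs: (N, 2) block motion vectors."""
    p = points[background].astype(float)
    v = p + mvs[background]                    # MV terminal coordinates V(x, y)
    A = np.hstack([p, np.ones((len(p), 1))])   # homogeneous (x, y, 1) rows
    # Fit the two affine rows separately: A @ [a, b, c]^T ~ v_x and A @ [d, e, f]^T ~ v_y.
    row1, _, _, _ = np.linalg.lstsq(A, v[:, 0], rcond=None)
    row2, _, _, _ = np.linalg.lstsq(A, v[:, 1], rcond=None)
    return np.vstack([row1, row2, [0.0, 0.0, 1.0]])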

Figure 4.15 shows some results of using our class information for motion-region segmentation. From Figure 4.15a and b, we can see that our class information can efficiently locate the foreground motion regions. From Figure 4.15c, we can also see that our algorithm focuses on detecting "motion regions" rather than on segmenting the foreground object: since only the person's left hand is moving while the other parts of the person remain unchanged, only the blocks corresponding to the left hand are identified as motion regions. However, our class information can also be extended to detect true foreground objects by combining it with texture information such as DC and AC coefficients (Porikli et al., 2010).

4.5.4    Experimental Results

In this section, we perform experiments for the methods proposed in Sections 4.5.1 through 4.5.3. The algorithms are implemented on the H.264/MPEG-4 AVC reference software, JM10.2 (HHI, 2011). The picture resolutions of the test sequences are CIF and SIF. For each sequence, the picture coding structure was IPPP…. In the experiments, only the 16 × 16 partition was used, with one reference frame for coding the P frames. The QP was set to 28, the search range was ±32 pixels, and the frame rate was 30 frames/s. The motion estimation was based on our class-based fast termination method (Lin et al., 2009). Note that our MB classification method is general and independent of the ME algorithm used; it can easily be extended to other ME algorithms (Li et al., 1994; Po and Ma, 1996). Furthermore, we disabled the intra-refresh functionality (HHI, 2011) in the experiments of this chapter in order to focus on our class information. However, our experiments show that the shot detection results do not differ by much when intra-refresh is enabled.

Image

FIGURE 4.15
Examples of using our class information for motion-region segmentation for global motion estimation: (a) Original frame for Dancer_cif, (b) Segmented frame for Dancer_cif in (a), (c) Original frame for Stefan_cif, (d) Segmented frame for Stefan_cif in (c), (e) Original frame for Silent_cif, (f) Segmented frame for Silent_cif in (e).

4.5.4.1    Experimental Results for Shot Change Detection

We first perform experiments for shot change detection. Four shot change detection algorithms are compared:

1.  Detect shot changes based on the number of Intra-MBs (Eom and Choe, 2007; Zhang and Kittler, 1999) (Intra-based in Table 4.6). A shot change will be detected if the number of Intra-MBs in the current frame is larger than a threshold.

2.  Detect shot changes based on motion smoothness (Akutsu et al., 1992; Shu and Chau, 2005) (MV-Smooth-based in Table 4.6). The motion smoothness can be calculated by the Square of Motion Change (Shu and Chau, 2005), as in

$$SMC(t)=\sum_{i\in \text{current frame}}\Big(\big(MV_{ix}(t)-MV_{ix}(t-1)\big)^2+\big(MV_{iy}(t)-MV_{iy}(t-1)\big)^2\Big)$$

(4.16)

where

SMC(t) is the value of the Square of Motion Change (SMC) at frame t

$MV_{ix}(t)$ and $MV_{iy}(t)$ are the x and y components of the motion vector of macroblock i in frame t, respectively

From Equation 4.16, we can see that SMC is just the “sum of squared motion vector difference” between colocated MBs of neighboring frames. Based on Equation 4.16, a shot change can be detected if SMC(t) is larger than a threshold at frame t.

TABLE 4.6

Performance Comparison of Different Algorithms in Detecting the Shot Changes in the Extended TRECVID Dataset

Method                         Miss (%)   False Alarm (%)   TEFR
Intra-based                    25.24      4.27              13.50
MV-Smooth-based                43.72      17.36             22.75
Intra + MV-Smooth              24.71      3.49              12.58
Proposed-Class 1 only          8.34       3.81              5.51
Proposed-All Class + Intra     6.13       2.91              3.23

3.  Detect shot changes based on the combined information of Intra-MBs and motion smoothness (Shu and Chau, 2005) (Intra + MV-Smooth in Table 4.6). In this method, the Intra-MB information is incorporated into the SMC, as in

$$SMC_{Intra\_included}(t)=\sum_{i\in \text{current frame}} MC(i)$$

(4.17)

where $SMC_{Intra\_included}(t)$ is the SMC with Intra-MB information included, and MC(i) is defined as in

$$MC(i)=\begin{cases}\big(MV_{ix}(t)-MV_{ix}(t-1)\big)^2+\big(MV_{iy}(t)-MV_{iy}(t-1)\big)^2, & \text{if MB } i \text{ is inter-coded}\\[4pt] L, & \text{if MB } i \text{ is intra-coded}\end{cases}$$

(4.18)

where

i is the MB number

L is a large fixed number

In the experiments of this chapter, we set L to 500. From Equations 4.17 and 4.18, we can see that the Intra + MV-Smooth method is similar to the MV-Smooth-based method except that, when MB i is intra-coded, the large value L is used instead of the squared motion vector difference. It should be noted that when the number of intra-MBs is low, the Intra + MV-Smooth method behaves similarly to the MV-Smooth-based method; when the number of intra-MBs is high, it behaves similarly to the Intra-based method. (A minimal sketch of both motion-smoothness features is given after this list.)

4.  The proposed class-based shot change detection algorithm, which uses the Class 1 information as the major feature for detection, as in Equation 4.10 (Proposed-All Class + Intra in Table 4.6).
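As referenced in Method (3) above, the sketch below computes the two motion-smoothness features of Equations 4.16 through 4.18. The per-MB array layout and the intra-coded flag are assumptions for illustration, and L defaults to the value of 500 used in this chapter.

import numpy as np

def smc(mv_t, mv_prev):
    """Equation 4.16 (sketch): mv_t and mv_prev are (N, 2) arrays holding the
    MB motion vectors of frames t and t-1 in colocated order."""
    d = mv_t - mv_prev
    return float(np.sum(d[:, 0] ** 2 + d[:, 1] ** 2))

def smc_intra_included(mv_t, mv_prev, intra_mask, L=500.0):
    """Equations 4.17 and 4.18 (sketch): intra_mask is True where MB i is
    intra-coded, in which case the fixed penalty L replaces the MV difference."""
    d = mv_t - mv_prev
    per_mb = d[:, 0] ** 2 + d[:, 1] ** 2
    per_mb = np.where(intra_mask, L, per_mb)
    return float(np.sum(per_mb))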

It should be noted that we choose Methods (1) through (3) as the reference algorithms to compare with our methods because they are all computationally efficient (with an average operation time of less than 5 ms). Thus, they are suitable for shot change detection in the context of video coding.

Figure 4.16 shows the curves of features that are used in the aforementioned algorithms. Since all the algorithms perform well in detecting abrupt shot changes, we only show the curves of a gradual shot change in Figure 4.16.

Figure 4.16a through e shows the feature curves of a gradual shot change sequence, as in Figure 4.17a. In this sequence, the first five frames are from Bus_Cif, the last five frames are from Football_Cif, and the middle 20 frames are the period of the gradual shot change. Figure 4.16a is the ground truth for the shot change sequence; Figure 4.16b shows the curve of the number of Intra-MBs in each frame; Figure 4.16c shows the curve of SMC(t); Figure 4.16d shows the curve of $SMC_{Intra\_included}(t)$; and Figure 4.16e shows the curve of the number of Class 1 MBs in each frame. It should be noted that we reverse the y-axis of Figure 4.16e so that the curve has the same concave shape as the others.

Image

FIGURE 4.16
Feature curves of a gradual shot change sequence. (a) The ground-truth shot change frames, (b) the curve of the number of Intra-MBs in each frame, (c) the curve of SMC(t), (d) the curve of $SMC_{Intra\_included}(t)$, and (e) the curve of the number of Class 1 MBs in each frame.

Image

FIGURE 4.17
Example sequences in the extended TRECVID dataset. (a) An example sequence that we created. (b) An example sequence from the TRECVID dataset (NIST, 2001).

Figure 4.16 shows the effectiveness of using our class information for shot change detection. From Figure 4.16e, we can see that the number of Class 1 MBs suddenly decreases to 0 when a shot change happens and then quickly increases to a large value right after the shot change period. Therefore, our proposed algorithms can effectively detect gradual shot changes based on the Class 1 information. Compared to our class information, the features based on the Intra-MB number, SMC(t), and $SMC_{Intra\_included}(t)$ are much less effective in detecting gradual shot changes. We can see from Figure 4.16b through d that the Intra-MB number, SMC(t), and $SMC_{Intra\_included}(t)$ have similar values for frames inside and outside the shot change period, which makes it very difficult to differentiate the gradual shot change frames. Figure 4.16c shows that SMC(t) is the least effective; this implies that motion smoothness information alone cannot work well in detecting shot changes. Our experiments show that the effectiveness of SMC(t) is further reduced when the sub-sequences before and after the shot change have similar patterns or low motion, because the motion unsmoothness is then not so obvious at the shot change.

Table 4.6 compares the Miss rate, the False Alarm rate, and the total error frame rate (TEFR) (Lin et al., 2008) of the different algorithms in detecting the shot changes in an extended TRECVID dataset. The extended TRECVID dataset contains 60 sequences in total, including both sequences from the public TRECVID dataset (NIST, 2001) and sequences that we created. There are 16 abrupt shot change sequences and 62 gradual shot change sequences of different types (gradual transfer, fade-in, and fade-out) and with different lengths of the shot-changing period (e.g., 10 frames, 20 frames, and 30 frames). Example sequences from the dataset are shown in Figure 4.17. The Miss rate is defined as $N_k^{miss}/N_k$, where $N_k^{miss}$ is the number of missed shot change frames in sequence k and $N_k$ is the total number of shot change frames in sequence k. The False Alarm rate is defined as $N_k^{FA}/N_k^{non}$, where $N_k^{FA}$ is the number of false-alarm frames in sequence k and $N_k^{non}$ is the total number of non-shot-change frames in sequence k. We calculate the Miss rate and the False Alarm rate for each sequence and average the rates over all sequences. The TEFR is defined as $N_{t\_miss}/N_{tf}$, where $N_{t\_miss}$ is the total number of misdetected shot change frames over all sequences and $N_{tf}$ is the total number of frames in the dataset. The TEFR reflects the overall performance of the algorithms over all sequences.
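For completeness, a minimal sketch of the per-sequence Miss and False Alarm rates described above is given below; the frame-level boolean labels are an assumed input format, and the TEFR aggregation of Lin et al. (2008) is not reproduced here.

import numpy as np

def miss_and_false_alarm_rates(ground_truth, detected):
    """ground_truth, detected: boolean arrays with True at shot-change frames."""
    gt = np.asarray(ground_truth, dtype=bool)
    det = np.asarray(detected, dtype=bool)
    miss_rate = np.sum(gt & ~det) / max(int(np.sum(gt)), 1)   # missed shot-change frames
    fa_rate = np.sum(~gt & det) / max(int(np.sum(~gt)), 1)    # false alarms on non-shot frames
    return float(miss_rate), float(fa_rate)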

In order to have a fair comparison, we also list the results of using only the Class 1 information for detection (i.e., a shot change frame is detected if $N_{Class\_1}(t) < T_1$; Proposed-Class 1 only in Table 4.6). In the experiments of Table 4.6, the thresholds for detecting shot changes in Method (1) (Intra-based), Method (2) (MV-Smooth-based), and Method (3) (Intra + MV-Smooth) are set to 200, 2,000, and 105,000, respectively. These thresholds are selected based on experimental statistics.

From Table 4.6, we can see that the performances of our proposed algorithms (Proposed-Class 1 only and Proposed-All Class + Intra) are better than the other methods.

Furthermore, several other observations can be drawn from Table 4.6 as follows:

1.  Basically, our Class 1 information, the Intra-MB information (Eom and Choe, 2007; Zhang and Kittler, 1999), and the residue information (Arman et al., 1994) can all be viewed as features that measure the content correlation between frames. However, from Table 4.6, we can see that the performance of our Proposed-Class 1 only method is clearly better than that of the Intra-based method. This is because the Class 1 information includes both residue information and motion information: only those MBs with both regular motion patterns (i.e., an MV close to the PMV or the (0, 0) MV) and low matching-cost values are classified as Class 1. We believe that these MBs reflect the content correlation between frames more reliably. In our experiments, we found that gradual shot change frames contain large portions of MBs for which neither intra- nor inter-prediction performs well. The inter/intra mode selections for these MBs are quite random, which degrades the performance of the Intra-based method. Compared to the Intra-based method, our algorithm can work well by simply classifying these MBs outside Class 1 and discarding them from the shot change detection process.

2.  The Proposed-All Class + Intra method further improves on the Proposed-Class 1 only method. This implies that including the Class 2 and Class 3 information helps detect those frames at the boundary of the shot change period that cannot easily be differentiated by the Class 1 information alone. Furthermore, the reduced False Alarm rate of the Proposed-All Class + Intra method also implies that including the intra-coded MB information helps discard false alarm frames caused by MB misclassification.

4.5.4.2    Experimental Results for Motion Discontinuity Detection

In this section, we perform experiments for MD detection. The following four methods are compared. Methods (1) through (3) are the same as in the previous section:

1.  Detect MD based on the number of Intra-MBs (Intra-based).

2.  Detect MD based on motion smoothness (MV-Smooth-based).

3.  Detect MD based on the combined information of Intra-MB and motion smoothness (Intra + MV-Smooth).

4.  Our proposed MD detection algorithm as in Equation 4.12 (Proposed).

Image

FIGURE 4.18
Feature curves for the MD detection in Stefan_Sif. (a) The ground-truth SCM segments, (b) the curve of the number of Intra-MBs in each frame, (c) the curve of SMC(t), (d) the curve of $SMC_{Intra\_included}(t)$, and (e) the curve of the number of Class 2 MBs in each frame.

Figure 4.18 shows the curves of the features used in the aforementioned algorithms for the Stefan_Sif sequence. Figure 4.18a shows the ground-truth segmentation into SCMs. In Figure 4.18a, segments valued 0 represent SCMs with low or no camera motion, and segments valued 1 represent SCMs with strong or active camera motion. For example, the segment between frames 177 and 199 represents an SCM in which the camera pans rapidly rightward, and the segment between frames 286 and 300 represents an SCM with a quick camera zoom-in. The frames between SCMs are the MD frames that we want to detect. The ground-truth MD frames are labeled by the vertical dashed lines in Figure 4.18b through e. It should be noted that most MDs in Figure 4.18 span several frames rather than a single one. Figure 4.18b through e shows the curves of the number of Intra-MBs, SMC(t), $SMC_{Intra\_included}(t)$, and the number of Class 2 MBs, respectively.

Several observations can be drawn from Figures 4.18b through e as follows:

1.  Our Class 2 information is more effective in detecting the MDs. For example, in Figure 4.18e, we can see that our Class 2 information has a strong response when the first three MDs happen. Comparatively, the other features in Figure 4.18b through d have low or no response. This implies that Methods (1) through (3) will easily miss these MDs.

2.  Our Class 2 information responds more quickly and sharply to MDs. For example, the value of our Class 2 information increases quickly at the fourth (around frame 175) and sixth (around frame 220) MDs, while the other features respond much more slowly or gradually.

3.  Figure 4.18 also demonstrates that our Class 2 information is a better measure of motion unsmoothness. Actually, the largest camera motion in Stefan_Sif takes place in the segment between frames 222 and 278. However, we can see from Figure 4.18e that the values of the Class 2 information are not the largest in this period. This is because, although the camera motion is large, the motion pattern is quite smooth during this period. Therefore, a large number of MBs have regular and predictable motions and are not classified as Class 2. In most cases, our Class 2 information has its largest responses when the motion pattern changes or the motion smoothness is disrupted. Compared to our Class 2 information, the other features are more sensitive to the "motion strength" than to the "motion unsmoothness." Furthermore, although SMC can also be viewed as a measure of motion smoothness, we can see from Figure 4.18 that our Class 2 information is clearly the better measure of motion unsmoothness.

Figure 4.19b shows the MD detection result of the proposed method based on the Class 2 information in Figure 4.18e, where k, $th_{MD1}$, and $th_{MD3}$ in Equation 4.12 are set to 4, 50, and 100, respectively. From Figure 4.19, we can see that:

1.  The proposed method can detect most MDs except the one at frame 200. The MD at frame 200 is missed because we use a large window size of five frames (i.e., k = 4 in Equation 4.12); it could be detected with a smaller window size.

2.  Since the proposed method detects MDs based on the information of several frames, some delay may be introduced. We can see that the MDs detected in Figure 4.19b lag the ground truth in Figure 4.19a by a couple of frames.

Image

FIGURE 4.19
The detection result of the proposed algorithm in Stefan_Sif. (a) The ground-truth MD frames and (b) the MD detection results by our proposed method.

3.  There are also some false alarms, such as the period between frames 180 and 190. This is because the camera motion in this period is too rapid: the motion prediction accuracy decreases and some irregular global motions are included, which prevents the number of Class 2 MBs from decreasing after the MD finishes. In such cases, some postprocessing steps may be needed to discard these false alarms.

4.5.4.3    Experimental Results for Global Motion Estimation

We compare the following five GME algorithms. For all of the methods, we use the same six-parameter model, as in Equation 4.14, for estimating the global motion:

1.  Do not discard the foreground MBs and directly use the Least-Square method (Soldatov et al., 2006) to estimate the global model parameters (LS-6).

2.  Use the MPEG-4 VM GME method (Sikora, 1997) (MPEG-4).

3.  Use the method of Soldatov et al. (2006) for GME. In the work of Soldatov et al. (2006), an MV histogram is constructed for parameter estimation to speed up the GME process (MSU).

4.  Use the method of Porikli et al. (2010) to segment and discard foreground MBs and perform GME on the background MBs (P-Seg).

5.  Use our MB class information to segment and discard foreground MBs and perform GME on the background MBs, as described in Section 4.5.3 (Proposed).

Table 4.7 compares the mean square error (MSE) of the global motion-compensated results of the five algorithms. Normally, a small MSE value can be expected if the global motion parameters are precisely estimated. Table 4.8 compares the average MSE and the average operation time of the different methods. Furthermore, Figure 4.20 shows the subjective global motion-compensated results of the five methods.

TABLE 4.7

Comparison of Global-Motion-Compensated MSE Results for Different GME Methods

Image

TABLE 4.8

Comparison of Average MSE and Average Operation Time for Different GME Methods

Image

Note:  The operation time for the object segmentation part for P-Seg is taken from Porikli et al. (2010).

Some observations can be drawn from Tables 4.7 and 4.8, and Figure 4.20 as follows:

1.  Since the LS-6 method does not differentiate foreground and background, it cannot estimate the global motion of the background precisely. We can see from Table 4.7 that the LS-6 method has larger MSE values. Furthermore, Figure 4.20a also shows that there are obvious background textures in the compensated frame.

2.  Compared to the LS-6 method, the other four methods segment and discard the foreground MBs before estimating the global motion of the background. We can see that our proposed method achieves performance similar to the MPEG-4 and MSU methods.

3.  Since the MPEG-4 algorithm uses a three-layer method to find the outlier (foreground) pixels, its computational complexity is high. Although the MSU and P-Seg algorithms reduce the complexity by constructing histograms or performing volume growth to estimate the foreground area, they still require several extra computation steps to estimate the global parameters. Compared with these two methods, our proposed method segments the foreground based on the readily available class information, so the extra computational complexity is minimal. Note that this reduction in operation time becomes very noticeable and important when GME algorithms are integrated with the video compression module for real-time applications.

4.  Although P-Seg can create good object segmentation results, its GME performance is not as good as our method. This is because our proposed algorithm focuses on detecting and filtering the “irregular motion” blocks while P-Seg focuses more on segmenting a complete object. By using our algorithm, blocks that do not belong to the foreground but have irregular motions will also be filtered from the GME process. This further improves the GME performance.

Image

FIGURE 4.20
Subjective global motion-compensated results of the five methods for Dancer_cif. (a) LS-6, (b) MPEG-4, (c) MSU, (d) P-Seg, (e) Proposed.

4.6    Summary

In this chapter, we propose a more accurate MB Importance Measure method by classifying MBs into different classes. A new one-pass CCME is then proposed based on the new measure method. The four computation allocation steps of FLA, CLA, MLA, and SLA in the proposed CCME algorithm are introduced in the chapter. Furthermore, based on the proposed MB classification method, we further propose several algorithms for various video processing applications, including shot change detection, MD detection, and GME. Experimental results demonstrate the effectiveness of our proposed MB classification method as well as the corresponding video processing algorithms.

Acknowledgments

The work in this chapter was supported in part by the following grants: Chinese National 973 grants (2010CB731401 and 2010CB731406) and National Science Foundation of China grants (61001146, 61103124, and 61101147).

References

Akutsu, A., Tonomura, Y., Hashimoto, H., and Ohba, Y. 1992. Video indexing using motion vectors. Proc. SPIE Visual Commun. Image Process., 1818: 1522–1530.

Arman, F., Hsu, A., and Chiu, M. Y. 1994. Image processing on encoded video sequences. Multimedia Syst., 1: 211–219.

Boyce, J. M. 2004. Weighted prediction in the H.264/MPEG AVC video coding standard. IEEE Int. Symp. Circuits Syst., 3: 789–792.

Burleson, W., Jain, P., and Venkatraman, S. 2001. Dynamically parameterized architectures for power-aware video coding: Motion estimation and DCT. IEEE Workshop on Digital and Computational Video, Tampa, FL, pp. 4–12.

Chen, Y.-H., Chen, T.-C., Tsai, C.-Y., Tsai, S.-F., and Chen, L.-G. 2009. Algorithm and architecture design of power-oriented H.264/AVC baseline profile encoder for portable devices. IEEE Trans. Circuits Syst. Video Technol., 19(8): 1118–1128.

Chen, C., Huang, Y., Lee, C., and Chen, L. 2006. One-pass computation-aware motion estimation with adaptive search strategy. IEEE Trans. Multimedia, 8(4): 698–706.

Eom, M. and Choe, Y. 2007. Scene change detection on H.264/AVC compressed video using intra mode distribution histogram based on intra prediction mode, Proceedings of the Conference on Applications of Electrical Engineering, Istanbul, Turkey, pp. 140–144.

He, Z., Liang, Y., Chen, L., Ahmad, I., and Wu, D. 2005. Power-rate-distortion analysis for wireless video communication under energy constraints. IEEE Trans. Circuits Syst. Video Technol., 15(5): 645–658.

HHI, 2011. H.264/AVC reference software, JM 10.2 [Online]. Available at: <http://iphome.hhi.de/suehring/tml/download/old_jm/> [Accessed December 30, 2011].

Huang, Y., Lee, C., Chen, C., and Chen, L. 2005. One-pass computation-aware motion estimation with adaptive search strategy. IEEE Int. Symp. Circuits Syst., 6: 5469–5472.

Kim, C., Xin, J., and Vetro, A. 2006. Hierarchical complexity control of motion estimation for H.264/AVC. SPIE Conference on Visual Communications and Image Processing, San Jose, CA, Vol. 6077, pp. 109–120.

Lee, M.-S. and Lee, S.-W. 2001. Automatic video parsing using shot boundary detection and camera operation analysis. International Conference on Pattern Recognition, Brisbane, Australia, Vol. 2, pp. 1481–1483.

Li, R., Zeng, B., and Liou, M. L. 1994. A new three-step search algorithm for block motion estimation. IEEE Trans. Circuits Syst. Video Technol., 4(4): 438–442.

Lin, W., Baylon, D., Panusopone, K., and Sun, M.-T. 2008. Fast sub-pixel motion estimation and mode decision for H.264. IEEE International Symposium on Circuits and Systems, Seattle, WA, Vol. 6, pp. 3482–3485.

Lin, W., Panusopone, K., Baylon, D., and Sun, M.-T. 2009. A new class-based early termination method for fast motion estimation in video coding. IEEE International Symposium on Circuits and Systems, Taipei, Taiwan, pp. 625–628.

Lin, W., Panusopone, K., Baylon, D., and Sun, M.-T. 2010. A computation control motion estimate method for complexity scalable video coding, IEEE Trans. Circuits Syst. Video Technol., 20(11): 1533–1543.

Lin, W., Sun, M.-T., Li, H., and Hu, H. 2010. A new shot change detection method using information from motion estimation. Pacific-Rim Conference on Multimedia, Shanghai, China, Vol. 6298, pp. 264–275.

Lin, W., Sun, M.-T., Poovendran, R., and Zhang, Z. 2008. Activity recognition using a combination of category components and local models for video surveillance. IEEE Trans. Circuits Syst. Video Technol., 18(8): 1128–1139.

Liu, Y., Wang, W., Gao, W., and Zeng, W. 2004. A novel compressed domain shot segmentation algorithm on H.264/AVC. International Conference on Image Processing, Singapore, pp. 2235–2238.

NIST, 2001. TREC Video Retrieval Evaluation [Online]. Available at: <http://www-nlpir.nist.gov/projects/trecvid/> [Accessed December 30, 2011].

Po, L.-M. and Ma, W. C. 1996. A novel four-step search algorithm for fast block motion estimation. IEEE Trans. Circuits Syst. Video Technol., 6(3): 313–317.

Porikli, F., Bashir, F., and Sun, H. 2010. Compressed domain video object segmentation. IEEE Trans. Circuits Syst. Video Technol., 20(1): 2–14.

Shu, S. and Chau, L. P. 2005. A new scene change feature for video transcoding. IEEE International Symposium on Circuits and Systems, Kobe, Japan, Vol. 5, pp. 4582–4585.

Sikora, T. 1997. The MPEG-4 video standard verification model. IEEE Trans. Circuits Syst. Video Technol., 7(1): 19–31.

Soldatov, S., Strelnikov, K., and Vatolin, D. 2006. Low complexity global motion estimation from block motion vectors. Spring Conference on Computer Graphics, Bratislava, Slovakia, pp. 1–8.

SVMLight, 2011. SVMlight support vector machine software [Online]. Available at: <http://svmlight.joachims.org/> [Accessed December 30, 2011].

Tai, P., Huang, S., Liu, C., and Wang, J. 2003. Computational aware scheme for software-based block motion estimation. IEEE Trans. Circuits Syst. Video Technol., 13(9): 901–913.

Vanam, R., Riskin, E. A., Hemami, S. S., and Ladner, R. E. 2007. Distortion-complexity optimization of the H.264/MPEG-4 AVC encoder using the GBFOS algorithm. Data Compression Conference, Snowbird, UT, pp. 303–312.

Vanam, R., Riskin, E. A., and Ladner, R. E. 2009. H.264/MPEG-4 AVC encoder parameter selection algorithms for complexity distortion tradeoff. Data Compression Conference, Snowbird, UT, pp. 372–381.

Weigand, T., Schwarz, H., Joch, A., Kossentini, F., and Sullivan, G. 2003. Rate-constrained coder control and comparison of video coding standards. IEEE Trans. Circuits Syst. Video Technol., 13(7): 688–703.

Wiegand, T., Sullivan, G.-J., Bjontegaard, G., and Luthra, A. 2003. Overview of the H.264/AVC video coding standard. IEEE Trans. Circuit Syst. Video Technol., 13(7): 560–576.

Yang, Z., Cai, H., and Li, J. 2005. A framework for fine-granular computational-complexity scalable motion estimation. IEEE International Symposium on Circuits and Systems, Vol. 6, pp. 5473–5476.

Yi, X. and Ling, N. 2005. Scalable complexity-distortion model for fast motion estimation. SPIE Conference on Visual Communications and Image Processing, Beijing, China, Vol. 5960, pp. 1343–1353.

Yi, X., Zhang, J., Ling, N. and Shang, W. 2005. Improved and simplified fast motion estimation for JM. The 16th MPEG/VCEG Joint Video Team Meeting, Poznan, Poland, pp. 1–8.

Zhang, J. and He, Y. 2003. Performance and complexity joint optimization for H.264 video coding. IEEE International Symposium on Circuits and Systems, Vol. 8, pp. 888–891.

Zhang, K. and Kittler, J. 1999. Using scene-change detection and multiple-thread background memory for efficient video coding. Electron. Lett., 35(4): 290–291.

Zhou, Z., Sun, M.-T., and Hsu, Y.-F. 2004. Fast variable block-size motion estimation algorithms based on merge and split procedures for H.264/MPEG-4 AVC. IEEE International Symposium on Circuits and Systems, Vancouver, British Columbia, Canada, pp. 725–728.

Zhu, S. and Ma, K.-K. 2000. A new diamond search algorithm for fast block matching motion estimation. IEEE Trans. Image Process., 9(2): 287–290.
