4.7 Amplitude Spectrum of Quaternion Fourier Transform (AQFT) Approach

Previous sections have introduced bottom-up computational models of visual attention built on the phase spectrum, with the amplitude spectrum set to a constant. The amplitude spectrum of image patches can, however, also be used to build a computational model of visual attention, since it represents the feature distributions of the image. In that case, the locations of objects are indicated by the saliency of image patches obtained directly in the frequency domain. This section introduces a computational model of visual attention built on the amplitude spectrum of the QFT [9]. In this model, the input image is first divided into small patches. The quaternion representation based on three features, and its QFT, are then obtained for each image patch. The amplitude spectrum of the QFT of each image patch is adopted to represent the colour, intensity and orientation distributions of that patch. The saliency value of each image patch is calculated not only from the differences between the QFT amplitude spectrum of this patch and those of all other patches in the image, but also from the visual impact of these differences as determined by human visual sensitivity.

4.7.1 Saliency Value for Each Image Patch

In the proposed AQFT model [9], the saliency value of each image patch is determined by two factors: the amplitude spectrum differences between the image patch and all other image patches in the input image, and the weighting of these patch differences. If the differences between an image patch and all other image patches are large, then the saliency value of this image patch is large. In addition, the influence of foveation behaviour is taken into consideration in the model. Here, $D_{ij}$ represents the difference of the amplitude spectrum between image patch i and image patch j, and the saliency value $S_i$ for image patch i can be expressed as:

(4.62) $S_i = \sum_{j \neq i} w_{ij} D_{ij}$

where $w_{ij}$ is the weight for the patch difference between image patches i and j, determined by human visual sensitivity.

It is generally believed that the HVS is highly space-variant because the retina in the human eye has a varying density of cone photoreceptor cells [64]. On the retina, the fovea has the highest density of cone photoreceptor cells, so a region of a scene has to be projected onto the fovea to be perceived at the highest resolution. The density of the cone photoreceptor cells decreases with increasing retinal eccentricity. Therefore, visual sensitivity decreases with increasing eccentricity from the fixation point, as shown in Figure 4.17 [9, 64, 65].

Figure 4.17 The relationship between visual acuity and eccentricity [65]. © 2012 IEEE. Reprinted, with permission, from Y. Fang, W. Lin, B. Lee, C. Lau, Z. Chen, C. Lin, ‘Bottom-Up Saliency Detection Model Based on Human Visual Sensitivity and Amplitude Spectrum’, IEEE Transactions on Multimedia, Feb. 2012


For the saliency value of image patch i in Equation 4.62, all patch differences between image patch i and the other image patches are summed together, with human visual sensitivity determining the weights of these differences. In the AQFT model, the eccentricity from the centre of fixation (the centre of image patch i) is not used directly for calculating the saliency value of image patch i; instead, it determines a weighting factor that measures the importance of each patch-difference pair. Because visual sensitivity decreases with eccentricity, the weights of the patch differences from nearer neighbour patches (with smaller eccentricities) are larger than those from farther neighbour patches. The contributions of the patch differences to the saliency value of image patch i therefore decrease for image patches farther from patch i and increase for nearer ones. This is reasonable, as human eyes are more sensitive to patch differences from nearer image patches than to those from farther ones. The AQFT model takes both local and global centre–surround differences into account, since it uses the patch differences from all other image patches in the image to calculate the saliency value of image patch i, so that the centre–surround differences from both near and far neighbours are considered.
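
As a concrete illustration of Equation 4.62, the sketch below assumes the K × K difference matrix D (Equation 4.73) and weight matrix W (Equation 4.77) have already been computed; the function name and the final normalization are illustrative, not part of the original model.

```python
import numpy as np

def patch_saliency(D: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Equation 4.62: S_i = sum_{j != i} w_ij * D_ij for all K patches.

    D and W are K x K matrices; D[i, i] = 0, so the diagonal
    contributes nothing to the sum.
    """
    S = (W * D).sum(axis=1)
    return S / S.max()  # normalization to [0, 1] is an assumption, for display
```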

4.7.2 The Amplitude Spectrum for Each Image Patch

The AQFT model uses the colour and intensity channels as the input to the QFT to obtain the amplitude spectrum of each image patch, which is then used to compute the differences between image patches. When the colour and intensity channels are used as the QFT input, the amplitude spectrum of the QFT represents the colour, intensity and orientation distributions of the image patches, so the differences between the QFT amplitude spectra of image patches capture the differences in colour, intensity and orientation distributions between them. In this model, the opponent colour space is used to represent the colour information of image patches. Following the colour representation used in previous sections, if r, g and b denote the red, green and blue colour components, four broadly tuned colour channels are generated as $R = r - (g + b)/2$ for red, $G = g - (r + b)/2$ for green, $B = b - (r + g)/2$ for blue, and $Y = (r + g)/2 - |r - g|/2 - b$ for yellow. These colour channels are then combined into red–green and blue–yellow double opponency according to the related property of the human primary visual cortex [66]:

(4.63) $C_{rg} = R - G$

(4.64) $C_{by} = B - Y$

The intensity channel can be computed as $I = (r + g + b)/3$. The three features $I$, $C_{rg}$ and $C_{by}$ are used for calculating the amplitude spectrum of the QFT. Based on these three features, the quaternion representation of each image patch is:

(4.65) $q(n, m) = C_{rg}(n, m)\mu_1 + C_{by}(n, m)\mu_2 + I(n, m)\mu_3$

where $\mu_1$, $\mu_2$ and $\mu_3$ are the imaginary axes of a pure quaternion, with $\mu_1^2 = \mu_2^2 = \mu_3^2 = -1$, $\mu_1 \perp \mu_2$ and $\mu_3 = \mu_1 \mu_2$. Note that the motion feature in Equation 4.32 is set to 0 for still images.
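
The feature extraction above can be sketched as follows for a float RGB image in [0, 1]; clipping the broadly tuned channels at zero follows common practice for these channels and is an assumption here.

```python
import numpy as np

def quaternion_features(rgb: np.ndarray):
    """Compute the three QFT input features of Equations 4.63-4.65."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # Broadly tuned colour channels (negative responses clipped to zero)
    R = np.clip(r - (g + b) / 2, 0, None)
    G = np.clip(g - (r + b) / 2, 0, None)
    B = np.clip(b - (r + g) / 2, 0, None)
    Y = np.clip((r + g) / 2 - np.abs(r - g) / 2 - b, 0, None)
    Crg = R - G                 # red-green opponency (Equation 4.63)
    Cby = B - Y                 # blue-yellow opponency (Equation 4.64)
    I = (r + g + b) / 3         # intensity channel
    return I, Crg, Cby
```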

The symplectic decomposition as mentioned in Sections 4.4.2 and 4.4.3 for the above quaternion image patch is given by:

(4.66) $q(n, m) = f_1(n, m) + f_2(n, m)\mu_2$

(4.67) $f_1(n, m) = C_{rg}(n, m)\mu_1$

(4.68) $f_2(n, m) = C_{by}(n, m) + I(n, m)\mu_1$

The study in [37] indicates that the QFT can be calculated using two standard complex FFTs. The QFT of $q(n, m)$ in Equation 4.66 can be computed as:

(4.69) $Q[u, v] = F_1[u, v] + F_2[u, v]\mu_2$

(4.70) $F_i[u, v] = \frac{1}{\sqrt{NM}} \sum_{n=0}^{N-1} \sum_{m=0}^{M-1} e^{-\mu_1 2\pi \left( \frac{nv}{N} + \frac{mu}{M} \right)} f_i(n, m)$

where $i \in \{1, 2\}$; (n, m) and (u, v) are locations in the spatial and frequency domains respectively; N and M are the height and width of the image patches; and $f_i(n, m)$ is obtained from Equations 4.67 and 4.68. Note that the above computation is performed on each image patch, not on the whole image as in the PQFT model of Section 4.4, even though Equations 4.69 and 4.70 take the same form.

$Q[u, v]$ in Equation 4.69 can be represented in polar form as follows:

(4.71) $Q[u, v] = A(u, v)\, e^{\mu \Phi(u, v)}$

where $A(u, v)$ is the QFT amplitude spectrum of the image patch at frequency (u, v); $\Phi(u, v)$ is the corresponding QFT phase spectrum; and $\mu$ is a unit pure quaternion.

The QFT amplitude $A(u, v)$ can be calculated as

(4.72) $A(u, v) = \sqrt{|F_1[u, v]|^2 + |F_2[u, v]|^2}$

Based on Equation 4.72, the amplitude spectrum of the QFT can be calculated for each image patch and used as the representation of that patch.
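
Under the usual identification of $\mu_1$ with the complex imaginary unit, Equations 4.69–4.72 can be sketched with two standard complex FFTs; with the motion feature zero, $f_1 = C_{rg}\mu_1$ and $f_2 = C_{by} + I\mu_1$ each occupy one complex plane. A minimal sketch, with an illustrative function name:

```python
import numpy as np

def qft_amplitude(I, Crg, Cby):
    """QFT amplitude spectrum of one patch (Equation 4.72) via two
    complex FFTs (Equations 4.69-4.70)."""
    norm = np.sqrt(I.size)                    # the 1/sqrt(NM) scaling
    F1 = np.fft.fft2(1j * Crg) / norm         # transform of f1 (Equation 4.67)
    F2 = np.fft.fft2(Cby + 1j * I) / norm     # transform of f2 (Equation 4.68)
    # |Q| = sqrt(|F1|^2 + |F2|^2), since Q = F1 + F2 * mu2
    return np.sqrt(np.abs(F1) ** 2 + np.abs(F2) ** 2)
```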

4.7.3 Differences between Image Patches and their Weighting to Saliency Value

The saliency value of each image patch is determined by the weighted differences between the patch and its neighbours, where the neighbours include all other image patches in the image. If an image patch differs significantly from its neighbours, it has a higher probability of belonging to a salient region, so the saliency value of an image patch should grow with the differences between the patch and its neighbours. As the spatial distance (eccentricity) between the patch and a neighbour increases, the weight of that difference in the patch's saliency value decreases. The Euclidean distance between the QFT amplitude spectra is adopted to represent the difference between each patch and its neighbours. To reduce the dynamic range of the amplitude coefficients, the AQFT model applies a logarithm after adding the constant 1 to each amplitude coefficient, which avoids the undefined case when A approaches zero. With this, the difference between image patches i and j can be computed as

(4.73) $D_{ij} = \sqrt{\sum_{m} \left( \log(1 + A_i(m)) - \log(1 + A_j(m)) \right)^2}$

where m indexes all frequency coefficients of an image patch after the QFT, and $A_i(m)$ denotes the mth amplitude coefficient of patch i.
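
Equation 4.73 amounts to a few lines; `np.log1p` computes log(1 + x) directly, and the function name is illustrative:

```python
import numpy as np

def patch_difference(A_i: np.ndarray, A_j: np.ndarray) -> float:
    """Equation 4.73: Euclidean distance between the log amplitude
    spectra; the added constant 1 keeps the logarithm defined when
    an amplitude coefficient approaches zero."""
    return float(np.sqrt(np.sum((np.log1p(A_i) - np.log1p(A_j)) ** 2)))
```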

Visual sensitivity is used to determine the weights of the QFT amplitude spectrum differences between image patches. In the AQFT model, the algorithm adopted in [9] is used to measure human contrast sensitivity as a function of eccentricity. The contrast sensitivity $CS(f, e)$ is defined as the reciprocal of the contrast threshold $CT(f, e)$ as follows:

(4.74) $CS(f, e) = \frac{1}{CT(f, e)}$

According to the study in [9], the contrast threshold is defined as

(4.75) $CT(f, e) = CT_0 \exp\left( \alpha f \, \frac{e + e_2}{e_2} \right)$

where f is the spatial frequency (cycles/degree); e is the retinal eccentricity (degrees); $CT_0$ is the minimum contrast threshold; $\alpha$ is the spatial frequency decay constant; and $e_2$ is the half-resolution eccentricity. According to the experiments reported in [9], these parameters are set to $CT_0 = 1/64$, $\alpha = 0.106$ and $e_2 = 2.3$.
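
A direct transcription of Equations 4.74 and 4.75, using the parameter values quoted above; the function name is illustrative:

```python
import numpy as np

CT0, ALPHA, E2 = 1.0 / 64.0, 0.106, 2.3   # parameter values quoted above

def contrast_sensitivity(f, e):
    """CS(f, e) = 1 / CT(f, e), Equations 4.74-4.75; f in
    cycles/degree, e in degrees (scalar or NumPy array)."""
    return 1.0 / (CT0 * np.exp(ALPHA * f * (e + E2) / E2))
```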

The retinal eccentricity e can be calculated from its relationship with the viewing distance v, as shown in Figure 4.18. Given the position $(x_i, y_i)$ of the fixation point (the centre of image patch i), the retinal eccentricity e for a position $(x_j, y_j)$ (the centre of another image patch j) can be computed as follows:

Figure 4.18 The relationship between viewing distance and retina eccentricity. © 2012 IEEE. Reprinted, with permission, from Y. Fang, W. Lin, B. Lee, C. Lau, Z. Chen, C. Lin, ‘Bottom-Up Saliency Detection Model Based on Human Visual Sensitivity and Amplitude Spectrum’, IEEE Transactions on Multimedia, Feb. 2012


(4.76) $e = \tan^{-1}\left( \frac{d}{v} \right)$

where d is the Euclidean distance between $(x_i, y_i)$ and $(x_j, y_j)$. The typical ratio of the viewing distance to the picture height is in the range of 3 to 6 [67]; a ratio of 4 is used here to determine the viewing distance. The weight $w_{ij}$ can thus be calculated as the normalized contrast sensitivity $CS(f, e)$ based on Equations 4.74–4.76. The weighting parameter $w_{ij}$ in Equation 4.62 is calculated as follows:

(4.77) $w_{ij} = \frac{CS(f, e_{ij})}{\max CS(f, e)}$

As described above, the saliency value of image patch i collects the contributions of the patch differences between image patch i and all other image patches in the image, as calculated in Equation 4.62.
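
Putting Equations 4.74–4.77 together, the sketch below (reusing the contrast_sensitivity function above) computes the full weight matrix. It assumes patch centres and the viewing distance are both expressed in pixels and reads the normalization in Equation 4.77 as division by the maximum sensitivity; all names are illustrative.

```python
import numpy as np

def sensitivity_weights(centers: np.ndarray, v: float, f: float) -> np.ndarray:
    """Return the K x K weight matrix of Equation 4.77.

    centers : (K, 2) array of patch centres in pixels
    v       : viewing distance in pixels (e.g. 4 * image height)
    f       : spatial frequency in cycles/degree
    """
    # Pairwise Euclidean distances between patch centres
    d = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    e = np.degrees(np.arctan(d / v))      # Equation 4.76
    CS = contrast_sensitivity(f, e)       # Equations 4.74-4.75
    return CS / CS.max()                  # normalized as in Equation 4.77
```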

4.7.4 Patch Size and Scale for Final Saliency Value

The final saliency map is influenced by the image patch size. Existing computational visual attention models typically choose a fixed patch size empirically. In the AQFT model, the characteristics of the HVS and the size of the fovea are considered when determining the patch size. Given an image patch of size p × p, the relationship between the eccentricity e and the viewing distance v can be computed as follows:

(4.78) $e = \tan^{-1}\left( \frac{p/2}{v} \right)$

Studies show that the 1–2 degree retinal area in the fovea has the best visual acuity, while the parafovea surrounding the fovea has lower visual acuity [68]. According to Equation 4.78, the patch size p can be estimated for a given viewing distance v and a target eccentricity. Here $e_a$ represents the eccentricity with the best visual acuity, set as 1 degree; e is set as $\beta e_a$, where $\beta \leq 1$, to make sure that good visual acuity is maintained within e. The viewing distance is set to four times the image height, while β is set to 0.2. Setting β = 0.2 means that the maximum eccentricity across the width of an image patch is 0.2°, which guarantees that the whole image patch lies within the area of best visual acuity. In addition, for better performance, the input images are divided into partially overlapping image patches, with the overlap determined by an overlap eccentricity $e_o$, whose value is chosen empirically.

The patch size influences the final saliency map: with a smaller patch size, the final saliency map becomes more finely resolved. To obtain a more accurate saliency map, the images can be divided into smaller image patches with larger overlap, but this increases the computational complexity. Given an input image of size W × H (where W is the width and H is the height), a patch size of $p \times p$ with an overlap fraction $\gamma$ yields approximately $WH / ((1 - \gamma)p)^2$ patches, so the cost of the pairwise comparison in Equation 4.62 grows rapidly as the patch size shrinks or the overlap increases. A suitable patch size is therefore chosen to compute the saliency map based on the fovea characteristics, the saliency detection performance and the computational complexity.
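
Solving Equation 4.78 for p gives a simple recipe for the patch size under the settings above (viewing distance of four times the image height, maximum eccentricity $\beta e_a$ = 0.2°); this helper is a sketch, not part of the original model:

```python
import numpy as np

def patch_size(image_height: int, ratio: float = 4.0,
               e_a: float = 1.0, beta: float = 0.2) -> int:
    """Solve Equation 4.78 for p: the patch half-width subtends
    beta * e_a degrees at viewing distance v = ratio * image_height
    (both measured in pixels)."""
    v = ratio * image_height
    p = 2.0 * v * np.tan(np.radians(beta * e_a))
    return max(1, int(round(p)))

# Example: a 480-pixel-high image viewed at 4x its height
# gives a patch size of roughly 13 pixels.
```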

Besides the patch size, the image scale also influences the final saliency map. In the saliency map, the saliency values of image patches with large dissimilarity are much higher than those of patches belonging to the background. Across different image scales, the saliency values of the background remain low, while the saliency values of significant foreground regions remain high; using multiple scales can therefore strengthen the saliency of these attended regions. The steerable pyramid algorithm [69] is adopted to obtain multiscale images, through low-pass filtering and subsampling of the input image. For simplicity, a linear combination is used to obtain the final saliency map as follows:

(4.79) $S_i = \frac{1}{P} \sum_{l=1}^{P} S_i^l$

where P is the number of scales and $S_i^l$ is the saliency value of image patch i at the lth scale. The image at the lowest scale level should not be too small for the final saliency map. This model uses three scales to obtain the final saliency map: the original scale, half of the original scale and one quarter of the original scale.
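
Equation 4.79 then reduces to an average over the per-scale saliency values; the sketch below assumes the per-scale values have already been aligned to a common patch grid by the caller:

```python
def multiscale_saliency(S_levels):
    """Equation 4.79: average the per-scale saliency arrays S_i^l
    (all on a common grid) over the P scales."""
    return sum(S_levels) / len(S_levels)
```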

In sum, the AQFT model is built on both local and global feature contrast, human visual sensitivity and the QFT amplitude spectrum. The model first divides the input images into small image patches. It then uses the QFT amplitude spectrum to represent the colour, intensity and orientation distributions of the image patches. The saliency value of each image patch is obtained by computing the differences between the QFT amplitude spectrum of this patch and those of all other patches in the image, with the weights of these differences determined by human visual sensitivity. The model also utilizes the characteristics of the HVS in selecting the patch size and in the multiscale operations.
