4.7 Amplitude Spectrum of Quaternion Fourier Transform (AQFT) Approach

Previous sections have introduced bottom-up computational models of visual attention built on the phase spectrum, with the amplitude spectrum set to a constant. The amplitude spectrum of image patches can, however, also be used to build a computational model of visual attention, since it represents the feature distributions of the image. In that case, the locations of objects are indicated by the saliency of image patches obtained directly in the frequency domain. This section introduces a computational model of visual attention built on the amplitude spectrum of the QFT [9]. In this model, the input image is first divided into small patches. The quaternion representation based on three features, and its QFT, are then obtained for each image patch. The amplitude spectrum of the QFT of each image patch is adopted to represent the colour, intensity and orientation distributions of that patch. The saliency value of each image patch is calculated not only from the differences between the QFT amplitude spectrum of this patch and those of all other patches in the image, but also from the visual impact of these differences as determined by human visual sensitivity.

4.7.1 Saliency Value for Each Image Patch

In the proposed AQFT model [9], the saliency value of each image patch is determined by two factors: the amplitude spectrum differences between the image patch and all other image patches in the input image, and the weighting of these patch differences. If the differences between an image patch and all other image patches are large, then the saliency value of this image patch is large. In addition, the influence of foveation behaviour is taken into consideration in the model. Here, $D_{ij}$ represents the difference of the amplitude spectrum between image patch i and image patch j, and the saliency value $S_i$ for image patch i can be expressed as:

(4.62) $S_i = \sum_{j \neq i} w_{ij} D_{ij}$

where $w_{ij}$ is the weight for the patch difference between image patches i and j, determined by human visual sensitivity.

It is generally believed that the HVS is highly space-variant because the retina in the human eye has a varying density of cone photoreceptor cells [64]. On the retina, the fovea has the highest density of cone photoreceptor cells, so a region of a scene has to be projected onto the fovea to be perceived at the highest resolution. The density of the cone photoreceptor cells decreases with increasing retinal eccentricity. Therefore, visual sensitivity decreases with increasing eccentricity from the fixation point, as shown in Figure 4.17 [9, 64, 65].

Figure 4.17 The relationship between visual acuity and eccentricity [65]. © 2012 IEEE. Reprinted, with permission, from Y. Fang, W. Lin, B. Lee, C. Lau, Z. Chen, C. Lin, ‘Bottom-Up Saliency Detection Model Based on Human Visual Sensitivity and Amplitude Spectrum’, IEEE Transactions on Multimedia, Feb. 2012


For the saliency value of image patch i in Equation 4.62, all patch differences between image patch i and the other image patches are summed together, with human visual sensitivity determining the weights of these differences. In the AQFT model, the eccentricity from the centre of fixation (the centre of image patch i) is not used directly for calculating the saliency value of image patch i; instead, it determines a weighting factor that measures the importance of each patch-difference pair. Because visual sensitivity decreases with eccentricity, the weights of the patch differences from nearer neighbour patches (with smaller eccentricities) are larger than those from farther neighbour patches. The contributions of the patch differences to the saliency value of image patch i therefore decrease for image patches farther from patch i and increase for nearer ones. This is reasonable, as human eyes are more sensitive to patch differences from nearer image patches than to those from farther ones. The AQFT model takes both local and global centre–surround differences into account, since it uses the patch differences from all other image patches in the image to calculate the saliency value of image patch i, so that the centre–surround differences from both near and far neighbours are considered.
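
As a concrete illustration of Equation 4.62, the sketch below assumes the K × K difference matrix D (Equation 4.73) and weight matrix W (Equation 4.77) have already been computed; the function name and the final normalization are illustrative, not part of the original model.

```python
import numpy as np

def patch_saliency(D: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Equation 4.62: S_i = sum_{j != i} w_ij * D_ij for all K patches.

    D and W are K x K matrices; D[i, i] = 0, so the diagonal
    contributes nothing to the sum.
    """
    S = (W * D).sum(axis=1)
    return S / S.max()  # normalization to [0, 1] is an assumption, for display
```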

4.7.2 The Amplitude Spectrum for Each Image Patch

The AQFT model uses the colour and intensity channels as the input to the QFT to obtain the amplitude spectrum of each image patch, which is then used to compute the differences between image patches. When the colour and intensity channels are used as the QFT input, the amplitude spectrum of the QFT represents the colour, intensity and orientation distributions of the image patches, so the differences between the QFT amplitude spectra of image patches capture the differences in colour, intensity and orientation distributions between them. In this model, the opponent colour space is used to represent the colour information of image patches. Following the colour representation used in previous sections, if r, g and b denote the red, green and blue colour components, four broadly tuned colour channels are generated as $R = r - (g + b)/2$ for red, $G = g - (r + b)/2$ for green, $B = b - (r + g)/2$ for blue, and $Y = (r + g)/2 - |r - g|/2 - b$ for yellow. These colour channels are then combined into red–green and blue–yellow double opponency according to the related property of the human primary visual cortex [66]:

(4.63) $C_{rg} = R - G$

(4.64) $C_{by} = B - Y$

The intensity channel can be computed as $I = (r + g + b)/3$. The three features $I$, $C_{rg}$ and $C_{by}$ are used for calculating the amplitude spectrum of the QFT. Based on these three features, the quaternion representation of each image patch is:

(4.65) $q(n, m) = C_{rg}(n, m)\mu_1 + C_{by}(n, m)\mu_2 + I(n, m)\mu_3$

where $\mu_1$, $\mu_2$ and $\mu_3$ are the imaginary axes of a pure quaternion, with $\mu_1^2 = \mu_2^2 = \mu_3^2 = -1$, $\mu_1 \perp \mu_2$ and $\mu_3 = \mu_1 \mu_2$. Note that the motion feature in Equation 4.32 is set to 0 for still images.
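
The feature extraction above can be sketched as follows for a float RGB image in [0, 1]; clipping the broadly tuned channels at zero follows common practice for these channels and is an assumption here.

```python
import numpy as np

def quaternion_features(rgb: np.ndarray):
    """Compute the three QFT input features of Equations 4.63-4.65."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # Broadly tuned colour channels (negative responses clipped to zero)
    R = np.clip(r - (g + b) / 2, 0, None)
    G = np.clip(g - (r + b) / 2, 0, None)
    B = np.clip(b - (r + g) / 2, 0, None)
    Y = np.clip((r + g) / 2 - np.abs(r - g) / 2 - b, 0, None)
    Crg = R - G                 # red-green opponency (Equation 4.63)
    Cby = B - Y                 # blue-yellow opponency (Equation 4.64)
    I = (r + g + b) / 3         # intensity channel
    return I, Crg, Cby
```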

The symplectic decomposition as mentioned in Sections 4.4.2 and 4.4.3 for the above quaternion image patch is given by:

(4.66) $q(n, m) = f_1(n, m) + f_2(n, m)\mu_2$

(4.67) $f_1(n, m) = C_{rg}(n, m)\mu_1$

(4.68) $f_2(n, m) = C_{by}(n, m) + I(n, m)\mu_1$

The study in [37] indicates that the QFT can be calculated using two standard complex FFTs. The QFT of $q(n, m)$ in Equation 4.66 can be computed as:

(4.69) $Q[u, v] = F_1[u, v] + F_2[u, v]\mu_2$

(4.70) $F_i[u, v] = \frac{1}{\sqrt{NM}} \sum_{n=0}^{N-1} \sum_{m=0}^{M-1} e^{-\mu_1 2\pi \left( \frac{nv}{N} + \frac{mu}{M} \right)} f_i(n, m)$

where $i \in \{1, 2\}$; (n, m) and (u, v) are locations in the spatial and frequency domains respectively; N and M are the height and width of the image patches; and $f_i(n, m)$ is obtained from Equations 4.67 and 4.68. Note that the above computation is performed on each image patch, not on the whole image as in the PQFT model of Section 4.4, even though Equations 4.69 and 4.70 take the same form.

$Q[u, v]$ in Equation 4.69 can be represented in polar form as follows:

(4.71) $Q[u, v] = A(u, v)\, e^{\mu \Phi(u, v)}$

where $A(u, v)$ is the QFT amplitude spectrum of the image patch at frequency (u, v); $\Phi(u, v)$ is the corresponding QFT phase spectrum; and $\mu$ is a unit pure quaternion.

The QFT amplitude $A(u, v)$ can be calculated as

(4.72) $A(u, v) = \sqrt{|F_1[u, v]|^2 + |F_2[u, v]|^2}$

Based on Equation 4.72, the amplitude spectrum of the QFT can be calculated for each image patch and used as the representation of that patch.
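
Under the usual identification of $\mu_1$ with the complex imaginary unit, Equations 4.69–4.72 can be sketched with two standard complex FFTs; with the motion feature zero, $f_1 = C_{rg}\mu_1$ and $f_2 = C_{by} + I\mu_1$ each occupy one complex plane. A minimal sketch, with an illustrative function name:

```python
import numpy as np

def qft_amplitude(I, Crg, Cby):
    """QFT amplitude spectrum of one patch (Equation 4.72) via two
    complex FFTs (Equations 4.69-4.70)."""
    norm = np.sqrt(I.size)                    # the 1/sqrt(NM) scaling
    F1 = np.fft.fft2(1j * Crg) / norm         # transform of f1 (Equation 4.67)
    F2 = np.fft.fft2(Cby + 1j * I) / norm     # transform of f2 (Equation 4.68)
    # |Q| = sqrt(|F1|^2 + |F2|^2), since Q = F1 + F2 * mu2
    return np.sqrt(np.abs(F1) ** 2 + np.abs(F2) ** 2)
```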

4.7.3 Differences between Image Patches and their Weighting to Saliency Value

The saliency value of each image patch is determined by the weighted differences between the patch and its neighbours, where the neighbours include all other image patches in the image. If an image patch differs significantly from its neighbours, it has a higher probability of belonging to a salient region, so the saliency value of an image patch should grow with the differences between the patch and its neighbours. As the spatial distance (eccentricity) between the patch and a neighbour increases, the weight of that difference in the patch's saliency value decreases. The Euclidean distance between the QFT amplitude spectra is adopted to represent the difference between each patch and its neighbours. To reduce the dynamic range of the amplitude coefficients, the AQFT model applies a logarithm after adding the constant 1 to each amplitude coefficient, which avoids the undefined case when A approaches zero. With this, the difference between image patches i and j can be computed as

(4.73) $D_{ij} = \sqrt{\sum_{m} \left( \log(1 + A_i(m)) - \log(1 + A_j(m)) \right)^2}$

where m indexes all frequency coefficients of an image patch after the QFT, and $A_i(m)$ denotes the mth amplitude coefficient of patch i.
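
Equation 4.73 amounts to a few lines; `np.log1p` computes log(1 + x) directly, and the function name is illustrative:

```python
import numpy as np

def patch_difference(A_i: np.ndarray, A_j: np.ndarray) -> float:
    """Equation 4.73: Euclidean distance between the log amplitude
    spectra; the added constant 1 keeps the logarithm defined when
    an amplitude coefficient approaches zero."""
    return float(np.sqrt(np.sum((np.log1p(A_i) - np.log1p(A_j)) ** 2)))
```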

Visual sensitivity is used to determine the weights of the QFT amplitude spectrum differences between image patches. In the AQFT model, the algorithm adopted in [9] is used to measure human contrast sensitivity as a function of eccentricity. The contrast sensitivity $CS(f, e)$ is defined as the reciprocal of the contrast threshold $CT(f, e)$ as follows:

(4.74) $CS(f, e) = \frac{1}{CT(f, e)}$

According to the study in [9], the contrast threshold is defined as

(4.75) $CT(f, e) = CT_0 \exp\left( \alpha f \, \frac{e + e_2}{e_2} \right)$

where f is the spatial frequency (cycles/degree); e is the retinal eccentricity (degrees); $CT_0$ is the minimum contrast threshold; $\alpha$ is the spatial frequency decay constant; and $e_2$ is the half-resolution eccentricity. According to the experiments reported in [9], these parameters are set to $CT_0 = 1/64$, $\alpha = 0.106$ and $e_2 = 2.3$.
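
A direct transcription of Equations 4.74 and 4.75, using the parameter values quoted above; the function name is illustrative:

```python
import numpy as np

CT0, ALPHA, E2 = 1.0 / 64.0, 0.106, 2.3   # parameter values quoted above

def contrast_sensitivity(f, e):
    """CS(f, e) = 1 / CT(f, e), Equations 4.74-4.75; f in
    cycles/degree, e in degrees (scalar or NumPy array)."""
    return 1.0 / (CT0 * np.exp(ALPHA * f * (e + E2) / E2))
```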

The retinal eccentricity e can be calculated from its relationship with the viewing distance v, as shown in Figure 4.18. Given the position $(x_i, y_i)$ of the fixation point (the centre of image patch i), the retinal eccentricity e for a position $(x_j, y_j)$ (the centre of another image patch j) can be computed as follows:

Figure 4.18 The relationship between viewing distance and retina eccentricity. © 2012 IEEE. Reprinted, with permission, from Y. Fang, W. Lin, B. Lee, C. Lau, Z. Chen, C. Lin, ‘Bottom-Up Saliency Detection Model Based on Human Visual Sensitivity and Amplitude Spectrum’, IEEE Transactions on Multimedia, Feb. 2012


(4.76) $e = \tan^{-1}\left( \frac{d}{v} \right)$

where d is the Euclidean distance between $(x_i, y_i)$ and $(x_j, y_j)$. The typical ratio of the viewing distance to the picture height is in the range of 3 to 6 [67]; a ratio of 4 is used here to determine the viewing distance. The weight $w_{ij}$ can thus be calculated as the normalized contrast sensitivity $CS(f, e)$ based on Equations 4.74–4.76. The weighting parameter $w_{ij}$ in Equation 4.62 is calculated as follows:

(4.77) $w_{ij} = \frac{CS(f, e_{ij})}{\max CS(f, e)}$

As described above, the saliency value of image patch i collects the contributions of the patch differences between image patch i and all other image patches in the image, as calculated in Equation 4.62.
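
Putting Equations 4.74–4.77 together, the sketch below (reusing the contrast_sensitivity function above) computes the full weight matrix. It assumes patch centres and the viewing distance are both expressed in pixels and reads the normalization in Equation 4.77 as division by the maximum sensitivity; all names are illustrative.

```python
import numpy as np

def sensitivity_weights(centers: np.ndarray, v: float, f: float) -> np.ndarray:
    """Return the K x K weight matrix of Equation 4.77.

    centers : (K, 2) array of patch centres in pixels
    v       : viewing distance in pixels (e.g. 4 * image height)
    f       : spatial frequency in cycles/degree
    """
    # Pairwise Euclidean distances between patch centres
    d = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    e = np.degrees(np.arctan(d / v))      # Equation 4.76
    CS = contrast_sensitivity(f, e)       # Equations 4.74-4.75
    return CS / CS.max()                  # normalized as in Equation 4.77
```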

4.7.4 Patch Size and Scale for Final Saliency Value

The final saliency map is influenced by the image patch size. Existing computational visual attention models typically choose a fixed patch size empirically. In the AQFT model, the characteristics of the HVS and the size of the fovea are considered when determining the patch size. Given an image patch of size p × p, the relationship between the eccentricity e and the viewing distance v can be computed as follows:

(4.78) $e = \tan^{-1}\left( \frac{p/2}{v} \right)$

Studies show that the 1–2 degree retinal area in the fovea has the best visual acuity, while the parafovea surrounding the fovea has lower visual acuity [68]. According to Equation 4.78, the patch size p can be estimated for a given viewing distance v and a target eccentricity. Here $e_a$ represents the eccentricity with the best visual acuity, set as 1 degree; e is set as $\beta e_a$, where $\beta \leq 1$, to make sure that good visual acuity is maintained within e. The viewing distance is set to four times the image height, while β is set to 0.2. Setting β = 0.2 means that the maximum eccentricity across the width of an image patch is 0.2°, which guarantees that the whole image patch lies within the area of best visual acuity. In addition, for better performance, the input images are divided into partially overlapping image patches, with the overlap determined by an overlap eccentricity $e_o$, whose value is chosen empirically.

The patch size influences the final saliency map: with a smaller patch size, the final saliency map becomes more finely resolved. To obtain a more accurate saliency map, the images can be divided into smaller image patches with larger overlap, but this increases the computational complexity. Given an input image of size W × H (where W is the width and H is the height), a patch size of $p \times p$ with an overlap fraction $\gamma$ yields approximately $WH / ((1 - \gamma)p)^2$ patches, so the cost of the pairwise comparison in Equation 4.62 grows rapidly as the patch size shrinks or the overlap increases. A suitable patch size is therefore chosen to compute the saliency map based on the fovea characteristics, the saliency detection performance and the computational complexity.
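
Solving Equation 4.78 for p gives a simple recipe for the patch size under the settings above (viewing distance of four times the image height, maximum eccentricity $\beta e_a$ = 0.2°); this helper is a sketch, not part of the original model:

```python
import numpy as np

def patch_size(image_height: int, ratio: float = 4.0,
               e_a: float = 1.0, beta: float = 0.2) -> int:
    """Solve Equation 4.78 for p: the patch half-width subtends
    beta * e_a degrees at viewing distance v = ratio * image_height
    (both measured in pixels)."""
    v = ratio * image_height
    p = 2.0 * v * np.tan(np.radians(beta * e_a))
    return max(1, int(round(p)))

# Example: a 480-pixel-high image viewed at 4x its height
# gives a patch size of roughly 13 pixels.
```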

Besides the patch size, the image scale also influences the final saliency map. In the saliency map, the saliency values of image patches with large dissimilarity are much higher than those of patches belonging to the background. Across different image scales, the saliency values of the background remain low, while the saliency values of significant foreground regions remain high; using multiple scales can therefore strengthen the saliency of these attended regions. The steerable pyramid algorithm [69] is adopted to obtain multiscale images, through low-pass filtering and subsampling of the input image. For simplicity, a linear combination is used to obtain the final saliency map as follows:

(4.79) $S_i = \frac{1}{P} \sum_{l=1}^{P} S_i^l$

where P is the number of scales and $S_i^l$ is the saliency value of image patch i at the lth scale. The image at the lowest scale level should not be too small for the final saliency map. This model uses three scales to obtain the final saliency map: the original scale, half of the original scale and one quarter of the original scale.
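
Equation 4.79 then reduces to an average over the per-scale saliency values; the sketch below assumes the per-scale values have already been aligned to a common patch grid by the caller:

```python
def multiscale_saliency(S_levels):
    """Equation 4.79: average the per-scale saliency arrays S_i^l
    (all on a common grid) over the P scales."""
    return sum(S_levels) / len(S_levels)
```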

In sum, the AQFT model is built on both local and global feature contrast, human visual sensitivity and the QFT amplitude spectrum. The model first divides the input images into small image patches. It then uses the QFT amplitude spectrum to represent the colour, intensity and orientation distributions of the image patches. The saliency value of each image patch is obtained by computing the differences between the QFT amplitude spectrum of this patch and those of all other patches in the image, with the weights of these differences determined by human visual sensitivity. The model also utilizes the characteristics of the HVS in selecting the patch size and in the multiscale operations.
