4.3 Phase Fourier Transform Approach

4.3.1 Introduction to the Phase Fourier Transform

The SR model suggests that finding the salient objects in a scene may not require a complete simulation of the structure of the visual system; computational tools already used in many engineering fields may also help to solve the pre-attention problem. However, although the SR model obtains good results, the reason it works is not clear, because we cannot be sure that the unsmooth parts (spectral residua) of the amplitude spectrum really reflect the novel part of the scene or its salient objects. Figure 4.5(a) and (b) show two images of 120 × 120 pixels with the same background: no person (or other salient object) appears in one of them, while in the other a person (a Polynesian) stands in the foreground. Their one-dimensional log amplitude spectra are shown in Figure 4.5(c) and (d); each curve is obtained by averaging the log amplitudes of all frequency components that lie at the same distance (in pixels) from the origin of the frequency plane.

Figure 4.5 Comparison of the amplitude spectra of the scene without and with a person: (a) background picture (original image); (b) the picture with a person; (c) one-dimensional log amplitude spectrum of 4.5(a), i.e., log amplitude vs. frequency (distance in pixels from the origin); (d) one-dimensional log amplitude spectrum of 4.5(b)


Although the two spectra differ slightly, it is very difficult to tell from the two curves (with and without the object) where the novel information lies.
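The radially averaged curves of Figure 4.5(c) and (d) can be sketched as follows. This is a minimal NumPy illustration, not the code used for the figure: the function name `radial_log_amplitude`, the rounding of distances to whole pixels and the small offset inside the logarithm are our own assumptions.

```python
import numpy as np

def radial_log_amplitude(image):
    """One-dimensional log amplitude spectrum of a 2D grayscale image.

    All frequency components whose (rounded) distance from the zero-frequency
    origin is r are averaged into entry r of the returned profile.
    """
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    # Centred amplitude spectrum: zero frequency moved to (h // 2, w // 2).
    amplitude = np.abs(np.fft.fftshift(np.fft.fft2(img)))
    log_amp = np.log(amplitude + 1e-12)  # assumed small offset avoids log(0)
    # Integer distance (in frequency bins) of every component from the origin.
    yy, xx = np.indices((h, w))
    radius = np.rint(np.hypot(yy - h // 2, xx - w // 2)).astype(int)
    # Average the log amplitudes that share the same radius.
    sums = np.bincount(radius.ravel(), weights=log_amp.ravel())
    counts = np.bincount(radius.ravel())
    return np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)
```

Applied to two hypothetical 120 × 120 arrays `background` and `with_person`, plotting `radial_log_amplitude(background)` and `radial_log_amplitude(with_person)` against the index gives curves of the kind shown in Figure 4.5(c) and (d), and makes it easy to see how little the object changes them.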

Among the properties of the discrete frequency spectrum, the phase spectrum, which records position information, is probably pivotal in searching for salient regions. Indeed, in the SR model the phase spectrum is carried through the computation unchanged. From this viewpoint, the phase Fourier transform (PFT) method was proposed in [5, 6]; it omits the spectral residual step, which makes the computation simpler.
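The claim that position is carried by the phase spectrum follows from the shift theorem of the DFT: translating an image leaves its amplitude spectrum unchanged and only adds a linear phase ramp. The short NumPy check below illustrates this on a random image; the circular shift (`np.roll`) and the shift of (5, 9) pixels are arbitrary choices made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((64, 64))
# Circularly translate the image by 5 rows and 9 columns.
shifted = np.roll(image, shift=(5, 9), axis=(0, 1))

F = np.fft.fft2(image)
F_shifted = np.fft.fft2(shifted)

# Translation leaves the amplitude spectrum unchanged ...
same_amplitude = np.allclose(np.abs(F), np.abs(F_shifted))
# ... but multiplies the spectrum by the ramp exp(-2*pi*i*(5u + 9v)/64),
# so the displacement is recorded entirely in the phase spectrum.
u = np.arange(64)[:, None]
v = np.arange(64)[None, :]
ramp = np.exp(-2j * np.pi * (5 * u + 9 * v) / 64)
phase_is_ramp = np.allclose(np.exp(1j * (np.angle(F_shifted) - np.angle(F))), ramp)
print(same_amplitude, phase_is_ramp)  # True True
```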

4.3.2 Phase Fourier Transform Approach

Based on the analysis above, the PFT approach needs only four steps: (1) resize the input image to a standard size, as in the SR model (the shorter side, width x or height y, becomes 64 pixels); (2) apply a discrete Fourier transform (DFT) to the standard image and express the result as amplitude and phase spectra; (3) set all amplitude spectral components to unity and reconstruct the image from the phase spectrum alone by an inverse Fourier transform; (4) post-process the reconstructed image with a low-pass Gaussian filter to obtain the saliency map. The corresponding equations are as follows. Given an image I(x, y) resized to I′(x, y), the saliency map is computed by

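$$A(f) = \left|\mathcal{F}\left[I'(x, y)\right]\right| \qquad (4.9)$$

$$\phi(f) = \arg\left(\mathcal{F}\left[I'(x, y)\right]\right) \qquad (4.11)$$

$$SM(x, y) = g(x, y) * \left|\mathcal{F}^{-1}\left[\exp\left(i\,\phi(f)\right)\right]\right|^{2} \qquad (4.14)$$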

where F(·) is the Fourier transform of Equation 4.1, and ϕ(f) and A(f) are the phase and amplitude spectra computed from Equations 4.1 and 4.3. The amplitude and phase spectra are calculated exactly as in the SR model, so Equations 4.9 and 4.11 are simply repeated here; Equation 4.14 combines Equations 4.12 and 4.13 but discards the spectral residual term of Equation 4.12, so that every amplitude component equals one. As in the SR model, g(x, y) is a 2D low-pass Gaussian filter (σ = 8) and * denotes convolution. Equation 4.14 gives the value of the saliency map at each location (x, y), and the array SM(x, y) forms the saliency map.
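The four steps can be sketched in a few lines of NumPy. This is an illustrative sketch, not the implementation of [5, 6]: the helper names `resize_nearest`, `gaussian_blur` and `pft_saliency` are hypothetical, nearest-neighbour resizing stands in for a proper interpolating resize, and the Gaussian filter is truncated at 3σ with zero padding at the borders.

```python
import numpy as np

def resize_nearest(image, size):
    """Step 1 (simplified): nearest-neighbour resize, shorter side -> size."""
    h, w = image.shape
    scale = size / min(h, w)
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    return image[np.ix_(rows, cols)]

def gaussian_blur(image, sigma):
    """Separable Gaussian low-pass filter g(x, y), zero padding at borders."""
    # Truncate at 3*sigma, but never let the kernel exceed the image size.
    radius = min(int(3 * sigma), (min(image.shape) - 1) // 2)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2.0 * sigma**2))
    kernel /= kernel.sum()
    blur_1d = lambda v: np.convolve(v, kernel, mode="same")
    out = np.apply_along_axis(blur_1d, 1, image)  # filter along rows
    return np.apply_along_axis(blur_1d, 0, out)   # then along columns

def pft_saliency(image, size=64, sigma=8.0):
    """PFT saliency map SM(x, y) following steps (1)-(4)."""
    img = resize_nearest(np.asarray(image, dtype=float), size)  # step 1
    spectrum = np.fft.fft2(img)                                 # step 2
    phase = np.angle(spectrum)                                  # phi(f)
    # Step 3: unit amplitude everywhere, keep only the phase, invert.
    recovered = np.fft.ifft2(np.exp(1j * phase))
    # Step 4: squared magnitude smoothed by g(x, y), as in Equation 4.14.
    return gaussian_blur(np.abs(recovered) ** 2, sigma)
```

For example, for a 64 × 64 image with a bright square on a uniform background, `pft_saliency` returns a map whose maximum lies inside the square: the phase-only reconstruction concentrates energy at the square's edges, and the wide Gaussian merges those edges into a single blob.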

4.3.3 Results and Discussion

To observe the effect on saliency detection of leaving out the spectral residua, the same database of 62 natural images, with resolutions of around 800 × 600 pixels, was used in [5] as a test set for PFT; in this test, all images were cropped to a square. On this database, the saliency maps produced by the PFT and SR models were almost identical, while PFT saved about one-third of the computational cost [5]. The saliency maps of both models for two selected images are shown in Figure 4.6.

Figure 4.6 Comparison of the PFT and SR models: (a) original images (64 × 64 pixels); (b) saliency maps of the original images from the PFT model; (c) saliency maps of the original images from the SR model [5]. © 2008 IEEE. Reprinted, with permission, from C. Guo, Q. Ma, L. Zhang, ‘Spatio-temporal Saliency detection using phase spectrum of quaternion Fourier transform’, IEEE Conference on Computer Vision and Pattern Recognition, June 2008


To make a quantitative comparison, let $SM_i^{PFT}(x, y)$ and $SM_i^{SR}(x, y)$ denote the saliency maps of image i produced by the PFT and SR models, respectively, each normalized so that its maximum value is 1. The maximum pixel difference ($MPD_i$) between the two saliency maps of image i is defined as

$$MPD_i = \max_{x, y} \left| SM_i^{PFT}(x, y) - SM_i^{SR}(x, y) \right| \qquad (4.15)$$

The minimum and average pixel differences between the two saliency maps of image i are defined analogously to Equation 4.15, taking the minimum or the mean over all pixels instead of the maximum, and are not listed here. Table 4.1 gives the maximum, minimum and average MPD over the entire database at four different resolutions [5]. As Table 4.1 shows, the difference between PFT and SR is slight (less than 5% per pixel). Since such a small difference cannot change the locations of the salient regions, PFT can replace SR in many applications. The PFT model is simpler than the SR model and, like SR, needs only a few lines of MATLAB® code.
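The comparison metrics can be written down directly. The sketch below assumes that the two saliency maps of an image are given as arrays of equal shape; `pixel_differences` and `summarize_mpd` are hypothetical helper names, and the summary mirrors the maximum, minimum and average MPD reported per resolution in Table 4.1.

```python
import numpy as np

def pixel_differences(sm_pft, sm_sr):
    """Maximum, minimum and average absolute pixel difference of two maps.

    Both maps are first normalized so that their maximum value is 1; the
    first returned value is MPD_i of Equation 4.15.
    """
    a = np.asarray(sm_pft, dtype=float)
    b = np.asarray(sm_sr, dtype=float)
    diff = np.abs(a / a.max() - b / b.max())
    return diff.max(), diff.min(), diff.mean()

def summarize_mpd(map_pairs):
    """Maximum, minimum and mean of MPD_i over a database of (PFT, SR) pairs."""
    mpds = np.array([pixel_differences(p, s)[0] for p, s in map_pairs])
    return mpds.max(), mpds.min(), mpds.mean()
```

Running `summarize_mpd` once per resolution, on the map pairs computed at that resolution, would produce one row of a table like Table 4.1.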

Table 4.1 The MPD of the saliency maps from PFT and SR at different resolutions [5]. © 2008 IEEE. Reprinted, with permission, from C. Guo, Q. Ma, L. Zhang, ‘Spatio-temporal Saliency detection using phase spectrum of quaternion Fourier transform’, IEEE Conference on Computer Vision and Pattern Recognition, June 2008.


PFT can pop out salient objects for the following reasons. First, PFT keeps the phase spectrum, which means that all the local (positional) information in the image is retained. Because high-frequency components are concentrated mainly at the edges of objects in the scene, PFT can pop out the high-frequency regions associated with objects. Second, since the amplitude spectra of images of natural or man-made objects decrease as frequency increases (Equations 4.4 and 4.6), normalizing the amplitude spectrum suppresses the low-frequency components and enhances the high-frequency components of the scene, and it is precisely these high-frequency components that contain the objects' information. The SR model can also detect objects because it, too, suppresses the low-frequency components, by subtracting the smoothed log amplitude spectrum from the original one. Third, a texture that repeats at a certain spatial frequency, such as water waves on a river, or wallpaper or ceramic tiles with a repeating stripe pattern, usually carries uninteresting or redundant information and produces peaks in the amplitude spectrum at that spatial frequency. PFT inhibits this redundant information by normalizing the amplitude spectrum. In contrast, the SR model cannot attenuate such information, since a peak may be treated as spectral residual, i.e., as novel information (see the experiment in Section 4.4.4). Noise in the image, for its part, is reduced by the low-pass Gaussian post-filter. Further reasons, concerning biological plausibility, are discussed in Section 4.6.
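The third argument can be checked numerically on a synthetic scene. The sketch below builds a hypothetical test image (vertical stripes with 8 cycles across 64 pixels, interrupted by a uniform grey patch) and measures which share of the non-DC spectral energy sits at the two stripe frequencies, before and after the amplitude normalization of PFT step 3. The image and the energy-share measure are our own illustrative choices.

```python
import numpy as np

N = 64
x = np.arange(N)
# Repeating vertical stripes: 8 cycles across the image width ...
image = np.tile(0.5 + 0.5 * np.cos(2 * np.pi * 8 * x / N), (N, 1))
# ... interrupted by a uniform grey patch standing in for an "object".
image[24:36, 24:36] = 0.5

def stripe_energy_fraction(spectrum):
    """Share of the non-DC spectral energy held by the two stripe frequencies."""
    energy = np.abs(spectrum) ** 2
    energy[0, 0] = 0.0  # ignore the mean (DC) term
    return (energy[0, 8] + energy[0, N - 8]) / energy.sum()

F = np.fft.fft2(image)
before = stripe_energy_fraction(F)
# PFT step 3: every amplitude set to 1, only the phase is kept.
after = stripe_energy_fraction(np.exp(1j * np.angle(F)))
print(before, after)
```

For this image, `before` comes out at roughly 0.96, i.e., the repeating texture dominates the spectrum, while `after` equals 2/4095 (about 0.0005), because each of the 4095 non-DC frequency bins now carries unit energy and the stripe peaks no longer stand out.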

Mathematically, the SR and PFT approaches are closely related [3, 5]. If we set the order of the smoothing filter in the SR model (Equation 4.7) to one, n = 1, we have

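$$R(f) = L(f) - h_1(f) * L(f) = L(f) - L(f) = 0 \qquad (4.16)$$

where L(f) is the log amplitude spectrum and h_1(f) is the 1 × 1 averaging filter, which leaves L(f) unchanged.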

Substituting Equation 4.16 for Equation 4.12 in Equation 4.13, the spectral residual term vanishes and we obtain exactly the formula of Equation 4.14.
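This equivalence is easy to confirm numerically. The sketch compares the reconstructions of the two models before the shared Gaussian post-filter (steps 1 and 4 are identical for both, so they are omitted). The names `box_smooth`, `sr_core` and `pft_core` are hypothetical; the n × n averaging filter with edge replication and the offset inside the logarithm are our assumptions about the SR details.

```python
import numpy as np

def box_smooth(a, n):
    """n x n moving average h_n (n odd), replicating edge values at borders."""
    pad = n // 2
    padded = np.pad(a, pad, mode="edge")
    out = np.zeros_like(a)
    for dy in range(n):
        for dx in range(n):
            out += padded[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out / (n * n)

def sr_core(image, n):
    """SR reconstruction |F^-1[exp(R(f) + i*phi(f))]|^2, before smoothing."""
    F = np.fft.fft2(image)
    L = np.log(np.abs(F) + 1e-12)  # log amplitude (offset avoids log 0)
    R = L - box_smooth(L, n)       # spectral residual
    return np.abs(np.fft.ifft2(np.exp(R + 1j * np.angle(F)))) ** 2

def pft_core(image):
    """PFT reconstruction |F^-1[exp(i*phi(f))]|^2, before smoothing."""
    F = np.fft.fft2(image)
    return np.abs(np.fft.ifft2(np.exp(1j * np.angle(F)))) ** 2

rng = np.random.default_rng(1)
image = rng.random((64, 64))
diff_n1 = np.abs(sr_core(image, 1) - pft_core(image)).max()
diff_n3 = np.abs(sr_core(image, 3) - pft_core(image)).max()
print(diff_n1, diff_n3)
```

With n = 1 the residual R(f) is identically zero and the two reconstructions coincide (`diff_n1` is zero), whereas with n = 3 the residual is non-zero and the outputs differ.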

Consequently, it is perhaps more appropriate to regard both SR and PFT as belonging to the phase spectrum methodology.

SR and PFT are computationally very fast, and they depend on few parameters: in SR, only the order n of the smoothing filter and the standard deviation σ of the Gaussian low-pass filter used in post-processing need to be selected, and in PFT only σ has to be set. Moreover, the analysis in [6] shows that the final results are not sensitive to these parameters (n and σ). In contrast, the strong performance of spatial-domain models usually relies on the choice and estimation of various parameters. Of course, the SR and PFT approaches have their limitations; modifications of the phase spectrum method were proposed in 2010, such as modifying the smoothing filter [26] and the image-blurring approach [27], but we do not develop them here.
