6.6 Spearman's Rank Order Correlation with Visual Conspicuity

As mentioned in Sections 6.4–6.5, most saliency measurements compare the computational saliency map with ground-truth images or videos obtained from human eye fixations. However, eye fixations include both bottom-up information and top-down knowledge that differs between individuals, so it is somewhat unfair to benchmark pure bottom-up attention models, which have no prior knowledge, against such fixations. So far there is no fully objective standard for benchmarking visual attention in a complex scene with or without a target. Experiments showed that the search time for a target in a complex natural environment may be related to the conspicuity area measured under conditions of minimal uncertainty [31]. The conspicuity area of a target is defined as the region around the centre of the visual field within which the target is capable of attracting visual attention [32]. A target (e.g., a red car) among many distractors (e.g., black cars) can be easily detected, even if the eye fixation lies in the target's periphery. Conversely, the same target (a red car) among many red distractors (red cars) will fail to attract visual attention. Clearly, the larger the conspicuity area, the faster the search. This idea is somewhat different from feature integration theory [11]: the conspicuity area is based on variations in simple discrimination performance across different stimulus conditions [32], whereas feature integration theory is based on low-level features [11]. For example, in conjunction cases the search time increases with the number of distractors, which in [11] is attributed to the required integration of different low-level features, while in [32] and [33] more distractors mean more clutter close to the target, which reduces the conspicuity area. The search time is inversely proportional to the conspicuity area of the target.

How do we measure the conspicuity area of a target embedded in its surrounding scene? The TNO Human Factors Research Laboratory developed a psychophysical procedure to quantify the visual conspicuity of a single military target (a vehicle) in a complex natural scene, using several subjects aided by optical instruments [32, 33]. Several observers were asked to find a target embedded in a complex surrounding, either in a real-world setting or in the laboratory. Each observer first moved his or her gaze away from the target until it could no longer be detected or recognized, and then moved the gaze back towards the target step by step until it could just be perceived again. The angular distance between the fixation location at which the target was first perceived again and the centre of the target was recorded. This angular distance of gaze deviation is regarded as the human visual conspicuity of the target in that circumstance, and it quantifies how strongly the target stands out from its immediate surroundings [32–34]. Toet et al. proposed two types of conspicuity estimation: detection conspicuity, the extent to which the target is noticeable against the background, and identification conspicuity, the extent to which the target can be recognized in the background. The angular distances for the two types differ: the former mainly reflects bottom-up saliency, while the latter probably also includes a top-down component [34].

According to the above idea, a ground-truth data set with a single military target in a complex natural background was created by the TNO Human Factors Research Laboratory [35]; it provides the images together with the related human visual conspicuity measurements, such as human detection and identification conspicuity, as well as the mean search time (averaged over 64 observers).

Recently, a different method for validating computational attention models was proposed based on this ground-truth set [36], in which a binary mask of each image in the data set is available. In the binary mask, pixel value ‘1’ marks the locations of the visible parts of the target and ‘0’ denotes the rest of the image. The ranks of all the human visual conspicuity criteria over the data set are computed as in [36] in order to compare the performance of computational models.
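The ranking of a conspicuity criterion over the data set can be sketched as follows. This is a minimal illustration with hypothetical conspicuity values (the TNO data itself is not reproduced here); `scipy.stats.rankdata` assigns average ranks to ties, which is the convention Spearman's statistic assumes.

```python
import numpy as np
from scipy.stats import rankdata

# Hypothetical detection-conspicuity values (degrees of gaze
# eccentricity) for five images in the data set; a larger angle
# means the target remains visible further into the periphery,
# i.e. it is more conspicuous.
detection_conspicuity = np.array([2.1, 7.5, 4.3, 7.5, 1.0])

# Rank the criterion over the whole data set, as in [36].
# rankdata gives rank 1 to the smallest value and averages tied
# values, so the two 7.5 entries share rank 4.5.
ranks = rankdata(detection_conspicuity)  # [2.0, 4.5, 3.0, 4.5, 1.0]
```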

For a tested computational model, the average saliency and maximum saliency over the target area are computed for each image with the aid of the binary target masks, and the ranks of the average and maximum saliency over the data set are likewise arranged in order of their values [36]. The comparison between human visual conspicuity (detection conspicuity and identification conspicuity) and the computational model is based on Spearman's rank-order correlation, a non-parametric version of the Pearson product-moment correlation. Spearman's correlation coefficient measures the strength of association between two ranked variables when there is a monotonic relationship between them. Spearman's rank-order correlation (ρ) and its statistical significance can be calculated with statistical software such as SPSS (Statistical Product and Service Solutions), available from http://www.ibm.com/software/analytics/spssproducts/modeler/. A larger correlation coefficient indicates better performance, since the model's output is then more closely related to human visual conspicuity. Note that detection conspicuity mainly reflects the bottom-up component, so its rank-order correlation is the more appropriate measure for pure bottom-up computational models.

It is a pity that the data set provided by the TNO Human Factors Research Laboratory considers only a single target in each complex natural scene; in many cases there are multiple targets in a natural scene. The ROC curve, AUC score and KL score are therefore the main quantitative metrics used to compare different visual attention models (e.g., the models mentioned in Chapters 3–5 [2, 4, 20, 21, 27, 37–41]).
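For scenes with multiple targets, the AUC score treats the saliency map as a classifier score that should separate target (or fixated) pixels from the background. A minimal sketch, using the rank-sum (Mann-Whitney) formulation of the AUC rather than an explicit ROC curve:

```python
import numpy as np
from scipy.stats import rankdata

def auc_score(saliency, positive_mask):
    """AUC of the saliency values at positive (target/fixation)
    pixels versus all other pixels, via the Mann-Whitney U
    statistic; 0.5 is chance level, 1.0 a perfect separation."""
    s = saliency.ravel()
    pos = positive_mask.ravel().astype(bool)
    ranks = rankdata(s)                    # ties get average ranks
    n_pos, n_neg = pos.sum(), (~pos).sum()
    u = ranks[pos].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)

# Toy example: both targets in the top row receive higher saliency
# than every background pixel, so the AUC is 1.0.
sal = np.array([[0.9, 0.8], [0.1, 0.2]])
mask = np.array([[1, 1], [0, 0]])
score = auc_score(sal, mask)
```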

Other quantitative measures, such as the average number of objects hit per frame [4], the computation time of the saliency map [4, 40], the precision vs. recall curve [10, 18, 19], the F-measure [19, 42] and so on, as mentioned above, can also be adopted for different purposes.
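Precision, recall and the F-measure can be computed directly from a binarized saliency map and a ground-truth mask. The sketch below uses the weighted F-measure with β² = 0.3, the weighting commonly adopted in the salient-object detection literature [19, 42]; the masks are hypothetical.

```python
import numpy as np

def precision_recall_f(binary_pred, ground_truth, beta2=0.3):
    """Precision, recall and weighted F-measure; beta2 = 0.3
    weights precision more heavily than recall, as is common in
    salient-object detection evaluations."""
    pred = binary_pred.astype(bool).ravel()
    gt = ground_truth.astype(bool).ravel()
    tp = np.logical_and(pred, gt).sum()
    precision = tp / max(pred.sum(), 1)
    recall = tp / max(gt.sum(), 1)
    f = (1 + beta2) * precision * recall / max(beta2 * precision + recall, 1e-12)
    return precision, recall, f

# Toy example: the prediction covers the single ground-truth pixel
# plus one false positive, so precision = 0.5 and recall = 1.0.
pred = np.array([[1, 1], [0, 0]])
gt = np.array([[1, 0], [0, 0]])
p, r, f = precision_recall_f(pred, gt)
```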

As was emphasized at the start of this chapter, the quantitative evaluation of a new computational model should consider multiple testing criteria on various ground-truth databases. A computational model that performs well on certain ground-truth databases or under some criteria may not work well in other cases.
