6.3 Eye-tracking Data

The third type of ground-truth data that can be used to evaluate visual attention models is eye-tracking data [5, 16]. An eye-tracker is a device that automatically measures a subject's eye positions, eye movements and the associated durations. Tracking observers' eye fixations on images therefore provides ground truth for evaluating the performance of visual attention models: the saliency map produced by a model is compared with the human eye fixation map generated by eye-tracking. The use of eye-tracking overcomes the drawback of human labelling mentioned in Section 6.2.
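The comparison between a saliency map and a fixation map can be sketched in code. The text does not name a particular measure here, so the example below assumes one common choice, the linear (Pearson) correlation coefficient between the two maps. The function name `correlation_score` and its arguments are hypothetical and serve only as an illustration.

```python
import numpy as np


def correlation_score(saliency_map, fixation_map):
    """Pearson correlation between a saliency map and a fixation map.

    Both inputs are 2D arrays of the same shape. The choice of metric
    is an assumption made for this sketch, not one fixed by the text.
    """
    s = np.asarray(saliency_map, dtype=np.float64).ravel()
    f = np.asarray(fixation_map, dtype=np.float64).ravel()
    if s.shape != f.shape:
        raise ValueError("maps must have the same number of pixels")

    # Remove the mean so that only the spatial pattern is compared.
    s = s - s.mean()
    f = f - f.mean()

    denom = np.sqrt((s ** 2).sum() * (f ** 2).sum())
    # A constant map has no spatial pattern; report zero correlation.
    return float((s * f).sum() / denom) if denom > 0 else 0.0
```

A score close to 1 indicates that the model assigns high saliency where observers fixate, while a score near 0 indicates no linear relationship between the two maps.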

One database of human eye fixations is given in [5]; it includes 120 images and the corresponding eye fixation data obtained from 20 subjects. The database was acquired as follows. Images were presented to each subject in random order for several seconds. Subjects were positioned 0.75 m from a 21-inch CRT monitor and given no particular instructions. The eye-tracking apparatus was a standard non-head-mounted device, and the subjects looked at the images in a natural manner. The eye-tracker recorded the subjects' fixation points for each image. In this database, a raw fixation map is produced for each image from the fixation points of all subjects.

Post-processing can be performed to derive a continuous fixation density map from the raw fixation map. When a subject looks at an image, the image is projected onto the retina of the human visual system (HVS), with the fixation point in the image aligned with the fovea (the centre of the retina). The image is sampled by photoreceptors on the retina, and the photoreceptor density drops steeply from the fovea towards the periphery. This drop-off effect may be modelled by a 2D Gaussian distribution with appropriate parameters, centred on the measured fixation point. A continuous fixation density map is therefore derived by accumulating the 2D Gaussians of all fixation points.
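The accumulation of Gaussians described above can be sketched as follows. This is a minimal illustration, not the procedure used in [5]: the function name `fixation_density_map`, the isotropic Gaussian, the parameter `sigma` (in pixels) and the final normalization to unit sum are all assumptions made for the example.

```python
import numpy as np


def fixation_density_map(fixations, height, width, sigma):
    """Accumulate one isotropic 2D Gaussian per fixation point.

    fixations: iterable of (x, y) pixel coordinates (column, row),
               pooled over all subjects.
    sigma:     standard deviation of each Gaussian in pixels
               (an assumed parameter; the text only says "appropriate").
    """
    # Pixel coordinate grids: ys holds row indices, xs column indices.
    ys, xs = np.mgrid[0:height, 0:width]
    density = np.zeros((height, width), dtype=np.float64)

    for fx, fy in fixations:
        sq_dist = (xs - fx) ** 2 + (ys - fy) ** 2
        density += np.exp(-sq_dist / (2.0 * sigma ** 2))

    # Normalize to unit sum so maps of different images are comparable
    # (an assumption of this sketch).
    total = density.sum()
    return density / total if total > 0 else density
```

For example, `fixation_density_map([(10, 5), (30, 20)], 40, 50, 3.0)` returns a 40 × 50 map with two peaks, one at each fixation point. In practice `sigma` would be chosen from the viewing geometry, for instance so that it spans roughly the image area covered by the fovea at the given viewing distance.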

The resulting fixation density map indicates the extent to which each pixel of the image is sampled, on average, by human observers. Some sample images and their fixation density maps are shown in Figure 6.3. Figure 6.3(a) shows three images from the database described above, while Figure 6.3(b) shows the corresponding fixation density maps together with the saliency maps generated by the model in [5].

Figure 6.3 Images, fixation maps based on eye-tracking, and saliency maps [5]. Reprinted with permission from Bruce, N.D.B., Tsotsos, J.K., Saliency Based on Information Maximization. Advances in Neural Information Processing Systems, 18, pp. 155–162, June 2006. Neural Information Processing Systems Foundation.
