1.1 The Concept of Visual Attention

1.1.1 Selective Visual Attention

In about 300 BC, Aristotle, the famous philosopher and scientist, mentioned the concept of visual selective attention and stated that humans cannot perceive two objects simultaneously by a single sensory act [1]. Although people generally believe that they can take in a great deal of information from their rich and colourful world and remain conscious of environmental changes, many experiments have shown that human visual ability is overestimated. When people look at a scene as observers, they have the feeling of being able to see all the details. However, when a blank field is inserted between two successive natural scenes that differ in some respects, most observers fail to spot the changes. The reason for this phenomenon is that observers can only focus their attention on a small area of the visual field at a given moment. Consequently, only this small area can be observed in detail. The eyes jump from one fixated location to another in a scene by saccades, and while scanning the surroundings they rest on a few places longer, or more often, than on others. Some animals, such as quadrumanes, also have this ability of selective visual attention. The areas that the eyes of humans and quadrumanes often gaze at are referred to as fixated regions, and the ignored regions as non-fixated regions.

What is selective attention? A definition given by Corbetta [2] is ‘The mental ability to select stimuli, responses, memory or thought that are behaviourally relevant among the many others that are behaviourally irrelevant’. Put simply, selective visual attention is one of many properties of human and animal vision that allow important information to be extracted from abundant visual inputs.

The phenomenon of selective visual attention exists everywhere. Some intuitive real-life examples are shown in Figure 1.1 (black and white versions of colour images). Imagine that you visit an extraordinary place for the first time, such as Dream World on the Gold Coast, Australia, or the Louvre in Paris, France. At Dream World, shown in Figure 1.1(a), your eyes will first of all involuntarily gaze at a few people wearing fancy dress and acting as rabbits with long ears; they will then shift to other fairy-tale characters near the ‘rabbits’ and continue to the dancing girls on the street, as marked with white circles in Figure 1.1(a). In the Louvre, you will pay attention to the exquisite sculptures, moving from one to another as you pass through each showroom (Figure 1.1(b)). However, you do not need to visit these special sites to experience selective visual attention, because it is a concomitant of daily life. If a black spider crawls on a white ceiling just above your bed while you are lying down, you will notice it right away. You will first pay attention to red flowers among green leaves and grass (Figure 1.1(c) is a black and white version), or you will immediately stop walking when a car sweeps rapidly in front of you, since targets with outstanding colour (such as red flowers) and moving objects (such as a car) attract your attention. When you are enjoying captivating scenery or an artwork, you do not notice your friend or other things in the area around you.

Figure 1.1 Examples of selective visual attention


Fixation regions also depend on subjective consciousness. Under a cue or guidance, selective visual attention becomes more intentional. For example, the intention of identifying an old classmate among the crowd of passengers at the airport drives your eyes to shift only to the passengers' faces and to search for a face that matches one in your memory, regardless of other factors such as colourful dresses or fancy hairstyles, which may draw more attention in free cases (i.e., without the guidance of tasks).

From the above examples we can summarize the following facts.

1. People cannot perceive two targets simultaneously and can pay attention to only a small area of the visual field at a given moment.
2. Selective visual attention is undeniably a universal phenomenon.
3. In a given complex scene, our eyes can search out the locations of different objects according to the significance or prominence of their information by jumping (from the rabbits to the dancing girls and others), and can finally see most objects in the scene if the scene is stationary. However, for a dynamic scene (like video or animation), our eyes only grasp a few salient object areas and discard many details that are simply not seen.
4. Under the guidance of a specific motivation, the areas selected by visual attention differ from those in free cases.

1.1.2 What Areas in a Scene Can Attract Human Attention?

Firstly, let us consider the case without cues or the guidance of prior knowledge, such as a baby with normal vision looking at the natural world. Early research [3–5] showed that features such as intensity, colour, edge orientation and motion play important roles in visual selective behaviour. Locations with high intensity contrast, vivid colour, object edges and motion tend to attract more attention than other locations in a scene. It is easy to observe this in infants. When a baby opens its eyes to see the world, there are no cues or guidance and no prior knowledge in the brain; the light near the cradle, or the swinging bauble or fancy toy hanging above the baby's head, draws the baby's eyes to one of these targets. When we change the position of these targets, the baby's eyes shift accordingly. This means that basic visual features decide eye fixation.

The first ground for what areas in a scene can attract the human gaze comes from physiology and anatomy. The primary visual processing in the early visual areas of the brain involves the retina, the lateral geniculate nucleus (LGN) and the V1 area of the visual cortex. A simple cell in the primary visual cortex responds only to stimuli in a restricted region of the visual field called its receptive field (RF). There are many kinds of simple cells in the primary visual cortex, including orientation-tuned cells, chromatically antagonistic cells (red-green or yellow-blue), motion direction detecting cells and so on, which extract various features in the RF and discard useless information. So only significant features of objects, such as colour, edges and motion in the input scene, are extracted and passed on for further processing in the high-level brain. Research on this issue has been published in the relevant biological and physiological literature [6–8]. In Chapter 2 we will explain the visual pathways in physiology and anatomy in greater detail.

Another ground for visual selective behaviour comes from information theory [9–11]. Smooth and regular areas are frequently neglected by our eyes, and the positions with maximum information or greater novelty are observed first. A very familiar environment (scene or image) in which you spend every day, such as your home or office, does not interest you, since it is an old, repeated and easily predicted surrounding. If one day a bunch of fresh flowers appears on the desk in your office, then the flowers, representing a novel change, will attract your attention. Therefore, the locations with novelty in an image or a video are the eye-fixation areas, because the novelty maximizes the information at that location, or causes surprise. Some statistical criteria that measure information or novelty have been proposed [9–11] to distinguish fixation from non-fixation regions, such as high variance, high self-information or entropy, a large distance between the posterior and prior probability density distributions, distinctive higher-order statistics and so on.
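As a rough illustration of the self-information criterion mentioned above, the following Python sketch scores each pixel of a tiny greyscale image by -log2 p(v), where p(v) is estimated from the image's own intensity histogram. The function names, the toy image and the use of a single global histogram over raw intensities are assumptions made for this example; they are not a method taken from [9–11].

```python
import math
from collections import Counter


def self_information_map(image):
    """Return -log2 p(v) per pixel, with p(v) estimated from the image histogram.

    `image` is a list of rows of integer intensities (an illustrative format).
    """
    pixels = [v for row in image for v in row]
    counts = Counter(pixels)
    total = len(pixels)
    return [[-math.log2(counts[v] / total) for v in row] for row in image]


def most_informative_location(image):
    """Return the (row, column) of the pixel with the highest self-information."""
    info = self_information_map(image)
    locations = [(r, c) for r in range(len(info)) for c in range(len(info[0]))]
    return max(locations, key=lambda rc: info[rc[0]][rc[1]])


# Toy scene: a uniform, familiar background with one novel pixel,
# loosely analogous to the flowers appearing on the office desk.
scene = [[0] * 5 for _ in range(5)]
scene[1][3] = 255

if __name__ == "__main__":
    print(most_informative_location(scene))
```

In this toy case the rare intensity occurs once in 25 pixels, so its self-information is log2(25) bits, far above that of the common background value, and the novel pixel is selected as the most informative location.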

The contrast between centre and surrounding at a location also influences the focus of attention [12, 13]. In the visual field, the prominent part, or salient area, attracts interest first. White sheep on a stretch of meadow, or a black spider on a white background, are examples in which the target (sheep or spider) is different from its surroundings, so it stands out. If a white sheep stands against a white background or a black spider crawls across a black background, the target (sheep or spider) will not be obvious because of the context. The contrast between centre and surrounding, and the statistics of both centre and surrounding, have been proposed as measures for attention [12–14].
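The sheep and spider examples can be mimicked with a minimal centre-surround sketch in Python. Here the contrast at a pixel is taken, as an assumption for illustration only, to be the absolute difference between its intensity and the mean intensity of its neighbours in a small square window; the function name, window shape and toy images are hypothetical, and the measures proposed in [12–14] are more elaborate.

```python
def centre_surround_contrast(image, radius=1):
    """Absolute difference between each pixel and the mean of its surround.

    `image` is a list of rows of intensities; the surround is the square
    window of the given radius, excluding the centre pixel itself.
    """
    rows, cols = len(image), len(image[0])
    contrast = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            surround = [
                image[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
                if (rr, cc) != (r, c)
            ]
            if surround:  # a 1x1 image has no surround; leave contrast at 0
                contrast[r][c] = abs(image[r][c] - sum(surround) / len(surround))
    return contrast


# A bright "sheep" pixel on a darker "meadow" stands out ...
meadow = [[50] * 5 for _ in range(5)]
meadow[2][2] = 255

# ... but the same pixel on a white background does not.
white_background = [[255] * 5 for _ in range(5)]

if __name__ == "__main__":
    print(centre_surround_contrast(meadow)[2][2])
    print(centre_surround_contrast(white_background)[2][2])
```

With these toy images the contrast at the sheep pixel is large on the meadow and zero on the white background, which matches the observation that the same target can be salient or inconspicuous depending on its context.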

For the cases with task-orientated cues or guidance, the attention areas depend not only on features, information, context and so on in the observed region as mentioned above, but also on the subject's intention. In such cases, fixation regions will be different from those with no cues and guidance. In addition, prior knowledge also affects attention. An artist pays more attention to artwork, while a gardener viewing the same scene mainly focuses his attention on strange flowers because their respective fields of interest and background knowledge are different.

During recent decades there has been substantial exploration, and many hypotheses and criteria have been proposed that attempt to predict which areas in a scene are likely to be fixated by human eyes. However, this remains an open issue because of the complexity of the biological visual system and the diversity of human knowledge and intentions.

1.1.3 Selective Attention in Visual Processing

Every day our visual system receives a huge amount of information from the surrounding world. Data estimated to be of the order of tens of megabytes fall on the retinas of the eyes every second [15, 16]. However, the retina does not sample this information evenly. The centre of the retina is called the fovea, and it has a higher resolution of perception than areas far from it. In general, people move their eyes to the region of interest in a scene to ensure that the prominent object in the scene is projected onto the fovea, so that it can be examined in detail. Objects projected onto areas other than the fovea are perceived with lower resolution and are largely ignored in processing. In fact, 50% of the primary visual cortex is devoted to processing the inputs from the centre (fovea) of the visual field [17, 18].

Also, the data processing capacity of the visual pathways in the brain is estimated to be only 40 bits per second [19, 20]. Input data of the order of tens of megabytes per second are reduced through the fovea of the retina and through feature extraction in the LGN and the primary cortex V1 in the low-level cortex, and then pass through the cortical areas V2–V4 and V5 (middle temporal or MT) to the high-level cortex. Only very few data (i.e., very little target information) per second can reach the memory and be processed in the high-level cortex. The reduction of information redundancy occurs not only in parallel feature extraction of a scene in the primary visual cortex, but also in serial target cognition [21] in all the visual pathways, including the high-level cortex. Hence, the large amount of input data is effectively decreased. As was already noted in Aristotle's time, more than one target located in different positions cannot be recognized simultaneously when a scene or an image is viewed; in modern terms, the limited resource in our brain restricts the information processing. The eyes have to be shifted from one prominent target to another according to the attention selection order. Even if the scene contains only a single portrait of a star, the areas of the eyes and mouth in the portrait will be fixated more often or for longer. A female portrait and the track of an observer's eye saccades over the portrait in a cue-free case are shown in Figure 1.2. The eyes and mouth, which have complex structure, are frequently scanned, whereas the cheek areas, which carry no significant information, are not scanned.

Figure 1.2 The track of eye saccades when observing the face of a lady with no instruction for 1 minute, from Yarbus (1967, Figure 115 [24]). With kind permission from Springer Science + Business Media: Eye Movements and Vision, © 1967, Yarbus


Selective visual attention can solve the bottleneck of limited resource in the human visual system (HVS) [20, 22, 23]. Only a selected subset of visual inputs is allowed to reach high-level cortical processing. So, strictly speaking, selective visual attention is an ability that allocates processing resource in the brain to focus on the important information in a scene. It is thanks to selective visual attention that people can effectively deal with a large number of images as their visual inputs without encountering information overflow, while systematically handling many tasks. Selective visual attention plays an important role in biological signal processing. In the literature (as well as in this book), the term ‘visual attention’, or ‘selective attention’, is sometimes used to refer to selective visual attention.

Studies of visual attention in physiology and psychology have been developing for several decades. Biologists have been trying to understand the mechanism by which perceptual signals are processed in the visual pathways, and thereby to further understand the brains of humans and quadrumanes. Visual attention helps people to deal with a mass of input data easily, even though the input amounts to tens of megabytes per second, whereas in computer vision or robot vision, enormous input images often result in memory overflow. Hence, many scientists and engineers who work in computer science, artificial intelligence, image processing and so on have recently become engaged in visual attention research that aims to construct computational models to simulate selective visual attention for engineering applications. Although the principle of selective attention is not yet fully understood biologically, and there are many open issues that need to be explored, these computational models have found good applications in many engineering tasks.
