5

Computational Models for Top-down Visual Attention

The computational models of visual attention introduced in Chapters 3 and 4 mainly simulate pure bottom-up attention. In practice, however, the human visual system rarely works without top-down visual attention, especially when searching for a target in a scene. Suppose, for example, that your five-year-old son vanishes from your view in a public park. You search for him anxiously using the prior knowledge in your brain, such as his clothes and the way he walks, and this knowledge drives top-down attention. If his clothing (e.g., a red jacket) is conspicuous against the environment (a green lawn or shrubs), you only need to search the salient areas provided by the bottom-up attention mechanism (candidate regions in red pop out from the green background) and then identify your son among them by using your top-down knowledge. This speeds up the search, since you do not need to examine every place in the scene. However, when your son does not pop out from the environment, in other words, when the candidate salient locations from the bottom-up attention mechanism do not indicate your son, top-down attention becomes even more critical after a fast scan of these candidate regions. In human behaviour, bottom-up and top-down attention are intertwined; that is, overall visual attention results from the interaction of the two. Hence, all existing top-down computational models are combined with bottom-up activation to predict eye movement in overt attention.

Although the importance of top-down attention is well known, computational models for top-down attention are fewer than those for pure bottom-up attention, because it is still not well understood how prior knowledge is obtained, represented and stored in the human brain, or how it influences the process of visual attention. This makes it difficult to formulate a computational model. Another problem is that top-down attention depends on the specific task, so the computational models may differ from task to task.

However different the top-down computational models may be, the following four aspects usually need to be considered: (1) the acquisition and learning of task-related knowledge from the outside world or from a subject's desire; (2) the representation of knowledge in the computational model; (3) the storage of the prior knowledge; (4) the combination of the top-down and bottom-up attention parts, which is a big challenge: for example, how to provide coherent control signals for the attention focus, and which stage of the bottom-up processing is influenced by the top-down information. Since the bottom-up attention part of most top-down models employs the BS computational model or one of its variations, which were introduced in Chapter 3, this chapter focuses on the top-down part.

There is neurobiological and psychophysical evidence that a top-down mechanism exists in the human brain for visual processing [1–4]. Computational models combining object recognition and attention have shown that top-down cues are necessary for object recognition [5]. The earliest computational top-down model is the guided search structure (GS2 model) proposed by Wolfe in 1994 [6], as already presented in Chapter 2. Another early top-down model, proposed by Tsotsos et al. [7], is a hierarchical system with several different types of computing units arranged in a pyramid. The model introduces a new winner-takes-all (WTA) updating rule, which matches current related knowledge better. The task-related influence is achieved by inhibiting units unrelated to the task, so that the attention signals related to the task can pass through the units to the high-level processing of the human brain without interference. Other models related to top-down cues were reported in [8–11]. In 2000, Hamker proposed a top-down visual attention model with parallel distributed competition to simulate the relevant behaviour of the human brain [11].

Since 2000, a number of top-down computational models have been proposed [12–20], in which top-down influences are incorporated in different bottom-up stages [14, 15, 19] or embedded in the processing of observed data [12].

Some top-down models are connected with the memory, representation and learning of prior knowledge. Many biological experiments have shown that knowledge related to the current task is probably stored in working memory [21–26]. Working memory in the brain serves the current task by keeping a small amount of information in mind for a short period of time. In a sense, working memory is like a short-term memory: if the memory is not continually reinforced, it decays unless it is refreshed. Knowledge that is refreshed in short-term memory is gradually transferred to long-term memory. A lot of evidence has shown that working memory plays an important role in top-down attention guidance [21–26]. Top-down computational models that include working memory or short-term memory have been proposed in [12, 14, 15, 18]. In some top-down computational models, the knowledge (or feature) is stored in a decision tree that can be updated through learning rules and can be rapidly retrieved [14, 15, 27]. In [15], an amnesic and incremental learning decision tree is used for robot vision; it simulates the short-term and long-term memory of the brain in order to avoid expansion of the decision tree during learning. The Visual Object detection with CompUtational attention System (VOCUS) is a simple and useful computational model for object detection in which the top-down knowledge (the weight of each conspicuity map) is learned beforehand, and both enhancing the information related to the object and inhibiting unrelated regions are considered [16, 17].
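The idea of learning one weight per conspicuity map beforehand and then using the weights both to enhance and to inhibit can be illustrated with a minimal sketch. It is not the VOCUS implementation: the ratio-based weight rule, the tiny 2 × 2 maps, and all function and variable names (`learn_weights`, `top_down_map`, `most_salient`, `train_maps`, `target_mask`) are assumptions introduced here for illustration only.

```python
# Hypothetical sketch: learned per-map weights used for excitation and inhibition.
# Maps are plain lists of rows; every name and number below is illustrative.

def region_mean(feature_map, mask, inside):
    """Mean of feature_map over cells where mask matches `inside`."""
    values = [v for row_v, row_m in zip(feature_map, mask)
              for v, m in zip(row_v, row_m) if bool(m) == inside]
    return sum(values) / len(values) if values else 0.0

def learn_weights(maps, mask, eps=1e-9):
    """Assumed rule: weight = mean response on target / mean response elsewhere."""
    return {name: (region_mean(fm, mask, True) + eps) /
                  (region_mean(fm, mask, False) + eps)
            for name, fm in maps.items()}

def top_down_map(maps, weights):
    """Enhance maps with weight > 1 and subtract maps with weight < 1."""
    first = next(iter(maps.values()))
    rows, cols = len(first), len(first[0])
    out = [[0.0] * cols for _ in range(rows)]
    for name, fm in maps.items():
        w = weights[name]
        for r in range(rows):
            for c in range(cols):
                if w > 1.0:        # feature typical of the target: excite
                    out[r][c] += w * fm[r][c]
                elif w < 1.0:      # feature typical of the background: inhibit
                    out[r][c] -= fm[r][c] / w
    return out

def most_salient(saliency):
    """Return (row, col) of the largest value in the map."""
    _, r, c = max((v, r, c) for r, row in enumerate(saliency)
                  for c, v in enumerate(row))
    return r, c

# Training scene: the target (mask == 1) is red on a green background.
train_maps = {"red":   [[0.9, 0.1], [0.1, 0.1]],
              "green": [[0.1, 0.8], [0.8, 0.8]]}
target_mask = [[1, 0], [0, 0]]
weights = learn_weights(train_maps, target_mask)

# New scene: the red target now sits in the bottom-right cell.
test_maps = {"red":   [[0.1, 0.1], [0.1, 0.9]],
             "green": [[0.9, 0.9], [0.2, 0.1]]}
focus = most_salient(top_down_map(test_maps, weights))
```

In this toy setting the "red" map receives a weight above one and the "green" map a weight below one, so the red region of the new scene is enhanced while the green regions are suppressed. A complete model would also combine this top-down map with the bottom-up saliency map, which the sketch leaves out.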

A hybrid bottom-up and top-down attention model is proposed in [18]; it uses a fuzzy topology adaptive resonance theory (ART) neural network with top-down memory and includes both enhancing and inhibiting controls on salient areas. Recently, some new top-down computational models have been developed, such as the SUN top-down model with natural statistics [19] and the top-down model guided by the statistical characteristics of orientation features [20].
