2.5 Competition, Normalization and Whitening

The synchronous oscillation model simulates the internal activity of the neuron and takes the contextual relations between neurons into account, which makes it more biologically plausible. However, it is difficult to use in engineering applications because each neuron is modelled as a pulse generator, which makes a single neuron structurally complex. In fact, regardless of the inner workings of a cell, physiological and anatomical data alone show that cell activity in the HVS exhibits various phenomena and behaviours that reflect further contextual relations. Competition, normalization and whitening are examples of such behaviours between cells. Competition is found everywhere: between neurons in separate visual areas of the brain and between objects in the visual field. As a result, the dominant object in the visual field, or the dominant cells in a visual area, can win over the others and stand out. Normalization of cells' responses is another behaviour, caused by lateral inhibition between cells with the same characteristics. Suppressing cells that share a feature makes distinctive features or cells pop out. Frequency whitening, which results from the centre–surround RF properties of retinal ganglion cells, may also contribute to visual attention. The rest of this section introduces these viewpoints, based on single-cell recordings in animals and on data from functional brain imaging studies.

2.5.1 Competition and Visual Attention

As discussed in Chapter 1 and earlier in this chapter, visual attention may be interpreted as competition between many different visual features and between objects in the visual scene due to the capacity limitation for processing multiple objects in the HVS at any given moment [97, 98]. In the GS model, the activation map, after feature integration, is a winner-take-all network in which all the units on the activation map compete with each other and the winner unit or object is assigned as the focus of visual attention.
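
The winner-take-all selection on an activation map can be sketched in a few lines. This is only an illustration: the function name `winner_take_all` and the activation values are hypothetical, and the GS model itself is not specified at this level of detail here.

```python
def winner_take_all(activation_map):
    """Return the (row, col) position of the most active unit on a 2-D map."""
    best_pos, best_val = None, float("-inf")
    for r, row in enumerate(activation_map):
        for c, value in enumerate(row):
            if value > best_val:
                best_pos, best_val = (r, c), value
    return best_pos


# Hypothetical activation map after feature integration.
activation = [
    [0.1, 0.3, 0.2],
    [0.2, 0.9, 0.4],
    [0.1, 0.2, 0.3],
]
focus = winner_take_all(activation)  # position assigned as the focus of attention
```

All units compete, and only the position of the strongest one is kept as the focus.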

How do we process multiple objects throughout the visual field with the limited resources of our visual system? Let us review the pathway of information processing in the HVS described in Section 2.1. The related areas in the HVS, beginning with the primary visual area V1, are organized into two major streams, ventral and dorsal, which process object form and motion respectively. Information processing from low-level to higher-level areas has two characteristics: the complexity of visual processing increases with the processing level, and the RF size of individual cells increases with the level in the hierarchy. The RFs can be regarded as the critical visual processing resources. Objects projected onto the retina are first processed in parallel by the small receptive fields of the V1 area; in higher areas of the brain, more and more objects may fall within a single large RF. Studies on monkeys have confirmed that competition exists, and that objects competing within a larger RF weaken each other's representation through mutual suppression [78, 99, 100]. Consider a cell with a large RF, for example in area V2 or beyond, where two objects may fall within the same RF. The competition phenomenon has been found in single-cell recordings from monkeys [100]: when a single stimulus is presented alone in a cell's RF, the cell responds with a high firing rate; when a second, poor stimulus is presented simultaneously within the same RF, the response to the pair is not enhanced (in fact, it is reduced). The response appears to be a weighted average of the responses to the individual stimuli presented alone. This result means that two stimuli present at the same time in the same cell's RF are processed in a mutually suppressive way. This suppression between multiple objects represents the competition among them. A number of studies have confirmed that this competition exists in many visual areas of the brain [78, 100–102]. In addition to single-cell recordings in animals, competition in the human brain has also been observed with functional MRI [103, 104]. Note that, although competition takes place in many visual areas of the brain when multiple objects appear on the retina, the integration across areas results in one winning object, which is the focus of attention. That is, the competition in different areas of the visual pathways converges on the dominant object related to current behaviour (the attended object) and suppresses the responses to ignored objects (non-attended objects) [105].
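
The weighted-average description of the paired response can be made concrete with a small sketch. The firing rates and the equal weights below are hypothetical values chosen for illustration, not data from the cited recordings.

```python
def paired_response(r_a, r_b, w_a=0.5, w_b=0.5):
    """Response to two stimuli in one RF, modelled as a weighted average
    of the responses to each stimulus presented alone."""
    return (w_a * r_a + w_b * r_b) / (w_a + w_b)


r_good_alone = 40.0  # hypothetical firing rate for the first stimulus alone
r_poor_alone = 10.0  # hypothetical firing rate for the poor stimulus alone
r_pair = paired_response(r_good_alone, r_poor_alone)
```

Because an average can never exceed its largest term, adding the poor stimulus lowers the response instead of raising it, which is the mutual suppression described above.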

On the other hand, competition can also be directed by attention to the relevant object; that is, the attended features or object gain a bias in the competition. Some studies show that the biasing signal from attention can modulate neuronal processing in the visual cortex [98, 104]. Several other studies have reported that a cell's response sensitivity is enhanced when a stimulus that attracts bottom-up or top-down attention is presented within its RF. As an example of bottom-up modulation, the single red bar among multiple green bars (distractors) in Figure 2.4(a) is quickly detected because its saliency in the display favours it. In an experiment with a macaque, a V4 cell responded to an attended stimulus as if its contrast or saliency had been increased [106]. Top-down modulation is even more evident in both single-cell recording and functional MRI experiments. Under the same stimulus input, when an animal directs its attention to the location of a stimulus, the responses of neurons with that stimulus in their RF are enhanced compared with the responses of neurons to which attention is not directed [98]. Furthermore, as described above, two objects falling within the same RF compete with each other, which produces mutual suppression, and only one object will be the winner. When top-down attention is directed to one of the objects, however, the outcome of the competition may change, because top-down modulation strengthens that object. Top-down selective attention can therefore bias the competition and even override the bottom-up input; this phenomenon is called the filtering of unwanted information [98, 104]. As will be seen in later chapters, competition has been widely applied in computational attention models.
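
One simple way to picture how attention biases the competition is to let attention raise the weight of the attended stimulus in a weighted average of the two single-stimulus responses. The sketch below assumes this form; the function name, the `gain` value and the firing rates are illustrative assumptions and are not taken from the cited studies.

```python
def biased_pair_response(r_a, r_b, attend=None, gain=4.0):
    """Weighted average of two single-stimulus responses in which the
    attended stimulus (``"a"`` or ``"b"``) receives a larger weight.

    gain is an assumed attentional weight; 1.0 would mean no bias.
    """
    w_a = gain if attend == "a" else 1.0
    w_b = gain if attend == "b" else 1.0
    return (w_a * r_a + w_b * r_b) / (w_a + w_b)


r_a, r_b = 40.0, 10.0  # hypothetical responses to each stimulus alone
neutral = biased_pair_response(r_a, r_b)                  # no top-down bias
attend_a = biased_pair_response(r_a, r_b, attend="a")     # bias towards a
attend_b = biased_pair_response(r_a, r_b, attend="b")     # bias towards b
```

With the bias on stimulus a, the pair response moves towards the response to a alone; with the bias on b, it moves towards the response to b alone, as if the unattended stimulus had been filtered out.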

2.5.2 Normalization in Primary Visual Cortex

Studies of the primary visual cortex (V1 area) have suggested that the response of a simple V1 cell depends on a weighted sum of the light intensities falling on its RF within a time interval [2, 107]. However, some experiments have revealed that this linear summation does not always hold. When the cell's RF receives high-contrast input, the response amplitude of the cell saturates: doubling the input contrast does not double the cell's response [108]. To account for this non-linear response, a normalization model has been proposed [109–111], in which the linear response of each cell is divided by the activity of a large number of cortical cells. Because the divisor grows as the stimulus contrast increases, the response follows a non-linear saturating curve. The model has two steps: first, the response of a cell is computed as a linear function of its input; then this response is divided by the activation of all the cells in a normalization pool. The normalization model of a single cell has been implemented as an electronic circuit [111], and its results fit the neural responses of monkey primary visual cortex well. We do not go into detail here; the reader is referred to [111].
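
The two steps of the normalization model can be sketched as follows. The specific form used here, a linear drive divided by the pool activity plus a constant `sigma`, is an assumption made for illustration; the exact equations of [109–111] are not reproduced in this section.

```python
def normalized_response(contrast, sigma=0.1, r_max=1.0):
    """Step 1: linear drive. Step 2: divide by the pooled activity.

    Assumptions: the pool activity grows in proportion to contrast, and
    sigma is a small constant that sets where saturation begins.
    """
    linear_drive = contrast
    pool_activity = contrast
    return r_max * linear_drive / (sigma + pool_activity)


low = normalized_response(0.4)
high = normalized_response(0.8)  # contrast doubled
ratio = high / low               # stays well below 2 because of the division
```

At very low contrast the constant dominates the denominator and the response is close to linear; at high contrast the pool term dominates and the response saturates.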

The key question here is: what is the relationship between normalization and attention? First, let us consider bottom-up attention. Since normalization is related to inhibition within the normalization pool in the V1 area, high activity in the pool produces stronger inhibition than low activity. If the normalization pool is defined as the activity of a group of cells that represent one particular feature, for instance a colour or a particular orientation, the normalization of a cell's response reflects the inhibition among cells with the same feature. This kind of normalization is referred to as divisive normalization, and it is similar to the iso-feature suppression mentioned in Section 2.4.2. As an example, the pop-out effects in Figures 2.4(a) and (b) can be explained by divisive normalization. In Figure 2.4(a), a red bar (target) among many green bars (distractors) is quickly detected at the pre-attention stage. If the colour features red and green belong to two different normalization pools, the activity of the green pool (the sum of all the green bar responses) is stronger than that of the red pool. After normalization, the responses of all the green bars are diminished, whereas the red bar's response remains at one because it is divided by itself. The same explanation applies to Figure 2.4(b), with the normalization pools defined by the orientation features 45° and 135° respectively.
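
The red-among-green example can be worked through numerically. The sketch below assumes that every bar produces the same raw response and that each pool's activity is the plain sum of the responses sharing that feature; the function name and the number of bars are hypothetical.

```python
def normalize_by_feature_pool(items):
    """Divide each response by the summed response of its feature pool.

    items is a list of (feature, response) pairs.
    """
    pool = {}
    for feature, response in items:
        pool[feature] = pool.get(feature, 0.0) + response
    return [(feature, response / pool[feature]) for feature, response in items]


# Eight green distractors and one red target, equal raw responses (assumed).
display = [("green", 1.0)] * 8 + [("red", 1.0)]
normalized = normalize_by_feature_pool(display)
red_response = [r for f, r in normalized if f == "red"][0]
green_response = [r for f, r in normalized if f == "green"][0]
```

The red bar is divided only by itself and keeps a response of one, while each green bar is divided by the whole green pool and drops to one eighth, so the target stands out.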

Combining top-down attentional modulation from higher levels with normalization, a normalization model of attention is proposed in [110], as shown in Figure 2.17. It includes a stimulus map on the left, the attention field at the top and the suppressive drive map (normalization) at the bottom. For simplicity, only two vertically oriented grating stimuli with the same contrast are presented, one in each half of the visual field. A circle on the right of the stimulus input denotes the RF of a neuron that prefers vertical orientation and is selected by top-down attention. The middle image is the stimulus drive map for neurons with different RF centres and orientation preferences, without attentional modulation or suppressive drive. For simplicity, only two receptive field centres for the vertical orientation exist in the stimulus drive map. Note that brightness in all the greyscale images denotes strength. In the stimulus drive map, the two bright positions represent the responses of the neurons at the corresponding locations and feature. The attention field, shown in greyscale, is the top-down attention bias: mid-grey denotes a value of one, and brighter than mid-grey denotes a value greater than one. The attention field modulates the stimulus drive map by point-by-point multiplication. The suppressive drive is computed from the product of the stimulus drive and the attention field. The final output after normalization, shown on the right-hand side of Figure 2.17, is computed by dividing this product by the average over the suppressive drive. The final response map in Figure 2.17 shows that the firing rate of the attended neuron (right) is higher than that of the left one.
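
The computation in Figure 2.17 can be reduced to two locations for illustration. In the sketch below the maps are plain lists with one entry per location, the suppressive drive is the average of the attention-weighted drive, and `sigma` is an assumed constant added to the denominator; the numeric values are hypothetical.

```python
def attention_normalization(stimulus_drive, attention_field, sigma=0.5):
    """Multiply the stimulus drive by the attention field point by point,
    then divide by the pooled (averaged) suppressive drive.

    sigma is an assumed constant that keeps the division well behaved.
    """
    excitatory = [s * a for s, a in zip(stimulus_drive, attention_field)]
    suppressive = sum(excitatory) / len(excitatory)  # average over the pool
    return [e / (suppressive + sigma) for e in excitatory]


stimulus_drive = [1.0, 1.0]   # left and right gratings with equal contrast
attention_field = [1.0, 2.0]  # 1 means no bias; the right location is attended
left, right = attention_normalization(stimulus_drive, attention_field)
```

Although both gratings have the same contrast, the attended (right) location ends up with the higher normalized response, matching the final response map described above.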

Figure 2.17 A normalization model of attention [110]. (1) Reprinted from Neuron, 61, no. 2, John H. Reynolds and David J. Heeger, ‘The Normalization Model of Attention’, 168–185, 2009, with permission from Elsevier. (2) Reprinted from Neuron, 31, no. 4, Preeti Verghese, ‘Visual Search and Attention’, 523–535, 2001, with permission from Elsevier (Cell Press)

This normalization model can explain the non-linear response. When the input to a cell doubles, it is divided by a suppressive drive that has also grown, so the response does not double. The normalization can also explain competition: when two equal stimuli fall within an RF, the response to each is reduced because the suppressive drive increases.

Normalization computation is very simple and effective, so it is employed by many computational attention models in practice.

2.5.3 Whitening in Retina Processing

As presented in Section 2.1.3, the RF of a retinal ganglion cell has a centre–surround opponent structure, which makes the edges of an image falling on the RF pop out. The centre–surround RF, used as a filter kernel, can be described by the following difference-of-Gaussians (DoG) function R(x, y) [112]:

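(2.4) $R(x, y) = C_1 \exp\left(-\frac{x^2 + y^2}{2\sigma_1^2}\right) - C_2 \exp\left(-\frac{x^2 + y^2}{2\sigma_2^2}\right)$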

where the constant C1 determines the amplitude of the centre Gaussian function, the constant C2 is related to the amplitude of the surround Gaussian function, and σ1 and σ2 set the spatial spread of the centre and surround regions, respectively; the coordinate (x, y) covers a local, centrally symmetric area (the RF of a ganglion cell) whose centre is (x, y) = (0, 0). Equation 2.4 is a spatial expression of the ganglion cell RF. Convolving the DoG function with the input image gives the output of the ganglion cells. The DoG kernel in Equation 2.4 is a band-pass filter, as can be verified by analysing the kernel in the frequency domain. Different parameters and different RF sizes correspond to different filter bandwidths. Collectively and statistically, these ganglion cells determine the spatial frequency response of the human eye. Besides the spatial frequency response, the HVS also has a temporal (or spatiotemporal) frequency response, which is not considered here. Studies have demonstrated that the spatial frequency response is biased towards high frequencies [113, 114].
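
The band-pass property can be checked numerically on a one-dimensional cross-section of a DoG kernel. The sketch below assumes a plain difference of two Gaussians with amplitudes `c1` and `c2` and spreads `sigma1` and `sigma2`; the parameter values are hypothetical and are chosen so that the centre and surround cancel for a uniform input.

```python
import math


def dog_profile(x, c1=1.0, c2=1.0 / 3.0, sigma1=1.0, sigma2=3.0):
    """1-D cross-section of a centre-surround DoG (assumed form)."""
    centre = c1 * math.exp(-x * x / (2.0 * sigma1 ** 2))
    surround = c2 * math.exp(-x * x / (2.0 * sigma2 ** 2))
    return centre - surround


def dft_magnitude(signal):
    """Magnitude of the discrete Fourier transform for frequencies 0..n/2."""
    n = len(signal)
    magnitudes = []
    for k in range(n // 2 + 1):
        re = sum(s * math.cos(2.0 * math.pi * k * i / n) for i, s in enumerate(signal))
        im = sum(s * math.sin(2.0 * math.pi * k * i / n) for i, s in enumerate(signal))
        magnitudes.append(math.hypot(re, im))
    return magnitudes


n = 64
kernel = [dog_profile(x) for x in range(-n // 2, n // 2)]  # samples at x = -32..31
spectrum = dft_magnitude(kernel)
# Index of the frequency bin with the largest magnitude.
peak_index = max(range(len(spectrum)), key=spectrum.__getitem__)
```

With these values the magnitude is close to zero at zero frequency, rises to a peak at an intermediate frequency and falls again towards the highest frequency, which is the band-pass shape. Changing `sigma1` and `sigma2` shifts and widens the pass band, in line with the remark that different RF sizes give different bandwidths.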

An additional finding, reported in 1987 [115, 116], is that the spatial frequency amplitude spectrum of natural scenes, which have strong spatial correlation, statistically approximates a one-over-frequency (1/f) profile; equivalently, the power spectrum falls as 1/f². The reason is that our visual environment is highly structured [117] and spatially correlated, so the spectrum of natural scenes is not flat as that of white noise is. The sensitivity of retinal ganglion cells therefore compensates for the decline of the spatial frequency spectrum in natural scenes, resulting in a whitened response spectrum. The whitening theory, or response equalization, refers to a roughly uniform activity of all ganglion neurons in the presence of a natural scene. In other words, neurons tuned to high frequencies should have higher sensitivity than neurons tuned to low frequencies.
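
The arithmetic behind whitening is short enough to write out. The sketch assumes an idealized scene whose amplitude spectrum is exactly 1/f and cells whose sensitivity rises exactly in proportion to f; real scenes and real cells follow these trends only approximately, and the frequency values are arbitrary.

```python
frequencies = [1, 2, 4, 8, 16]  # arbitrary spatial frequencies (cycles per unit)

# Idealized natural-scene amplitude spectrum: falls as 1/f.
scene_amplitude = [1.0 / f for f in frequencies]

# Assumed ganglion-cell sensitivity: rises in proportion to f.
cell_gain = [float(f) for f in frequencies]

# Response spectrum: the rising gain cancels the falling amplitude.
response = [a * g for a, g in zip(scene_amplitude, cell_gain)]

# Scene power spectrum for comparison: falls as 1/f^2.
scene_power = [a * a for a in scene_amplitude]
```

Every entry of `response` has the same value, so the output spectrum is flat, whereas `scene_amplitude` drops by half each time the frequency doubles.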

A quantitative study of the sensitivity of ganglion cells has also been carried out based on the data from [118]. The responses of ganglion cells across the retina in macaques, when presented with gratings of different frequencies, have been reported in [113]. In that research, the vector length [119, 120], that is, the L2 norm of the cell's sensitivity profile, was used to estimate the responses of P-cells modelled by the DoG function. The results suggest that the vector length increases with frequency, and that a flat response spectrum across cells with different receptive field sizes is obtained when a natural scene is presented to the retina. One problem arises here: noise with a flat spectrum in the scene would be amplified if the gain of ganglion cells tuned to high frequencies increased. In fact, however, at low luminance (low signal-to-noise ratio) ganglion cells lose their inhibitory surround, and the band-pass filters degenerate into low-pass filters, which increases the signal-to-noise ratio [117].

Since high-contrast locations in the input image correspond to object edges, which contain many more high-frequency components than smooth regions, the whitening property of ganglion cells enhances the sensitivity at these prominent locations, which is necessary for pre-attention. For the same reason, places in the input scene with complex objects often attract more attention (eye fixations) than simple, dull places.

This means that centre–surround RFs and whitening in early retinal processing have already filtered out much of the useless information in the input scene before feature extraction in the V1 area.

In summary, competition, normalization and whitening are properties of visual attention processing. In the pre-attention stage, whitening by retinal ganglion cells and normalization in the primary visual cortex enhance useful information, such as the edges of objects and unusual items, before feature extraction. Both process information in parallel and are driven only by input stimuli. Competition works in the higher visual cortex in the attention stage and is related to both top-down and bottom-up attention. The winner of the competition is further processed in the post-attention stage in order to control human or animal behaviour according to the relevant information. It will be seen in Chapters 3 and 4 that many computational attention models adopt the properties described above.
