2.4 Binding Theory Based on Oscillatory Synchrony

As discussed in Section 2.1.1, searching for an object with different characteristics (form, colour, orientation and motion) in the visual field often involves different visual areas of the brain, from the retina, LGN and V1, through the dorsal and ventral pathways, up to the high-level cortical areas. Binding together these activities, which are distributed over different areas of the HVS, to represent a target or an object is a fundamental step in visual pattern recognition. This is the so-called binding problem: identifying the process responsible for linking distributed activities.

From the point of view of psychophysics and psychological science, as in the FIT of Treisman's group and the GS model of Wolfe's group, feature binding occurs because spatial attention provides the glue for the independently registered features of an object. In these models, related features are integrated in the attention or pre-attention stage. The problem is that the relationships among features must be available for further processing, since there are many possible combinations for the continuous scenes input from the retina. The question is how to separate one set of features from another. For example, consider a scene in which a boy wearing a red cap is playing with a red ball. The two targets (the boy's head with the red cap and the red ball) have similar form and colour and are very close together; sometimes they even overlap or occlude one another, yet humans can still separate them easily. In the visual field, multiple sets of features may be grouped simultaneously into multiple objects, and the number of possible feature combinations is effectively unlimited when the visual scene changes. Hence, attention focus alone is not sufficient for a model to explain this kind of feature binding for multiple objects in a varying visual scene.

From the viewpoint of neuroscience and neuronal dynamics, binding at the neuronal level may work differently. A population of neurons in different visual areas that relate to the same object fire synchronously, forming a temporal group that represents the current object in the visual field. The typical temporal binding theory was first proposed by Malsburg in 1981 and 1985 [32, 68], and by Malsburg and Schneider in 1986 [69]. The hypothesis suggests that binding is implemented dynamically via synchrony of neuronal activity. When different neurons corresponding to distinct features receive external stimuli, the neurons belonging to the same object can oscillate synchronously through the coupling between them. The neurons in such a group fire synchronously in different cortical areas and represent a perceptual object at the current time. At a different time, the set of synchronized neurons is different, so temporal binding can represent many objects with the limited resources of the brain. This means that two separable objects at the same level of visual processing can be differentiated by two different synchronized neuronal assemblies.

A number of neurobiological experiments have supported the temporal binding hypothesis proposed by Malsburg. Gray and Singer recorded synchronous oscillations of single cells in a cat's visual cortex and reported the results in a Society for Neuroscience abstract in 1987 [70]; Gray et al. subsequently presented the results in Nature in 1989 [71]. The frequency of the oscillatory responses of cells in the cat's visual cortex was measured in the range 40–60 Hz. Gray et al. discovered an interesting result: two neurons some distance apart in the cat's primary visual cortex that respond to the same moving bar can generate frequency-locked oscillations, whereas two relatively close neurons that respond to separate moving bars do not. This means that synchronous oscillation is produced only between highly related neurons. The results of Gray et al. are thus consistent with Malsburg's temporal binding hypothesis. Afterwards, synchronization of oscillatory neuronal responses was found not only in the brain of the cat [71, 72], but also in the brains of other mammals, such as the awake macaque monkey, with oscillation frequencies of 30–90 Hz [73–76]. The recorded synchronization probably has two causes: first, spikes (cell firings) from other neurons that arrive at a neuron simultaneously produce a large effect at the soma, and second, the oscillations may be promoted by a resonance at approximately 40 Hz [77].

The relation between the temporal binding hypothesis and attention at the level of single neurons was found in electrophysiological data from areas V4 and IT of the monkey cortex, reported in 1985 [78]. When two different objects were located within a receptive field in area V4, the neurons belonging to the object attended by the monkey responded more strongly than the others. Another physiological experiment, with single cells recorded in cortical area V4 of an awake macaque monkey, showed that the neurons activated by the attended stimulus increase their gamma-frequency synchronization (35–90 Hz) while the macaque attends to behavioural stimuli [79]. A view proposed by Crick and Koch suggests that selective visual attention at the single-cell level acts like a modulation of the temporal structure of the spike trains in area V1, via ‘temporal tagging’ [77].

2.4.1 Models Based on Oscillatory Synchrony

It is not clear how synchronous oscillation works at the neuronal level, but models based on oscillatory correlation have been proposed in [75, 80–84]. Since synchronous oscillation is a cell-level phenomenon, all these models are composed of a two-dimensional (2D) neuron array. Each neuron is represented as an oscillator with an excitatory and an inhibitory unit [81–85] or as a spike generator with an action potential [75, 80]. The external stimuli (the image of the visual field) are input to the corresponding locations of the 2D array of oscillators or spike generators; that is, each neuron receives a stimulus located at a pixel of the input image, together with the feedback outputs from other neurons and from itself. The conveyance of information via the connection weights between neurons makes these models dynamic systems. There are many models related to synchronous oscillation; this subsection introduces two kinds of neuronal oscillatory model, related to temporal location and feature binding, in order to explain the principle of synchronous oscillation.

1. Spiking neural model [75]
Biological studies show that cells in the brain are specialized for generating electrical signals in response to chemical and other inputs, and for transmitting them to other cells. The typical response of a cell is a spike train. In general, the resting electrical potential across a cell membrane is about −70 mV. When the cell receives sustained positive input signals that raise the membrane potential above a threshold level, it fires and generates an action potential (a spike). Note that the cell membrane acts like a capacitor that accumulates temporal signals from its input spike train and spatial signals from other cells. Just after an action potential has been fired, the cell has a refractory period during which it is more difficult to evoke another action potential. The refractory period lasts only a few milliseconds, after which the cell can fire again if its membrane potential exceeds the threshold. In classical artificial neural networks, the output of a neuron is an analogue value that simulates a cell's average firing rate over a fixed time interval, without considering temporal coding in detail. In fact, temporal coding plays an important role in neuronal activation. The spiking neural model [75], or pulse-coupled neural network (PCNN) [80], models the temporal relation between cells, so the phenomenon of synchronous oscillation can be described.
Figure 2.13 shows the spiking neuron model proposed in [75], which includes feeding inputs from the scene (bottom-left of the figure) and linking inputs from the outputs of other neurons (upper-left). All the input signals are spike trains. These pulse trains pass through respective capacitors with different time constants, which simulate the cell membrane capacitance. Each spike charges the capacitor, which then discharges along an exponential curve (as shown in the square blocks with rounded corners in Figure 2.13) if no subsequent spike arrives. The output of the spatial and temporal accumulation from the linking neurons is (L(t) + 1), which modulates the temporal accumulation of the feeding input, I(t), where t represents time. Accordingly, the membrane potential of the neuron at time t is defined as U(t) = I(t)(L(t) + 1), where U(t) denotes the temporal and spatial accumulation of all input spike trains of the neuron. It is worth noting that the constant (+1) is necessary so that, when there is no feedback spiking from the linking neurons, the feeding inputs can still contribute to U(t). The constant can take any value, but we use (+1) for simplicity. The right part of Figure 2.13 (the block with the dashed line) is the pulse generator. A comparator with input U(t) and threshold ‘th1(t)’ generates the spiking signal that serves as the action potential of the cell. Here ‘th’ is a time-invariant threshold (the original threshold), which is greater than zero. When the neuron has not received any input signals (U(t) = 0), the threshold is th1(t) = th and U(t) < th, so the output of the comparator is zero and the pulse generator produces no output. As the neuron receives input signals, the temporal and spatial accumulation of the inputs makes U(t) increase. When U(t) exceeds the threshold th1(t) at time t, a high electrical level appears at the output of the comparator and of the pulse generator.
At the same time, the high electrical level makes the threshold (th1(t) = th + th2(t)) increase rapidly, where th2(t) is an exponentially decaying function of t. The high threshold is fed to the input of the comparator, resulting in a sharp decline at the comparator output. Consequently, a pulse (action potential) appears at the output of the generator. The pulse, with an expanded time axis t1, is shown in Figure 2.13 (in the small block of the pulse generator). Since the threshold (th + th2(t)) is very high, no input during this time interval can make U(t) higher than the threshold, so there is a refractory period. The refractory period of the neuron depends on the time constant of the exponentially decaying function th2(t). When th2(t) has decreased to near zero, the neuron can fire again. As time continues, a spike train appears at the output of the spiking neuron, as shown in Figure 2.13.
Suppose that many spiking neurons, each as represented in Figure 2.13, are combined into a 2D neuronal array, and that the external 2D stimuli are input to the neurons at the corresponding positions. Let us first consider the case in which the neurons receive input only from the visual scene, without feedback signals from the linking neurons. The value of each pixel of the input image charges up the input capacitor of the neuron located at the corresponding position, making U(t) increase, and a pulse is created when U(t) exceeds the threshold, th1(t). The firing time depends only upon the input value and the initial value in the capacitor. The time constant of the exponentially decaying function of the threshold determines the firing frequency or cycle. In this case, no synchronous pulses occur except for neurons that have the same input stimuli, time constants and initial capacitor values, so the stimulus for each neuron in the array leads it to fire independently at a different time. Evidently, the neuron that receives the strongest stimulus fires first. Next, the links between neurons are considered. When the output of the first firing neuron is conveyed to the inputs of its adjacent neurons through the connection weights, and if the input stimuli of these neighbouring neurons are close to that of the first firing neuron, the conveyed signal raises the membrane potential of the neighbouring neurons and induces them to fire at the same time as the first firing neuron. Since an object in the visual field is often more salient than its background, the first firing neurons generally correspond to the position of that object. Consequently, the interaction between neurons results in synchronous oscillation of the neurons at the same object location. After the neurons for the same object have fired, the neuronal refractory period prevents these neurons (located on the object) from firing again.
The neuron in the next most salient position then fires when the accumulation of its inputs brings its membrane potential up to the threshold. If the next salient position belongs to another object, synchronous oscillation occurs for that object. The synchronous firing continues in this way, one object after another. When the refractory period of the neurons related to the first firing object ends, these spiking neurons fire again if the input stimuli are still present. Models at the neuronal level can thus generate the attention focus by synchronous oscillation or synchronous firing. Inhibition of return can also be implemented by the refractory period of neuronal firing. These models have greater biological plausibility.
The spiking model can simulate the activation of cells through links with other cells. In the simple example above, only the intensity feature is considered, so nearby cells with similar intensity related to an object can be bound together. To bind several features, several 2D spiking neural networks need to be combined, and the links between cells in these networks may produce the feature binding.
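The firing rule described above can be illustrated with a small discrete-time sketch. It is not the model of [75] itself: it assumes a 1D array instead of a 2D one, replaces the linking capacitor by an instantaneous count of adjacent neurons that fire in the same step, and uses made-up values for the decay factors, the threshold `th` and the threshold jump. The function name `first_firing_times` and all parameters are hypothetical. Only the relations U(t) = I(t)(L(t) + 1) and th1(t) = th + th2(t) are taken from the text.

```python
def first_firing_times(stimuli, linking=True, steps=40,
                       feed_decay=0.9, th=0.8,
                       th2_jump=20.0, th2_decay=0.8):
    """Return, for each neuron in a 1D array, the step of its first spike."""
    n = len(stimuli)
    feed = [0.0] * n          # I(t): leaky accumulation of the feeding input
    th2 = [0.0] * n           # th2(t): decaying part of the threshold
    first = [None] * n
    for t in range(steps):
        # Feeding capacitor: discharge a little, then charge with the pixel value.
        feed = [feed_decay * f + s for f, s in zip(feed, stimuli)]
        fired = [False] * n
        while True:
            # L(t): number of adjacent neurons that fire in this step
            # (assumed instantaneous linking, zero when linking is disabled).
            link = [sum(fired[j] for j in (i - 1, i + 1) if 0 <= j < n)
                    if linking else 0 for i in range(n)]
            # Membrane potential U(t) = I(t)(L(t) + 1).
            u = [feed[i] * (link[i] + 1) for i in range(n)]
            # Comparator: fire when U(t) exceeds th1(t) = th + th2(t).
            new = [fired[i] or u[i] > th + th2[i] for i in range(n)]
            if new == fired:
                break
            fired = new
        # A spike raises th2 sharply (refractory period); otherwise th2 decays.
        th2 = [th2_decay * v + (th2_jump if y else 0.0)
               for v, y in zip(th2, fired)]
        for i, y in enumerate(fired):
            if y and first[i] is None:
                first[i] = t
    return first


# Three "object" pixels with similar, strong values, then two weak background pixels.
stimuli = [0.35, 0.5, 0.35, 0.1, 0.1]
unlinked = first_firing_times(stimuli, linking=False)
linked = first_firing_times(stimuli, linking=True)
print("first spike without linking:", unlinked)
print("first spike with linking:   ", linked)
```

With these illustrative values, the neuron with the strongest stimulus fires first when linking is disabled, while with linking its two neighbours are pulled into the same time step and the weak background neurons fire only later. The sketch reports first spikes only; with such crude parameters, later cycles need not stay cleanly separated.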
2. Oscillator model with an excitatory and an inhibitory unit
The spiking of neurons concerns the inner structure of a single neuron, which becomes somewhat complex when the system contains many neurons with several features. The oscillator model generates synchronous oscillation in another way. Although the oscillator model is also a neuronal array or neuronal assembly, the inner details of each neuron are no longer considered, and the dynamic behaviour is emphasized. Each neuron in the model is an oscillator consisting of a coupled pair of excitatory and inhibitory units. The task of the excitatory unit is to receive the input stimuli, and the inhibitory unit suppresses the excitatory signal via their internal connections. In general, the oscillator can oscillate at a fixed frequency under a given condition. Each oscillator encodes a specific feature or pixel of an object. The connections between oscillators convey the information of each oscillator to the others, which results in synchronous oscillation for the same object. Most oscillator models are applied to object segmentation in images. One object in the visual field is represented by a group of oscillators with synchronized oscillation or phase locking. For different objects, there is desynchronization between the different groups of oscillators. In the early oscillator model proposed by Malsburg [81], several neuronal arrays (each consisting of many oscillators), representing different features respectively, are arranged in a system. The synchronization between oscillators depends on complex connections (full connections within each array and full connections between the oscillators in different arrays), but the long-range connections between different arrays sometimes lead to mistakes (e.g., false object segmentation). In fact, local connections can also achieve synchronous oscillation, without the need for complex connections. Various local connection models based on oscillators were proposed in [82–85].
Figure 2.14 shows a general oscillator model with weights within each oscillator and the linking weights between excitatory units and the linking weights between inhibitory units. For each oscillator (i.e., an ellipse) in Figure 2.14, a circle with a positive symbol denotes an excitatory unit and a circle with a negative symbol is an inhibitor. In each oscillator, excitatory and inhibitory units are connected to each other to form a dynamic mini-system that can generate oscillation under certain conditions. Regardless of the intra-connection in an oscillator, W and J (Figure 2.14) are two sparse connection matrices representing the connections between excitatory units and between inhibitory units, respectively. The sparse coefficients in the matrices mean that the connections of a neuron are only for its neighbours.
Although most oscillatory models are basically similar, they still have some differences. Two typical models, namely pixel binding and feature binding, will be introduced here.
a. Pixel binding model LEGION (locally excitatory globally inhibitory oscillator networks) [83, 86]
Consider a 2D oscillator array with local connections as in Figure 2.15, in which the hollow circles in the array denote neurons (each neuron is an oscillator as depicted in Figure 2.14) and the black circle is the common global inhibitor. When a group of oscillators representing an object oscillates in phase-locking or synchronously, the global inhibitor generates a strong inhibitory signal to suppress the other oscillators, which results in desynchronization. Note that Figure 2.14 is a more detailed sketch of the linking. In LEGION, the connections between inhibitory units are set to zero, J = 0, and the local connections between each excitatory unit and its adjacent units are kept. The excitatory unit of each oscillator receives the greyscale value of the pixel at the corresponding location of the input image. In an oscillator, there is a feedback loop between the excitatory unit, xe, denoted by the symbol ‘+’ in Figure 2.14, and the inhibitory unit, yI, denoted by the symbol ‘−’ in Figure 2.14. The feedback loop forms a mini dynamic system that satisfies the following pair of dynamical equations:

(2.2a) equation

(2.2b) equation

where In is the external input signal to the oscillator, L represents the signals from other neurons (positive signals from the adjacent excitatory units and a negative signal from the global inhibitor), and the constants ε, γ and β control the oscillation speed and phase. Setting the time derivative of xe to zero in Equation 2.2a gives a cubic curve (yI ~ xe), and setting the time derivative of yI to zero in Equation 2.2b gives a sigmoid curve (yI ~ xe). The two curves are described as

(2.3) equation
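For reference, the relaxation oscillator of the original LEGION paper [83] is commonly written in the form below. This is a hedged reconstruction in the notation of this section (xe, yI, In, L, ε, γ, β), not a verbatim copy of Equations 2.2 and 2.3. The constants 3 and 2 are those of the commonly quoted formulation, and its noise term is omitted. The second pair of expressions gives the cubic and sigmoid curves obtained by setting each time derivative to zero.

```latex
\begin{align}
\frac{dx_e}{dt} &= 3x_e - x_e^{3} + 2 - y_I + I_n + L, \\
\frac{dy_I}{dt} &= \varepsilon\left(\gamma\left(1 + \tanh\frac{x_e}{\beta}\right) - y_I\right), \\[4pt]
y_I &= 3x_e - x_e^{3} + 2 + I_n + L, \qquad
y_I = \gamma\left(1 + \tanh\frac{x_e}{\beta}\right).
\end{align}
```

In this form the left knee of the cubic sits at xe = −1 with height In + L, so a positive total input lifts the knee above the lower plateau of the sigmoid and leaves only an intersection on the middle branch.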

The equilibrium point of the mini-dynamic system is the intersection of the two curves in Equation 2.3. In the case of L = 0 and In > 0, the equilibrium point is located on the middle branch of the cubic, and the mini-dynamic system has a periodic solution; that is, it is in the oscillating state. It can be confirmed that different phases are produced for dissimilar In in the case of periodic oscillation. The mini-system is designed so that when (L + In) is less than or equal to zero the cubic curve moves downward (yI decreases) and the intersection of the two curves (the equilibrium point) moves down towards the x-axis (yI = 0). In that case, the mini-dynamic system stays in a stable state with no oscillation.
When there are no input stimuli, that is In = 0 and L = 0, all oscillators stay in the non-oscillating state. When external stimuli enter the LEGION model, the oscillator groups representing multiple objects start to oscillate, and the oscillators located on the same object oscillate with close phases. The interplay between neurons through the local connections (L > 0) within a group makes the oscillators in the group tend towards phase-locking or synchronization. When the oscillators of a group simultaneously reach their maximum amplitude, they inhibit the other groups via the global inhibitor, which leads to desynchronization between different objects. LEGION is a simple model that can be used for object segmentation by synchronous oscillation. It uses a common inhibitory signal to shift from one object to another.
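The dependence of a single oscillator on its input can be checked numerically. The sketch below integrates one uncoupled oscillator with the forward Euler method, assuming the commonly quoted LEGION form dxe/dt = 3xe − xe³ + 2 − yI + In + L and dyI/dt = ε(γ(1 + tanh(xe/β)) − yI). That form, the parameter values (ε = 0.02, γ = 6, β = 0.1), the step size and the function name `count_active_phases` are all assumptions made for illustration, not values given in this section.

```python
import math


def count_active_phases(inp, link=0.0, eps=0.02, gamma=6.0, beta=0.1,
                        dt=0.01, t_end=500.0):
    """Euler-integrate one oscillator and count upward crossings of xe = 0."""
    xe, yi = -2.0, 0.0        # start with the excitatory unit in its silent state
    crossings = 0
    for _ in range(int(t_end / dt)):
        # Excitatory unit: cubic term, inhibition by yI, input In and coupling L.
        dxe = 3.0 * xe - xe ** 3 + 2.0 - yi + inp + link
        # Inhibitory unit: slowly follows a sigmoid of xe (eps sets the speed).
        dyi = eps * (gamma * (1.0 + math.tanh(xe / beta)) - yi)
        new_xe = xe + dt * dxe
        if xe < 0.0 <= new_xe:
            crossings += 1    # jump from the silent phase to the active phase
        xe, yi = new_xe, yi + dt * dyi
    return crossings


stimulated = count_active_phases(inp=0.8)     # In > 0, L = 0
unstimulated = count_active_phases(inp=-0.5)  # L + In <= 0
print("active phases with In = 0.8: ", stimulated)
print("active phases with In = -0.5:", unstimulated)
```

Under these assumptions the stimulated oscillator jumps repeatedly between a silent and an active phase, whereas the unstimulated one settles on the left branch of the cubic and never becomes active. A full LEGION network would add the local excitatory coupling and the global inhibitor through the `link` term, which is left at zero here.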
b. Feature binding model in primary visual cortex [84]
Consider the case where an oscillator represents a feature. Without loss of generality, let it represent one orientation feature in a small input area, like a simple cell in the primary visual cortex that prefers a specific orientation in its receptive field (RF). The oscillator model is then a 3D assembly, in which the 2D visual stimuli correspond to the locations of a 2D oscillator array. At each location, corresponding to a small area (RF) of the input visual field, there are k oscillators that form a column representing k orientation features (each oscillator extracts a specific orientation feature in the RF); that is, the input is the same for the k oscillators in a column. There is no linking between the k oscillators within a column. If there is an oriented edge segment iαc in the RF of location i, the oscillator representing αc is excited and outputs a response signal. The k orientations are evenly distributed over the full orientation range, a span of 180°, and the RFs of oscillators at adjacent locations overlap each other.
The internal structure of an oscillator is the same as in Figure 2.14, and each oscillator satisfies a mini-dynamic equation as in LEGION. However, the external connections in the feature binding model differ in several respects. (1) There is a common input signal to all inhibitors, which controls the timing of the mini-dynamic system. (2) The excitatory unit at location i receives not only the input from its RF but also the feedback signal from its own output and the excitatory outputs from other locations via excitatory connections (i.e., elements of the excitatory matrix W in Figure 2.14) if a smooth or continuous edge exists in the RF and its adjacent RFs. For discontinuous edges between the RF and adjacent RFs, the excited signal is sent to the inhibitory units of its adjacent oscillators via the inhibitory connections (elements of matrix J in Figure 2.14). Here the elements of the matrices W and J model the synaptic strengths of horizontal cortical connections. For example, element W(iα1, jα2) denotes the excitatory connection between feature α1 at location i and feature α2 at location j, where i ≠ j. The connection W(iα1, jα2) is a function of the edge curvature across their RFs and of the distance between them. If a smooth or small-curvature contour connects (iα1) and (jα2), the connection W(iα1, jα2) is strong, and it generally decreases with increasing contour curvature or distance. (3) For the elements of the inhibitory matrix J, it is the other way round: if a large-curvature contour connects (iα1) and (jα2), the inhibitory connection J(iα1, jα2) is strong. (4) Both connection types are functions of distance, and converge to zero as the distance increases. More details of the model can be found in [84].
Feature binding proceeds as follows. When a visual scene is input to the model, the oscillators at locations with edge orientations in the scene are excited in parallel. Since the contours of most objects are smooth or have small curvature and form a closed structure, only the oscillators on object contours can oscillate synchronously or stay phase-locked, aided by the interplay of the excitatory and inhibitory connections. Therefore, objects in the visual field pop out, and the noise or background with disorderly contours is suppressed after a transient period.
This model works at the neuronal level and is consistent with the anatomy of V1. In particular, the contextual influence between cells is considered and the contours of objects are enhanced. The model was initially applied to contour enhancement and texture segmentation; with a normalization process, it can also readily explain visual attention phenomena because of the contextual relation between neurons, as will be introduced in the next subsection.

Figure 2.13 Spiking neuron model proposed by Eckhorn [75]. © 1999 IEEE. Reprinted, with permission, from R. Eckhorn, ‘Neural mechanisms of scene segmentation: recordings from the visual cortex suggest basic circuits for linking field models’, IEEE Transactions on Neural Networks, May 1999


Figure 2.14 A general oscillator model [84]. Adapted from Li Z.P. (1998) A neural model of contour integration in the primary visual cortex, Neural Computation, 10(4), 903–940


Figure 2.15 Neuronal array with common inhibitor [83]. © 1995 IEEE. Reprinted, with permission, from D. Wang, D. Terman, ‘Locally excitatory globally inhibitory oscillator networks’, IEEE Transactions on Neural Networks, Jan 1995


2.4.2 Visual Attention of Neuronal Oscillatory Model

How is attention represented in the neuronal oscillatory system? The first point of view is based on physiology: visual stimuli are generally encoded by the mean firing rates of neurons or neural populations, and each neuron represents a specific feature (e.g., orientation-tuned cells in the primary visual cortex V1 represent various orientation features). When a stimulus matches the preferred feature of a neuron, the neuron fires at its maximum rate [87]. However, it is not sufficient to consider only the responses of neurons to their preferred stimuli, since some of these neurons may not be located in the attended region. Therefore, the structure of the neuronal population and the interaction between neurons induce a combined activity of neuronal groups, which show a higher mean firing rate in the attended region than in unattended ones under the same stimulus input [78, 88]. In this view, the primary visual cortex can provide the saliency map according to the firing rates of V1's output neurons [89].

The second view, proposed in [90, 91], assumes that attended stimuli may be distinguished from unattended stimuli by a form of ‘temporal tagging’. The activity of all tagged neurons responding to attended stimuli consists of synchronized oscillations or phase-locking. Unattended stimuli, on the other hand, result in less organized neuronal firing or reduced activity. Attention therefore modulates the temporal structure of the neuronal activity at the level of the primary visual cortex in response to visual stimuli. It is noteworthy that, in this view, attentional modulation affects only the temporal structure of the firing pulses in area V1, not the mean firing rate. Thus, temporal tagging by synchronized oscillations naturally separates attended and unattended stimuli.

The third point of view suggests that contextual influence, mediated by the connections between oscillators and by horizontal intra-cortical connections in the primary visual area, leads to attention [89]. This is easy to understand from the feature binding model described in Section 2.4.1(b) (Figure 2.14), in which each oscillator represents a simple cell with a preferred orientation in area V1, and each cell's activity depends on both its RF input and the contextual stimuli from other cells. As mentioned above, the model has been used to enhance the contour of an object in a scene when a small-curvature edge is present across adjacent overlapping receptive fields, because the excitatory horizontal connections along a smooth contour give more enhancement than the inhibitory connections give suppression. In the same model, when every cell is surrounded by cells with the same orientation and each cell receives only an orientation stimulus within its RF, the cell is suppressed by the overall inhibitory connections more strongly than it is enhanced by the single excitatory connection [92]. An example input pattern is given in Figure 2.16(a), in which 45° and 135° bars are evenly arranged on the two sides of the pattern. Note that a cell can only receive an input bar within its RF, and the input strength of each bar is determined by its contrast. In the homogeneous regions (both the left and right sides), the activity of the cells is reduced by iso-feature suppression, while the cells near the region boundary respond more strongly, since they lack a complete iso-orientation surround and are less suppressed. The output of the model is shown in Figure 2.16(b), where the thicker bars denote salient activity. The contextual influence in the neuronal oscillatory model thus has an advantage over the FIT.
In the case of Figure 2.16(a), the FIT has difficulty distinguishing the boundary, because no feature dimension contains a unique salient target (in the pre-attention stage) and there is no attention focus to bind two different features (in the attention stage).
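The boundary highlighting produced by iso-feature suppression can be illustrated with a deliberately simple sketch. It is a toy subtraction rule, not the oscillatory model of [89, 92]: each cell's response is its input contrast minus a fixed penalty for every nearby cell with the same orientation. The function name `iso_suppressed_response`, the weight, the radius and the one-row layout are all assumptions made for illustration.

```python
def iso_suppressed_response(orientations, contrast=1.0, weight=0.2, radius=2):
    """Toy response: input contrast minus suppression from like-oriented neighbours."""
    n = len(orientations)
    responses = []
    for i in range(n):
        same = 0
        for j in range(i - radius, i + radius + 1):
            if j == i:
                continue
            k = min(max(j, 0), n - 1)   # extend the texture beyond the array ends
            if orientations[k] == orientations[i]:
                same += 1               # one more iso-orientation neighbour
        responses.append(contrast - weight * same)
    return responses


# One hypothetical row of a pattern like Figure 2.16(a): 45-degree bars, then 135-degree bars.
row = [45] * 8 + [135] * 8
responses = iso_suppressed_response(row)
most_salient = [i for i, r in enumerate(responses) if r == max(responses)]
print(most_salient)   # prints [7, 8], the two cells on either side of the boundary
```

Cells inside either homogeneous region have four like-oriented neighbours and are suppressed most, while the two cells next to the texture boundary have only two and therefore keep the largest response. No feature dimension has to contain a unique target for the boundary to stand out.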

Figure 2.16 An example of contextual influence in neuronal oscillatory model [92]. Reprinted from Neural Networks, 19, no. 2, Li Zhaoping, Peter Dayan, ‘Pre-attentive visual selection’, 1437–1439, 2006, with permission from Elsevier


Besides the spatial contextual influence on attention, a temporal contextual influence has also been found in visual search with video input: if the target has the same features or appears in the same location as on the previous trial, search is faster. These repetition effects were found by [93, 94]. Some psychological experiments that depend only on temporal context showed more efficient visual search without top-down or bottom-up guidance [95, 96].

In summary, the oscillatory synchronization model is a model of feature binding, and it differs from the FIT in the following respects. (1) The FIT is based on biological concepts in which feature coding in multiple feature dimensions is processed in parallel across the whole visual field, and feature binding then depends on focal attention. The oscillatory model is based on the neuronal level: the neurons that represent various features are connected to each other, and the timing and phase of their firing form a code for perceptual organization. (2) The FIT uses focal attention to bind spatial features related to an object, while the oscillatory synchronization model binds features temporally into a coherent object representation. (3) The oscillatory model considers the contextual relation between neurons, which is more consistent with the anatomy of area V1.

It is believed that conceptual attention (glue activation) and attention based on synchronized oscillation can be combined.
