2   Communication

images

This chapter focuses on the fundamentals of communication in event-based neuromorphic electronic systems. Overall considerations on requirements for communication and circuit- versus packet-switched systems are followed by an introduction to Address-Event Representation (AER), asynchronous handshake protocols, address encoders, and address decoders. There follows a section on considerations regarding trade-offs in the design of AER links, and a section describing the details of the implementation of such links and how these have evolved.

2.1   Introduction

In evolution, the great expansion of computational power of brains seems to be implemented by expansion of cortex, a sheet of tissue surrounding older structures. Cortex is a layered structure divided into what is known as the gray matter and white matter. Figure 2.1 shows a cross section of a small chunk of visual cortex of a cat brain. The many types of neurons, of which there are about 10⁵/mm³, make long-range connections through axons (their output branches) in the white matter (with a wiring density of 9 m of axon per mm³), while the gray matter is composed mostly of dendrites (the neurons’ input branches) with a wiring density of an amazing 4 km/mm³ (Braitenberg and Schüz 1991). The long-range white matter connections occupy much more volume because they are myelinated, that is, thickly sheathed in a material called myelin. The myelin acts as an insulator, reducing the capacitance to the outside of the cell and increasing the resistance, thereby reducing the decay of the action potential impulses by which the neurons communicate as these impulses travel along the axons.

images

Figure 2.1   Cross-sectional view of layered cortical cells with their dendrites within the gray matter and axons projecting out to white matter. Only a few cells are shown, but gray matter is completely filled with neurons and some other supporting cells. Adapted from Binzegger et al. (2004). Reproduced with permission of the Society for Neuroscience

Action potentials, called spikes for the shapes of their waveforms, are stereotypical whether they travel along myelinated axons in white matter or unmyelinated axons in gray matter (Gerstner and Kistler 2002). Although their amplitudes, durations (around 1 ms), and precise shapes can vary somewhat, they can be treated as all-or-none, essentially digital events.

Neuromorphic electronic systems must embed complex networks of neurons, axons, and synapses, which nature builds in three dimensions (3D), into an essentially 2D silicon substrate. Unlike standard digital logic where the output of a gate, on average, is connected to the input of three to four other gates, a neuron typically delivers a spike to thousands of destinations. Hence there is a fundamental physical mismatch between the logic-optimized, 2D silicon technology and the interconnectivity requirements for implementing biological networks of neurons. This mismatch is overcome using time-multiplexed communication.

Modern digital gates have switching delays that are on the order of tens of picoseconds, which is many orders of magnitude faster than the time constant of the output spiking activity in a neuron. Since the communication fabric is only carrying spikes from one neuron to another, the high connectivity problem as well as the 3D/2D mismatch can be resolved by using a time-multiplexed fabric for interneuron communication. Such a network is very similar to a packet-switched communication network used in on-chip, chip-to-chip, and system-to-system communication (e.g., the Internet) today.

Communication networks can be broadly divided into two major categories: circuit switched and packet switched. In a circuit-switched network, two end points communicate by setting up a virtual circuit – a path through the network that is dedicated to the communication between the two end points. Once the path is set up, data can be communicated over it. Once the communication ends, the virtual circuit is destroyed and the hardware resources used by the circuit are released for use by another virtual circuit. This approach was used in the original telephone network. Communication in a circuit-switched network has a setup/teardown cost for the virtual circuit, after which the dedicated communication circuit can be used with very low overhead. Such networks are efficient when the communication between two end points lasts for a long duration.

Packet-switched networks, on the other hand, operate by time-multiplexing individual segments of the network. Instead of creating an end-to-end path up front, each item being communicated (a packet) requests access to shared resources on the fly. Hence, each packet must contain sufficient information to allow each step in the communication network to determine the appropriate next step in the path taken by the packet. Large messages in packet-switched networks are typically packetized – converted into a sequence of packets, where each packet contains replicated path information. However, there is no up-front setup cost: the first packet can be sent through the network immediately. Such networks are efficient when the communication between two end points occurs in bursts of small amounts of information.

A neuron spiking event is only used to convey a small amount of information in typical neuromorphic electronic systems. In the extreme case, the only information conveyed by a spike is the fact that it occurred at all – that is, the time at which the spike occurred relative to other spikes in the system. A very sophisticated model might attempt to capture the spike waveform directly, conveying the number of bits required to reconstruct the waveform to a certain degree of precision. Most large-scale neuromorphic electronic systems model spikes with a small number of bits of precision. Hence interneuron communication is typically implemented using packet-switched networks, as they use hardware resources more effectively when small amounts of information are exchanged.

Representation of time is a nontrivial task, as temporal resolution is an important factor in the design of communication networks for spiking neurons. There are two major approaches to representing time. (i) Discrete time: In this approach, time is discretized into global ticks, and a communication network is designed to deliver time-stamped spikes at the appropriate global time step. (ii) Continuous time: In this approach, the entire system operates in continuous time, and spike delays are also modeled with continuous time electronics. This is a challenging design problem, and practitioners of this approach typically make use of the following set of observations:

  • Neuron spiking rates are very low (tens of Hz) compared to the speed of digital electronics (GHz). This means that a fast communication fabric operating in the tens or hundreds of MHz regime would be idle almost all the time.
  • Axonal delays are also on the order of milliseconds compared to the switching delay of gates (tens of picoseconds).
  • A very small (<0.1%) variation in spike arrival time should not have a significant impact on overall system behavior, because biological systems are known to be very robust and should be able to adapt to a small variation in spike arrival time.

Combining these three observations leads to the conclusion that we can ignore the uncertainty in spike delivery time if it can be kept on the order of microseconds, since the dominant delay term is the one introduced by the time constant of the neuron itself (tens of milliseconds) or the axonal delay model (milliseconds). For this approach to be successful, it is important for the communication fabric to be over-engineered so that the network is never heavily loaded. This philosophy is sometimes described by stating that ‘time represents itself’ – that is, the arrival time of spikes itself represents the time at which the spikes should be delivered. This relies on real-time behavior of spiking networks and their silicon implementation.

2.2   Address-Event Representation

Typical neuron firing rates are in the regime of 1–10 Hz. Hence, thousands or even millions of neurons combined have spiking rates in the kHz to low MHz regime. This data rate can be easily supported by modern digital systems. Hence, instead of creating a network of individual neurons, neuromorphic electronic systems have end points that correspond to clusters of neurons, where a cluster can correspond to a specific processing layer. The circuits used to multiplex communication for a cluster of neurons into an individual communication channel are referred to as address event representation (AER) circuits. AER was first proposed in 1991 by Mead’s lab at Caltech (Lazzaro et al. 1993; Lazzaro and Wawrzynek 1995; Mahowald 1992, 1994; Sivilotti 1991), and has been used since then by a wide community of hardware engineers.

The function of AER circuits is to provide multiplexing/demultiplexing functionality for spikes that are generated by/delivered to an array of individual neurons. Figure 2.2 shows an example of how the spiking behavior of four neurons is encoded onto a single output channel.

Spikes are generated asynchronously, and the AER circuits accept spikes as they are generated and multiplex them onto a single output channel. The sequence of values produced on the output channel indicates which neuron fired. The time at which the neuron identifier is generated corresponds to the time at which the neuron produced a spike, plus a small delay due to the encoding process. As long as spikes are sufficiently separated in time, the encoding process ensures that the neuron identifiers are correctly ordered.
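This multiplexing behavior can be sketched at a purely behavioral level. The function name, the microsecond encoding delay, and the example spike times below are illustrative assumptions, not a circuit description:

```python
# Behavioral sketch of AER multiplexing: spikes from several neurons are
# serialized into a single stream of (time, neuron_id) address events.

def aer_encode(spikes, encoding_delay=1e-6):
    """spikes: list of (time_s, neuron_id) tuples, possibly unsorted.
    Returns the address-event stream as (arrival_time_s, neuron_id),
    ordered by spike time, each delayed by the encoder latency."""
    stream = []
    for t, nid in sorted(spikes):
        stream.append((t + encoding_delay, nid))
    return stream

# Four neurons spiking at well-separated times come out correctly ordered.
events = aer_encode([(0.010, 2), (0.003, 0), (0.007, 3), (0.012, 1)])
ids = [nid for _, nid in events]
print(ids)  # [0, 3, 2, 1]
```

The sort stands in for the guarantee stated above: as long as spikes are sufficiently separated in time, the encoded identifiers appear in spike order.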

If each neuron in the cluster were guaranteed to only produce a spike when no other neuron in the same cluster was spiking, then the multiplexing circuits would correspond to a standard asynchronous encoder circuit. However, this is not a valid constraint as groups of neurons in a cluster could have similar/overlapping firing times. AER encoders therefore generally use arbitration logic to handle potentially simultaneous spike arrival times from multiple neurons.

The demultiplexing circuits are easier to design, and are illustrated in Figure 2.3. In the demultiplexing process, an input value received from an AER channel specifies the axon/dendrite (depending on your perspective) identifier to which the spike is to be delivered. The dendrite number is decoded using an asynchronous decoder, and the spike is delivered to the appropriate destination. These decoder circuits are identical to those found in self-timed memory structures. In the example shown in Figure 2.3, the address-event sequence is decoded into spikes that are delivered to individual dendrites that in turn are connected to the appropriate neurons. If the delays between adjacent dendrite identifiers in the AER input are sufficiently large, then the output of the AER decoder delivers a spike to the dendrite at the time that corresponds to the arrival time of the AER input plus a small delay corresponding to the decoder circuit. The combination of a self-timed AER encoder and decoder results in spikes being delivered from source neurons to destination neurons in a manner that preserves the inter-arrival delay among spikes.
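A correspondingly minimal sketch of the demultiplexing side, under the same illustrative assumptions (a dictionary of delivery times stands in for the decoded dendrite lines):

```python
# Behavioral sketch of AER demultiplexing: each incoming address event
# selects one dendrite line, mirroring a one-of-N asynchronous decoder.

def aer_decode(stream, n_dendrites, decoding_delay=1e-6):
    """stream: list of (time_s, dendrite_id). Returns a dict mapping each
    dendrite id to the list of spike delivery times on that line."""
    spikes = {d: [] for d in range(n_dendrites)}
    for t, d in stream:
        spikes[d].append(t + decoding_delay)
    return spikes

out = aer_decode([(0.003, 0), (0.007, 3), (0.010, 2)], n_dendrites=4)
print(sorted(d for d, ts in out.items() if ts))  # [0, 2, 3]
```

As in the text, each delivered spike preserves the arrival time of the AER input plus a small decoder delay.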

images

Figure 2.2   Multiplexing different neurons onto a single communication channel

images

Figure 2.3   Demultiplexing an AER input into individual spikes for each dendrite

In the simplest possible AER scheme, neuron identifiers generated by the AER encoder directly correspond to the addresses of dendrites at the destination. For example, if we directly connect the output of Figure 2.2 to the input of Figure 2.3, as shown in the figure at the head of this chapter, we obtain a direct connection between each source axon and the corresponding destination dendrite. This is useful if, for example, the source and destination are on separate chips, or if the encoding/decoding process makes the wiring between the axons and dendrites more manageable. In more complex schemes, the neuron identifiers are translated into the appropriate individual or set of destination identifiers for spike delivery.

2.2.1   AER Encoders

AER encoders have many individual axons as input, and they have to implement two functions. First, the encoder must determine which is the next spike to be communicated on the output channel. This corresponds to the traditional arbitration problem in asynchronous systems. Once the spiking axon has been selected, the circuit must encode the axon identifier and produce an output data value. This is a traditional encoder, where one of N different integers is encoded using log2 N bits. There is a variety of schemes in the literature for AER encoders. These schemes differ in the mechanisms they use for resolving conflicts between multiple simultaneous spikes, and in how the neuron identifiers are encoded. Each scheme has its strengths and weaknesses, and is appropriate for different types of neuromorphic systems.

There are two common methods to organize AER encoders. In the simplest mechanism, the neurons are logically organized in a linear array, and a single encoder is used to collect the spikes into an output channel. This approach was used in the implementation of cochlear circuits (Lazzaro et al. 1993). When the number of neurons becomes very large, a common practice is to arrange the neurons in a 2D array and use two separate encoders to encode the x-address and y-address of the neuron that spiked. This is especially popular in retina models where neurons are naturally organized in a 2D spatial grid.

2.2.2   Arbitration Mechanisms

Arbitration refers to the process of selecting the order of access to a shared resource among asynchronous requests. In AER systems the shared resource is the bus that carries the address events. If we provide random access to a shared communication channel or bus, we have to deal with contention for channel access, which occurs when two or more elements (in our case neurons) attempt to transmit their addresses simultaneously. We can introduce an arbitration mechanism to resolve contention and a queuing mechanism to allow nodes to wait for their turn. The queues introduced can be at the source (per neuron), shared (per AER encoder), or a combination of the two.

Bus Sensing

The most straightforward mechanism to implement an arbitration scheme resembles the carrier sense multiple access (CSMA) protocols that are used in wireless networking as well as in the original Ethernet protocol (IEEE 802.3 working group 1985). This approach involves circuitry to detect if the AER bus is occupied; a spike is transmitted only if the bus is free. If the bus is busy, then the spike is discarded (Abusland et al. 1996). This approach has the advantage that spikes on the AER bus are transmitted at a predictable time relative to when the spike occurred. However, spikes may be lost due to bus contention.

Tree Arbitration

The most popular arbitration method is to use the classical arbiter circuit to determine the order in which active on-chip elements communicate their addresses. The addresses of spiking elements are effectively held in a queue and are transmitted off-chip as bus occupancy permits (Boahen 2000). No events will be lost, but spike timing information is degraded as soon as multiple elements are queuing for access to the bus. Spikes that do not win the arbitration process are queued at the neuron. This approach was first proposed by Mahowald (1992). If the permission to access the shared AER output channel is visualized as a token (shown as a dot in Figure 2.4), then the process of requesting access to the output channel can be viewed as sending a request for the token to the root of a tree of two-way arbiter elements. The token then moves down the tree to one of the requesting neurons. This is illustrated in Figure 2.4.

images

Figure 2.4   Tree arbitration scheme for AER

In the standard scheme shown in Figure 2.4, each arbiter handles an input request by first sending a request for the token up the tree, and then responding to the input request after it has received the token. After the input has been handled, the token is returned to the root of the tree. When the number of neurons being handled becomes large, this introduces a penalty of O(log N) stages of arbitration per request. Note that this delay impacts the throughput of spike communication, since the shared AER bus cannot be used until permission to access it is obtained through the arbitration mechanism.
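The O(log N) cost can be illustrated with a small behavioral model of the arbitration tree. A real two-way arbiter resolves near-simultaneous requests nondeterministically; favoring the lower-numbered input here is only a stand-in for that behavior:

```python
import math

# Behavioral sketch of a binary arbitration tree: a request propagates up
# to the root and the grant (the 'token') propagates back down through
# one two-way arbiter per level, i.e. O(log N) stages per spike.

def tree_arbitrate(requests, n_leaves):
    """requests: set of requesting leaf indices.
    Returns (winning leaf, number of arbiter levels traversed)."""
    levels = int(math.log2(n_leaves))

    def pick(lo, hi):  # arbitrate among leaves in [lo, hi)
        if hi - lo == 1:
            return lo if lo in requests else None
        mid = (lo + hi) // 2
        left, right = pick(lo, mid), pick(mid, hi)
        return left if left is not None else right

    return pick(0, n_leaves), levels

winner, stages = tree_arbitrate({5, 2, 7}, n_leaves=8)
print(winner, stages)  # 2 3
```

Eight leaves give three levels of two-way arbitration, matching the log N penalty discussed above.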

Greedy Tree Arbitration

An optimized version of the tree arbiter scheme uses the notion of a greedy arbiter (Boahen 2000). The greedy arbiter is a modified arbiter circuit that supports an optimized mechanism to handle simultaneous requests. If both inputs to an arbiter block request access to the token within a short interval of each other, then after one of the inputs is handled the token is propagated to the other input before returning it to the root of the tree. This can be viewed conceptually as providing a short-circuit path for token propagation at each level of the tree, and is illustrated in Figure 2.5.

If there is very little spike activity, then the greedy approach has similar performance to the tree arbitration approach. If there is significant spatially correlated spike activity, then the greedy path is activated and spikes can be serviced without having the token reach the root of the arbitration tree.

Ring Arbitration

A third approach is to construct a token-ring scheme for arbitration. In this scheme, illustrated in Figure 2.6, the arbitration elements are arranged in a ring and the token stays at a fixed location until it is requested; a request travels around the ring until it reaches the token, which then moves to the requesting location. This mechanism has benefits if the average distance traveled by the token is small, but has drawbacks if the token has to travel a long distance between spikes (Imam and Manohar 2011).
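The token movement cost in such a ring can be sketched in a few lines (the ring size and positions are illustrative):

```python
# Behavioral sketch of token-ring arbitration: the token rests where it
# was last used; servicing a request costs the ring distance from the
# token's current position to the requester.

def ring_service(token_pos, requester, ring_size):
    """Returns (hops traveled by the token, new token position)."""
    hops = (requester - token_pos) % ring_size
    return hops, requester

# A nearby request is cheap...
hops, pos = ring_service(token_pos=3, requester=5, ring_size=16)
print(hops, pos)  # 2 5
# ...but a request just behind the token costs almost a full revolution.
far, _ = ring_service(token_pos=5, requester=4, ring_size=16)
print(far)  # 15
```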

images

Figure 2.5   Token movement in basic arbiter scheme versus the greedy arbiter scheme

images

Figure 2.6   Token ring arbitration scheme

Multidimensional Arbitration

The arbitration mechanisms described so far are suitable for selecting one out of N neurons that need to access the AER bus. If neurons are organized in a √N × √N array (e.g., in a silicon retina), then each row of neurons would require O(√N) wires (one per neuron in the row). This is not a scalable approach as N becomes large. Instead, an alternative is to adopt a 2D arbitration scheme.

2D arbitration schemes provide another opportunity for optimization. If we imagine the neurons organized in a 2D array, then there are two levels of arbitration necessary to enforce mutual exclusion among all the neurons: (i) row arbitration, which selects a row where a spike was produced; (ii) column arbitration, which selects one of the columns in the currently selected row where a spike occurred. The combination of row and column arbitration uniquely selects the neuron, and requires fewer wires than a 1D arbitration approach (Mahowald 1992).
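The wiring saving of row/column arbitration can be checked with a one-line calculation per scheme (array sizes are illustrative, and the sketch assumes a square array):

```python
import math

# Request/grant wiring for 1D versus row/column (2D) arbitration over N
# neurons in a sqrt(N) x sqrt(N) grid: 1D arbitration needs one line per
# neuron, 2D needs one per row plus one per column.

def wires_1d(n):
    return n

def wires_2d(n):
    side = math.isqrt(n)
    assert side * side == n, "sketch assumes a square array"
    return 2 * side

for n in (64, 4096, 65536):
    print(n, wires_1d(n), wires_2d(n))
# N = 4096 needs 4096 wires in 1D but only 128 with row/column arbitration.
```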

An optimization for the 2D arbitration scheme that has also been used is called burst-mode operation. In burst-mode operation, row arbitration is followed by saving the entire state of the selected row into a set of latches. After this, all the spikes in the selected row are scanned out one at a time. This, combined with an efficient encoding scheme that shares the row address, can lead to an efficient AER scheme when bursts of spikes in an individual row are expected (Boahen 2004c).

We return to the issue of arbitration in Section 12.2.2.

Eliminating Arbitration

Arbitration lengthens the communication cycle period, reducing the channel capacity, and queuing causes temporal dispersion with the loss of timing information. An alternative is to provide no arbitration at all. Mortara et al. (1995) allowed any active element to immediately place its address on a common bus. Addresses are assigned to elements in such a way that not all numerically possible addresses are valid, and such that when two or more elements place their addresses on the bus simultaneously (referred to as a collision), the result is an invalid address, which the receiver must ignore. Allowing collisions to occur, and discarding the invalid addresses so generated, has the advantage of achieving a shorter cycle period and reducing dispersion (spike timing is preserved when no collision occurs), but events will be lost, and this loss will increase as the load increases.
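One way such collision-detectable addresses can be constructed, offered here only as an illustrative sketch in the spirit of the scheme, is a constant-weight code: every valid address has exactly w bits set, so the wired-OR of two distinct addresses sets more than w bits and is recognizably invalid:

```python
from itertools import combinations

# Sketch of collision-detectable addressing: each element gets a
# constant-weight codeword (exactly w bits set) on a wired-OR bus.
# Two distinct codewords OR'd together set more than w bits, so the
# receiver can recognize and discard the collision.

def constant_weight_codes(n_bits, w):
    return [sum(1 << b for b in bits) for bits in combinations(range(n_bits), w)]

codes = constant_weight_codes(n_bits=8, w=2)   # C(8,2) = 28 valid addresses
valid = set(codes)

a, b = codes[0], codes[5]
collision = a | b                              # simultaneous transmission
print(bin(collision).count("1") > 2)           # True: weight exceeds w
print(collision in valid)                      # False: not a valid address
```

Note the trade-off the text describes: the 8-wire bus now carries only 28 addresses instead of 256, the price paid for making collisions self-announcing.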

2.2.3   Encoding Mechanisms

Once the appropriate neuron has been selected, the neuron address has to be encoded into a compact representation for the AER output channel. The standard mechanism to do this uses the fact that the grant lines for each neuron are a one-hot encoding of the neuron address. Therefore, a conventional logarithmic encoder can be designed where the grant lines are encoded into log2 N wires that constitute the neuron address (Mahowald 1992; Sivilotti 1991). This is illustrated in Figure 2.7.

Each grant line connects to O(log N) transistors, so the structure uses O(N log N) transistors to compute the encoded output value. An alternative is to use the fact that the choice made by the arbiters at each level of the tree corresponds to selecting the value of one bit in the encoded output. For example, the leaves of the arbiter tree determine whether the least significant bit of the encoded output is 0 or 1. Hence, if we encode one bit of the output at each level of the tree, then the encoded output can be constructed with fewer transistors. Each arbiter connects to only two transistors, making the total transistor count for this variant O(N) (Georgiou and Andreou 2006). This method is illustrated in Figure 2.8.
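The idea that the path through the arbiter tree itself spells out the address can be sketched in a few lines (the bit ordering, root decision as most significant bit, is an illustrative assumption):

```python
# Sketch of the hierarchical encoding idea: each level of the arbiter
# tree contributes one bit of the winner's address, so no separate
# O(N log N) encoder matrix is needed.

def address_from_tree_path(path):
    """path: arbiter decisions from root to leaf (0 = lower half,
    1 = upper half). The concatenated decisions *are* the address."""
    addr = 0
    for bit in path:        # root decision becomes the MSB
        addr = (addr << 1) | bit
    return addr

print(address_from_tree_path([1, 0, 1]))  # 5: upper, lower, upper in an 8-leaf tree
```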

images

Figure 2.7   Basic encoder structure

images

Figure 2.8   Hierarchical encoder structure

A third approach maintains a counter that always tracks the current location of the token. Every time the token moves, the value of the counter is updated to reflect the current token position. For a linear arbitration structure, this corresponds to incrementing or decrementing a simple counter. For a tree-based arbitration structure, each bit in the counter is updated by a different level of the tree. This method is illustrated in Figure 2.9.

images

Figure 2.9   Counter-based encoder structure

2.2.4   Multiple AER Endpoints

So far we have described how spikes from a set of neurons can be multiplexed onto a shared AER channel, and how spikes encoded in this manner can be delivered to another set of neurons. There are two additional components necessary to complete an AER communication system: (i) a mapping from source neuron address to destination neuron/axon address; (ii) a routing architecture for multiple clusters of neurons.

2.2.5   Address Mapping

The AER encoder produces a sequence of spikes that are encoded via the source neuron address. The output of a particular source neuron n is connected to a specific destination axon, identified by its axon address a. Therefore, spikes with neuron identifier n must be remapped to address a so that they are delivered to the appropriate destination axon.

Sometimes this mapping function is quite simple – for example, if a chip with an array of neurons creates spikes that are delivered to corresponding neurons in another chip, then the mapping function would be the identity function and no translation/mapping is necessary.

In more general implementations, spikes from neuron n might have to be delivered to an arbitrary destination axon a. In such situations where the connectivity is programmed via software rather than hardcoded, a mapping table that implements the appropriate address translation is required. Programmable neuromorphic systems have implemented such tables using both on-chip (Lin et al. 2006) and off-chip memories.
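Such a software-programmed mapping table can be sketched as a simple lookup. The dictionary shape and the addresses below are illustrative assumptions, not a description of any particular chip:

```python
# Sketch of a programmable mapping table: a source neuron address is
# translated into the destination axon address(es) it drives.

mapping_table = {
    0: [17, 42, 99],      # neuron 0 fans out to three destination axons
    1: [42],
    2: [],                # unconnected neuron: its spikes are dropped
}

def route_spike(src_neuron):
    """Return the destination axon addresses for a spike from src_neuron."""
    return mapping_table.get(src_neuron, [])

print(route_spike(0))  # [17, 42, 99]
print(route_spike(2))  # []
```

The identity mapping mentioned above is simply the degenerate case in which this table, and hence the memory that holds it, can be omitted.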

Probably the most challenging aspect of implementing communication in neuromorphic systems is the fact that a spike created by a neuron is typically delivered to a very large number of destination neurons. Mechanisms that support high spike fan-out are discussed through case studies in Chapter 16.

2.2.6   Routing

A general neuromorphic system is organized with clusters of neurons, with each cluster typically corresponding to a set of neurons that are physically proximate. Since the entire system consists of many such clusters, a mechanism is needed to route spikes between clusters.

Many different routing topologies are possible, and here we provide a brief overview of some of the options available with a more in-depth description of architectures used for large-scale neuromorphic systems in Chapter 16. A detailed discussion of different routing architectures can be found in a number of text books (e.g., Dally and Towles 2004).

Point-to-Point Communication

The simplest topology for communication is a point-to-point communication architecture. In such a system, spikes from one cluster of neurons are simply transmitted to corresponding neurons in another cluster. This permits chaining of spike communication from one chip to the next, with each cluster of neurons performing additional processing before propagating spikes.

Rings and 1D Arrays

Rings and 1D arrays are linear structures with nearest-neighbor communication. Two commonly used addressing mechanisms include distance-based addressing and chip identifier-based addressing. In distance-based addressing, a spike from one chip is routed to a chip a certain distance away (e.g., routed to a chip three hops to the right). Each hop decrements the distance count, and a spike is accepted when the hop count is zero and delivered to the local cluster. In chip identifier-based addressing, an incoming spike identifier is matched against a local identifier (or, more generally, a table of potential identifiers) and accepted if the spike identifier matches the local chip information.
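Distance-based addressing can be sketched behaviorally (a unidirectional 1D array and the hop counts are illustrative assumptions):

```python
# Sketch of distance-based addressing on a 1D array: each hop decrements
# the distance field; the spike is delivered where the count reaches zero.

def deliver(start_chip, distance, n_chips):
    chip, hops_left = start_chip, distance
    while hops_left > 0:
        chip += 1              # forward to the right-hand neighbor
        hops_left -= 1         # each hop decrements the distance count
    assert chip < n_chips, "spike ran off the end of the array"
    return chip

# A spike launched at chip 1 addressed 'three hops to the right'.
print(deliver(start_chip=1, distance=3, n_chips=8))  # 4
```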

Meshes

Meshes extend the 1D topology to 2D. Routing is typically performed using dimension-ordered routing (sometimes called ‘XY’ routing or ‘X first’ routing). The standard approach to mesh routing is to specify a spike to be delivered to a (Δx, Δy) offset relative to the current cluster, with one dimension being routed before the other.
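A behavioral sketch of dimension-ordered routing (the coordinates and offsets are illustrative; routing x to exhaustion before y is what makes the scheme deadlock-free on a mesh):

```python
# Sketch of 'X first' mesh routing: a spike carries a (dx, dy) offset
# relative to the source cluster and is routed fully in x, then in y.

def xy_route(src, offset):
    (x, y), (dx, dy) = src, offset
    path = [(x, y)]
    step = 1 if dx > 0 else -1
    for _ in range(abs(dx)):               # exhaust the x dimension first
        x += step
        path.append((x, y))
    step = 1 if dy > 0 else -1
    for _ in range(abs(dy)):               # then route in y
        y += step
        path.append((x, y))
    return path

print(xy_route((0, 0), (2, -1)))  # [(0, 0), (1, 0), (2, 0), (2, -1)]
```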

Trees

Tree routing can be used as an alternative to deliver spikes. The route can be specified as a sequence of turns in the tree. In the simplest case, a packet traveling along an edge of the binary tree always has two options for the next hop, and a sequence of bits can be used to identify the complete path from source cluster to destination cluster.

2.3   Considerations for AER Link Design

The bandwidth of the communication links between the communicating processes or blocks is a critical specification. Initial analysis of performance figures for these links was explored by Mortara and Vittoz (1994) and subsequently extended by Boahen (2000).

The performance of event-driven communication links can be measured by five criteria: capacity, throughput, latency, integrity, and dispersion (see Table 2.1). Capacity is defined as the reciprocal of the minimum transmission time; this is the maximum rate at which events can be transmitted and received on the link. Throughput is defined as the usable fraction of capacity; the maximum rate is rarely sustainable in practice. Latency is defined as the mean delay; this wait time may be several transmission slots. Latency depends on the fraction of transmission slots that are filled. The fraction of the link capacity that is actually being used is called the load. Integrity is the fraction of spikes that are correctly delivered to the destination. Integrity is used to model the notion of networks that are allowed to drop spikes. Dispersion is defined as the standard deviation of the latency distribution. This metric determines how well spike timing properties are preserved.

Link designers strive not only to maximize throughput, but to minimize latency as well. High throughput allows large numbers of event generators operating over a broad range of rates to be serviced, while low latency preserves the timing of each individual event. Throughput is optimized if collisions are prevented through arbitration; it is then limited only by the increase in latency with activity due to queuing (Boahen 2000; Deiss et al. 1999; Sivilotti 1991). To achieve a specified timing error, defined as the percentage error in a cell’s inter-event interval, throughput must be capped at a level below the maximum link, or channel, capacity.

Table 2.1   Time-multiplexed communication channel design options. © 2000 IEEE. Reprinted, with permission, from Boahen (2000)

Feature      Approaches     Remarks
Latency      Polling        ∝ number of neurons
             Event driven   ∝ active fraction
Integrity    Rejection      Collisions increase exponentially
             Arbitration    Queue events
Dispersion   Dumping        No waiting
             Queuing        ∝ 1/(surplus capacity)
Capacity     Hardwired      Simple ⇒ short cycle time
             Pipelined      ∝ 1/(slowest stage)

Several papers by Mortara and by Boahen analyze the trade-off between latency and throughput in these asynchronous, parallel, read–write links, and in some cases these analytical results are validated using measurements from fabricated chips. Some of the original numbers have improved with each generation of chips through a combination of clever circuit design techniques, the use of modern, faster CMOS processes, and the introduction of new communication protocols that speed up the transmission.

Given an information coding strategy, the communication channel designer faces several trade-offs. Should they preallocate the channel capacity, giving a fixed amount to each user, or allocate capacity dynamically, matching each user’s allocation to his or her current needs? Should they allow users to transmit at will, or implement elaborate mechanisms to regulate access to the channel? And how does the distribution of activity over time and over space impact these choices? Can they assume that users act randomly, or are there significant correlations between their activities?

2.3.1   Trade-off: Dynamic or Static Allocation

Consider a scenario where a neuron is adaptive – namely, it samples at fNyq when the signal is changing, and at fNyq/Z when the signal is static, where Z is a prespecified attenuation factor. Let the probability that a given neuron samples at fNyq be a; that is, a is the active fraction of the population. Then, each quantizer generates bits at the rate

$$\left(a + \frac{1-a}{Z}\right) f_{\mathrm{Nyq}} \log_2 N$$

because for the fraction a of the time, it samples at fNyq; the remaining fraction (1 − a) of the time, it samples at fNyq/Z. Furthermore, log2N bits are used to encode the neuron’s location for AER, where N is the number of neurons.

On the other hand, we may use conventional quantizers that sample every location at fNyq and do not locally adapt their sampling rate. In that case, there is no need to encode location explicitly. We simply poll all N locations, according to a fixed sequence, and infer the origin of each sample from its temporal location. This is similar to scanning all the neurons. As the sampling rate is constant, the bit-rate per quantizer is simply fNyq. The multiple bits required to encode identity are offset by the reduced sampling rates produced by local adaptation when activity is sparse. In fact, adaptive sampling produces a lower bit rate than fixed sampling if

$$\left(a + \frac{1-a}{Z}\right) \log_2 N < 1$$

For example, in a 64 × 64 array of neurons with sampling rate attenuation Z = 40, the active fraction, a, must be less than 6.1%.
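This threshold can be checked numerically; the 5% and 10% test points below are illustrative:

```python
import math

# Numeric check of the adaptive-versus-fixed bit-rate condition
# (a + (1 - a)/Z) * log2(N) < 1 for the 64 x 64, Z = 40 example.

def adaptive_wins(a, Z, N):
    return (a + (1 - a) / Z) * math.log2(N) < 1

N, Z = 64 * 64, 40
print(adaptive_wins(0.05, Z, N))   # True: 5% active favors address events
print(adaptive_wins(0.10, Z, N))   # False: 10% active favors fixed polling
```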

It may be more important to minimize the number of samples produced per second rather than the bit rate, as there are usually sufficient I/O pins to transmit all the address bits in parallel. In that case, it is the number of samples per second that is fixed by the channel capacity. Given a certain fixed throughput Fch, in samples per second, we may compare the effective sampling rates, fNyq, achieved by various sampling strategies. Adaptive neurons allocate channel throughput dynamically in the ratio Za : (1 − a) between the active and passive fractions of the population. Hence

fNyq = fch/(a + (1 − a)/Z)

where fch ≡ Fch/N is the throughput per neuron. The average neuronal ensemble size determines the active fraction, a, and frequency adaptation and synchronicity determine the attenuation factor, Z, assuming neurons that are not part of the ensemble have adapted. Figure 2.10 shows how the sampling rate changes with the active fraction for various frequency adaptation factors, Z = γ. For small a and Z > 1/a, the sampling rate may be increased by a factor of at least 1/(2a).

images

Figure 2.10   Effective Nyquist sampling rate versus active fraction plotted for various frequency adaptation factors (γ), with throughput fixed at 10 spikes/s/neuron. As the active fraction increases, the channel capacity must be shared by a larger number of neurons, and hence, the sampling rate decreases. It falls precipitously when the active fraction equals the reciprocal of the adaptation factor. © 2000 IEEE. Reprinted, with permission, from Boahen (2000)

2.3.2   Trade-off: Arbitered Access or Collisions?

Contention occurs if two or more neurons attempt to transmit simultaneously when we provide random access to the shared communication channel. We can simply detect and discard samples corrupted by collision (Mortara et al. 1995). Or we can introduce an arbiter to resolve contention and a queue to hold waiting neurons. Unfettered access shortens the cycle time, but collisions increase rapidly as the load increases, whereas arbitration lengthens the cycle time, reducing the channel capacity, and queuing causes temporal dispersion, degrading timing information.

Assuming the spiking neurons are described by independent, identically distributed, Poisson point processes, the probability of k spikes being generated during a single communication cycle is given by

P(k, G) = (G^k/k!) e^(−G)

where G is the expected number of spikes: G = Tch/Tspk, where Tch is the cycle time and Tspk is the mean interval between spikes. By substituting 1/Fch for Tch, where Fch is the channel capacity, and 1/(Nfν) for Tspk, where fν is the mean spike rate per neuron and N is the number of neurons, we find that G = Nfν/Fch. Hence, G is equal to the offered load.

We may derive an expression for the collision probability, a well-known result from communications theory, using the probability distribution P(k, G). To transmit a spike without a collision, the previous spike must occur at least Tch seconds earlier, and the next spike must occur at least Tch seconds later. Hence, spikes are forbidden in a 2Tch time interval, centered on the time that transmission starts. Therefore, the probability of the spike making it through is P(0, 2G) = e^(−2G), and the probability of a collision is

pcol = 1 − e^(−2G)

The unfettered channel must operate at high error rates to maximize channel utilization. The throughput is S = G·e^(−2G), since the probability of a successful transmission (i.e., no collision) is e^(−2G). Throughput may be expressed in terms of the collision probability as

S = −(1/2)(1 − pcol) ln(1 − pcol)

This expression is plotted in Figure 2.11. The collision probability exceeds 0.1 when the offered load reaches 5.3% of capacity. Indeed, the unfettered channel utilizes a maximum of only 18% of its capacity. Therefore, it offers higher transmission rates than the arbitered channel only if it is more than five times faster since, as we shall show next, the arbitered channel continues to operate in a useful regime at 95% capacity.
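The quoted numbers follow directly from these expressions; a quick numerical check (function names are ours):

```python
import math

def collision_probability(g):
    """Probability that a spike collides at offered load g: spikes are
    forbidden in a window of 2*Tch, so p_col = 1 - P(0, 2G) = 1 - e^(-2G)."""
    return 1.0 - math.exp(-2.0 * g)

def throughput(g):
    """Fraction of capacity carrying uncollided spikes: S = G * e^(-2G)."""
    return g * math.exp(-2.0 * g)

s_max = throughput(0.5)             # ~0.18: peak utilization occurs at G = 0.5
p_at_max = collision_probability(0.5)   # ~0.63 collision probability there
```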

images

Figure 2.11   Throughput versus collision probability. Throughput attains a maximum value of 18% when the collision probability is 0.64 and the load is 50%. Increasing the load beyond this level lowers throughput because collisions increase more rapidly than the load does. © 2000 IEEE. Reprinted, with permission, from Boahen (2000)

2.3.3   Trade-off: Queueing versus Dropping Spikes

What about the timing errors introduced by queuing in the arbitered channel? For an offered load of 95%, the collision probability is 0.85. Hence, collisions occur frequently and neurons are most likely to spend some time in the queue. By expressing these timing errors as percentages of the neuronal latency and temporal dispersion, we can quantify the trade-off between queuing new spikes, to avoid losing old spikes, versus dumping old spikes, to preserve the timing of new spikes.

To find the latency and temporal dispersion introduced by the queue, we use well-known results from queuing theory, which give the moments of the waiting time, ⟨w^n⟩, as functions of the moments of the service time, ⟨x^n⟩:

⟨w⟩ = λ⟨x²⟩/(2(1 − λ⟨x⟩))
⟨w²⟩ = 2⟨w⟩² + λ⟨x³⟩/(3(1 − λ⟨x⟩))

where λ is the arrival rate of spikes. These results hold when spikes arrive according to a Poisson process. With x = Tch and λ = G/Tch, the mean and the variance of the cycles spent waiting are given by

⟨w⟩/Tch = G/(2(1 − G)),   σw²/Tch² = (G/(2(1 − G)))² + G/(3(1 − G))

We have assumed that the service time, x, always equals Tch, and therefore ⟨x^n⟩ = Tch^n.

We find that at 95% capacity, for example, a sample spends 9.5 cycles in the queue, on average. This result agrees with intuition: as every twentieth slot is empty, one must wait anywhere from 0 to 19 cycles to be serviced, which averages out to 9.5. Hence the latency is 10.5 cycles, including the additional cycle required for service. The standard deviation is 9.8 cycles – virtually equal to the latency. In general, this is the case whenever the latency is much more than one cycle, resulting in a Poisson-like distribution for the wait times. We can express the cycle time, Tch, in terms of the neuronal latency, μ, by assuming that Tch is short enough to transmit half the spikes in an ensemble in that time. That is, if the ensemble has Na spikes and its latency is μ, the cycle time must satisfy μ/Tch = (Na/2)(1/G), since 1/G cycles are used to transmit each spike, on average, and half of them must be transmitted in μ seconds. Using this relationship, we can express the wait time as a fraction of the neuronal latency:

(⟨w⟩ + Tch)/μ = G(2 − G)/((1 − G)Na)

The timing error is inversely proportional to the number of neurons because the channel capacity grows with population size. Therefore, the cycle time decreases, and there is a proportionate decrease in queuing time – even when the number of cycles spent queuing remains the same.
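The cycle counts quoted above can be reproduced from these formulas; the sketch below (our function name, not from the text) assumes deterministic service of one cycle, that is, an M/D/1 queue:

```python
def wait_cycles(g):
    """Mean and standard deviation of the number of cycles spent waiting
    in an M/D/1 queue at offered load g (service time = one cycle)."""
    mean = g / (2.0 * (1.0 - g))
    var = mean ** 2 + g / (3.0 * (1.0 - g))
    return mean, var ** 0.5

mean, std = wait_cycles(0.95)   # ~9.5 cycles mean, ~9.8 cycles std dev
```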

Conversely, given a timing error specification, we can invert our result to find out how heavily we can load the channel. The throughput, S, will be equal to the offered load, G, since every spike is transmitted eventually. Hence, the throughput is related to channel latency and population size by

S = 1 + εNa/2 − √(1 + (εNa/2)²),   where ε ≡ (⟨w⟩ + Tch)/μ is the normalized channel latency

when the channel capacity grows linearly with the number of neurons. Figure 2.12 shows how the throughput changes with the channel latency. It approaches 100% for large timing errors and drops precipitously for low timing errors, going below 95% when the normalized error becomes less than 20/Na. As Na = aN, the error is 400/N if the active fraction, a, is 5%. Therefore, the arbitered channel can operate close to capacity with timing errors of only a few percent when the population size exceeds several tens of thousands.
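Inverting the latency–throughput relation numerically confirms the quoted operating point (function name is ours; err is the normalized timing error and n_a the ensemble size):

```python
def max_load(err, n_a):
    """Largest offered load G whose queuing latency stays within the
    normalized timing error err, for an ensemble of n_a spikes;
    solves G(2 - G)/((1 - G) * n_a) = err for G."""
    h = err * n_a / 2.0
    return 1.0 + h - (1.0 + h * h) ** 0.5

g = max_load(20.0 / 1000, 1000)   # ~0.95: an error of 20/N_a permits 95% load
```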

2.3.4   Predicting Throughput Requirements

Given a neuron’s firing rate immediately after a step change in its input, fa, we can calculate the peak spike rate of active neurons and add the firing rate of passive neurons to obtain the maximum spike rate. Active neurons fire at a peak rate of s·fa, where s is the synchronicity, and passive neurons fire at fa/γ (assuming they have adapted), where γ is the frequency adaptation. Hence, we have

Fmax = (a·s + (1 − a)/γ) N fa

where N is the total number of neurons and a is the active fraction of the population, which forms a neuronal ensemble. We can express the maximum spike rate in terms of the neuronal latency by assuming that spikes from the ensemble arrive at the peak rate. In this case, all aN neurons will spike in the time interval 1/(s·fa). Hence, the minimum latency is μmin = 1/(2s·fa). Thus, we can rewrite our expression for Fmax as

Fmax = (a + (1 − a)/(sγ)) N/(2μmin)

images

Figure 2.12   Throughput versus normalized channel latency plotted for different neuronal ensemble sizes (Na). Higher throughput is achieved at the expense of latency because queue occupancy goes up as the load increases. These wait cycles become a smaller fraction of the neuronal latency as the population size increases, because cycle time decreases proportionately. © 2000 IEEE. Reprinted, with permission, from Boahen (2000)

Intuitively, μmin is the neurons’ timing precision and N(a + (1 − a)/(sγ))/2 is the number of neurons that fire during this time. The throughput must be at least Fmax, with some surplus capacity to minimize collision rates in the unfettered channel and queuing time in the arbitered one. This overhead is over 455% (i.e., (1 − 0.18)/0.18) for the unfettered channel, but only 5.3% (i.e., (1 − 0.95)/0.95) for the arbitered one.
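The two expressions for Fmax can be checked against each other numerically (function names are ours; s is the synchronicity, gamma the frequency adaptation):

```python
def f_max(n, a, s, gamma, f_a):
    """Maximum aggregate spike rate: a*n active neurons fire at s*f_a,
    and (1 - a)*n passive (adapted) neurons fire at f_a/gamma."""
    return (a * s + (1.0 - a) / gamma) * n * f_a

def f_max_from_latency(n, a, s, gamma, mu_min):
    """The same quantity, rewritten via the minimum latency
    mu_min = 1/(2 * s * f_a)."""
    return n * (a + (1.0 - a) / (s * gamma)) / (2.0 * mu_min)

# Both forms agree once f_a is eliminated via mu_min (example parameters):
rate = f_max(1000, 0.05, 2.0, 40.0, 100.0)
same = f_max_from_latency(1000, 0.05, 2.0, 40.0, 1.0 / (2 * 2.0 * 100.0))
```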

In summary, arbitration is the best choice for neuromorphic systems whose activity is sparse in space and in time, because we trade an exponential increase in collisions for a linear increase in temporal dispersion. Furthermore, holding utilization constant (i.e., throughput expressed as a percentage of the channel capacity), temporal dispersion decreases as technology advances and we build larger networks with shorter cycle times, even though the collision probability remains the same. The downside of arbitration is that it takes up area and time, reducing the number of neurons that can be integrated onto a chip and the maximum rate at which they can fire. Several effective strategies for reducing the overhead imposed by arbitration have been developed; they are the subject of the next section.

2.3.5   Design Trade-offs

For a random-access, time-multiplexed channel, the multiple bits required to encode identity are offset by the reduced sampling rates produced by local adaptation when activity is sparse. The payoff is even better when there are sufficient I/O pins to transmit all the address bits in parallel. In this case, frequency and time-constant adaptation allocate bandwidth dynamically in the ratio a : (1 − a)/Z between the active and passive fractions of the population. For low values of the active fraction, a, and sampling-rate attenuation factors, Z, larger than 1/a, the effective Nyquist sampling rate may be increased by a factor of 1/(2a).

Contention occurs when two or more neurons spike simultaneously, and we must dump old spikes to preserve the timing of new spikes or queue new spikes to avoid losing old spikes. An unfettered design, which discards spikes clobbered by collisions, offers higher throughput if high spike loss rates are tolerable. In contrast, an arbitered design, which makes neurons wait their turn, offers higher throughput when low spike loss rates are desired. Indeed, the unfettered channel utilizes only 18% of its capacity, at the most. Therefore, the arbitered design offers more throughput if its cycle time is no more than five times longer than that of the unfettered channel.

The inefficiency of the unfettered channel design, also known as ALOHA, has been long recognized, and more efficient protocols have been developed (Schwartz 1987). One popular approach is CSMA (carrier sense, multiple access), where each user monitors the channel and does not transmit if it is busy. This channel is prone to collisions only during the time it takes to update its state. Hence, the collision rate drops if the round trip delay is much shorter than the packet-transmission time, as in bit-serial transmission of several bytes. Its performance is no better than ALOHA’s, however, if the round trip delay is comparable to the packet-transmission time (Schwartz 1987), as in bit-parallel transmission of one or two bytes. Consequently, it is unlikely that CSMA will prove useful for neuromorphic systems (some preliminary results were reported in Abusland et al. 1996).

As technology improves and we build denser arrays with shorter cycle times, the unfettered channel’s collision probability remains unchanged for the same normalized load, whereas the arbitered channel’s normalized timing error decreases. This desirable scaling arises because timing error is the product of the number of wait cycles and the cycle time. Consequently, queuing time decreases due to the shorter cycle times, even though the number of cycles spent waiting remains the same. Indeed, as the cycle time must be inversely proportional to the number of neurons, N, the normalized timing error is less than 400/N for loads below 95% capacity and active fractions above 5%. For population sizes of several tens of thousands, the timing errors make up just a few percentage points.

For neurons whose timing precision is much better than their inter-spike interval, we may estimate throughput requirements by measuring frequency adaptation and synchronicity. Frequency adaptation, γ, gives the spike rate for neurons that are not part of the neuronal ensemble, and synchronicity, s, gives the peak spike rate for neurons in the ensemble. These firing rates are obtained from the spike frequency at stimulus onset by dividing by γ and multiplying by s, respectively. The throughput must exceed the sum of these two rates if we wish to transmit the activity of the ensemble without adding latency or temporal dispersion. The surplus capacity must be at least 455% to account for collisions in the unfettered channel, but may be as low as 5.3% in the arbitered channel, with subpercent timing errors due to queuing.

images

Figure 2.13   The system model, with sender S, receiver C, control signals request (R) and acknowledge (A), and implementation-dependent data wires

2.4   The Evolution of AER Links

Different neuromorphic chips or modules that communicate with each other using the AER representation must not only agree on the logical description of the protocol to communicate spikes, but also the physical and electrical protocol for communication. There are a number of standard approaches that have been used for this purpose, and we describe the evolution of various AER link implementations.

2.4.1   Single Sender, Single Receiver

One of the earliest standards to emerge for point-to-point AER communication, that is, from a single sender to a single receiver, is dubbed ‘AER 0.02’ after the subtitle of the technical report in which it was first described (AER, 1993). This document described the logical and electrical requirements for signals used in AER communication.

AER 0.02 only concerns the point-to-point, unidirectional communication of asynchronous data from a sender S to a receiver C, as shown in Figure 2.13. The connection between S and C consists of two types of wires, control wires and data wires.

The data wires are exclusively driven by S, and exclusively sensed by C. The encoding used by the data wires is not the subject of AER 0.02; the number of wires, the number of states on each wire, and the number representation of the data wires are all considered implementation dependent. Only the validity of the data wires is specified by AER 0.02: if S is driving stable signals on the data wires, suitable for robust sensing by C, the data lines are considered valid; if this condition does not hold, the data lines are considered invalid. For those familiar with asynchronous communication protocols, AER 0.02 uses a bundled-data protocol for communication with a four-phase handshake on the wires R and A.

The control wires use a delay-insensitive protocol for communication. Figure 2.14 shows the control sequence, a four-phase handshake, that communicates data from S to C. Initially both R and A are logic zero, and the data wires are considered invalid. To begin a transaction, S first drives valid signals on the data wires, and then drives R to logic 1. The receiver detects that R is a logic one, and can now sample the data wires. The bundling timing constraint is that the data wires are stable at the receiver when the receiver detects that R is a logic one. Once C has received the data, it responds by setting the acknowledge wire A to a logic one. As soon as the sender S detects this, the data wires no longer need to remain valid. The protocol concludes with S setting R to zero, in response to which C also resets the acknowledge wire A to zero and the entire transaction can repeat.
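The sequence can be sketched as straight-line Python, modeling the control and data wires as entries in a dictionary (a toy model of our own, not part of the standard; real implementations are concurrent and delay-insensitive):

```python
def four_phase_transfer(wires, data):
    """One AER 0.02-style bundled-data transaction (sketch).

    'wires' stands in for the physical R, A, and data lines;
    R and A both start and end at logic 0."""
    assert wires["R"] == 0 and wires["A"] == 0
    wires["data"] = data      # 1. sender drives valid data first (bundling)
    wires["R"] = 1            # 2. ... then raises request
    received = wires["data"]  # 3. receiver samples data while R is high
    wires["A"] = 1            # 4. receiver acknowledges; data may now change
    wires["R"] = 0            # 5. sender drops request
    wires["A"] = 0            # 6. receiver resets acknowledge; cycle may repeat
    return received

wires = {"R": 0, "A": 0, "data": None}
event = four_phase_transfer(wires, 0x2A)
```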

images

Figure 2.14   Single sender, single receiver, four-phase handshake. R is an active-high request signal; A is an active-high acknowledge signal

The minimum timing requirements in AER 0.02 are specified with an empirical test. A sender compatible with AER 0.02 must supply a free-running stream of signals on the data wires, if the R and A control wires of the sender are connected together. In addition, the signals on the data wires must be valid whenever R is at logic level 1. A receiver compatible with AER 0.02 must sense a stream of valid addresses, if an inverted version of the A signal of the receiver is connected to the R of the receiver, and a valid address is present on the data wires. It was expected that future versions of the standard might augment these empirical requirements with more traditional specifications.

If an implementation used voltages as the signal variable, AER 0.02 strongly suggested that, in the default mode of operation, the logic 1 level for A and R be more positive than the logic 0 level. It also strongly suggested that implementations support multiple modes of signal polarity; ideally, the electrical polarity of A and R could be changed, either by physical contact (reconfiguring switches or jumpers) or under programmed control. AER 0.02 was a minimal communications standard: it was expected that research groups would augment it with new functionality, and that future versions of the standard would largely codify enhancements that had been successfully implemented.

Guidelines for extensions to AER 0.02 were also prescribed: in an extended AER 0.02 implementation, it must be possible to nondestructively disable all extensions, for example, through reconfiguring switches or jumpers, or via programmed control. In this ‘backward-compatibility’ mode, an implementation must supply A and R control signals and data wires that perform as required by AER 0.02.

In practice, the AER 0.02 standard was so loosely defined that it was of limited utility. The basic four-phase handshake with request and acknowledge lines became established. But AER 0.02 did not specify voltages, bus width, signal polarities, traditional signal setup and hold times, or any kind of connector standard. Indeed AER 0.02 implementations with nontraditional electrical signaling methods were encouraged! This all meant that any two devices, both of which conformed to AER 0.02, were likely to be incompatible, particularly if they were developed in different labs. In order for two such devices to work together, there would normally need to be some glue logic device built to connect them, and this would introduce new possible points of failure due to anything from timing violations to bad solder joints. Building general purpose interfaces that satisfied the AER 0.02 ideal of supporting either electrical polarity for A and R introduced further complexity to the design and configuration.

Although never codified as AER 0.03 (or some higher revision number), more tightly defined de facto standards did emerge. Voltages were defined to be 0 V/5 V TTL (later 3.3 V), the bus width was 16 bits, data lines used positive logic, request and acknowledge lines used negative logic, and insulation displacement connectors of various widths became standard (cf. Dante 2004; Deiss et al. 1999; Häfliger 2003). This made it easier to construct multichip systems more reliably, for example, the system described in Serrano-Gotarredona et al. (2009), which we discuss in Section 13.3.3.

Timing issues often remained because the senders violated the requirement that the signals on the data wires must be valid whenever the request signal is at logic level 1. The request signal was often driven by the same circuitry that drove the data wires and changed state at the same time as the data wires. In such a system there is no guarantee that the data seen by a receiver will be valid when the receiver sees the request signal.

2.4.2   Multiple Senders, Multiple Receivers

A possible idealized design for an address-event system consists of multiple address-event sending and receiving blocks: to send, a block places addresses on a shared address-event bus; to receive, it simply monitors the bus for addresses it is interested in and accepts them when they occur.

This design uses source addresses on its bus. As a fan-out factor of 10 to 10,000 might be expected in neural networks, that is, one source address might be of interest to up to O(104) destinations, using source addresses helps to keep down the bandwidth requirements. The idealized address-event receiver blocks are expected to perform the required fan-out and conversion to destination addresses internally.

A design with multiple senders and receivers (not all senders need also be receivers and vice versa) on a single bus needs a somewhat different protocol to that used in the point-to-point case described in Section 2.4.1 above. To avoid collisions on the shared data lines, a sender may not drive them at the same time as generating a request R but must wait until it has received the acknowledge signal A. This is illustrated in Figure 2.15.
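This ordering can be sketched as straight-line Python over a dictionary of shared wires (a toy model of ours, not a published protocol; real hardware runs the arbiter and senders concurrently):

```python
def shared_bus_transfer(bus, data):
    """One transaction on a shared AE bus (sketch, Figure 2.15 style).

    Unlike the point-to-point case, the sender must not drive the shared
    data wires until it has been granted access via the acknowledge A."""
    assert bus["R"] == 0 and bus["A"] == 0
    bus["R"] = 1            # sender requests bus access; data NOT yet driven
    bus["A"] = 1            # arbiter grants access to exactly one requester
    bus["data"] = data      # only now may the winner drive the shared wires
    bus["R"] = 0            # request released; receivers sample the bus
    received = bus["data"]
    bus["A"] = 0            # arbiter closes the transaction
    return received
```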

For the SCX project (Deiss et al. 1999) a multiple sender, multiple receiver protocol was defined (illustrated in Figure 2.16) in which each device i connected to its AE bus has its own dedicated pair of request (REQi) and acknowledge (ACKi) lines. In addition, a collection of shared data lines are used. A device that wishes to transmit a spike on the bus asserts its request line (de-asserts REQi). A centralized bus arbiter monitors all the request lines, and selects one of the devices requesting bus access by asserting the corresponding acknowledge line (de-asserting ACKi). When a requesting device detects that it has been selected it drives data on the shared bus, after which it de-asserts its request line (i.e., asserts REQi). At this point all devices can listen on the bus and determine if the spike on the bus should be received locally. To simplify this process, the bus arbiter generates a global READ signal. A certain amount of time is budgeted for receiving devices to accept the data off the shared wires, and after this time has elapsed the ACKi is returned to its initial state thereby permitting the next bus transaction.

images

Figure 2.15   Multiple sender, multiple receiver four-phase handshake

images

Figure 2.16   SCX project multiple sender, multiple receiver protocol. Adapted from Deiss (1994). Reproduced with permission of Stephen R. Deiss, ANdt (Applied Neurodynamics)

Passive listener devices may exist on the bus that do not generate any request signals. These devices monitor the AE activity on the bus by latching the address bits on either edge of the READ signal. Signals called WAIT and TIMEOUT were specified to implement a mechanism whereby a slow-acting device may delay the bus arbiter from continuing with a further bus cycle (by granting an acknowledge to some other device) for up to 1 μs. To cause such a delay, a device might assert WAIT (low) after the low-going edge of READ, and de-assert it either when sufficient delay for its purposes has elapsed, or in any event when a low-going edge occurs on TIMEOUT. The TIMEOUT signal would be driven low by the bus arbiter if WAIT were still low 1 μs after first being asserted. The use of this WAIT and TIMEOUT feature was, however, discouraged, and no device ever made use of it.

2.4.3   Parallel Signal Protocol

The Single Sender, Single Receiver (Mahowald 1992; Sivilotti 1991) and Multiple Sender, Multiple Receiver (Deiss et al. 1999) protocols described above employ (or at least imply) straightforward parallel buses. The SCX project adopted this approach, defining an optional connector standard for peripheral AE devices based on 26-way ribbon cable and IDC connectors (Baker and Whatley 1997). The first 16 pins carry the signals AE0 to AE15 in order, then come two ground pins followed by the request signal on pin 19 and the acknowledge signal on pin 20. This arrangement of the signals on the first 20 pins has been followed by several projects since then, including those using only the single sender, single receiver model, although the remaining 6 pins have been dropped so that only 20-pin cables and connectors are required.

The CAVIAR project decided to use an easy ‘off-the-shelf’ solution for their cable and connector that was widely used for what at that time were considered fast digital signals, namely the ATA/133 standard normally used at that time to connect storage devices to mother-boards inside PCs (Häfliger 2003). This standard uses a special 40-pin IDC connector with an 80-way ribbon cable. The connector remains compatible with older 40-pin ATA headers, but has internal wiring that routes the signals onto the 80-way ribbon cable in such a way that the signal wires are well interspersed with ground wires. Using this system provided a cheap, readily available interconnection solution with good electrical characteristics, but the pinout of the connectors is not compatible with the simple 20-pin layout described above. The signals are arranged for compatibility with the ATA/133 standard with AE0 to AE7 on pins 17 to 3 and AE8 to AE15 on pins 4 to 18; the request signal appears on pin 21 and the acknowledge on pin 29.

CAVIAR also defined the use of variants of the single-sender/single-receiver and single-sender/multiple-receiver protocols described above, which used 3.3 V signals and in which the various transition-to-transition bus timings were specified in the traditional manner, albeit with some of these timings being specified with a minimum of 0 and a maximum of infinity.

2.4.4   Word-Serial Addressing

Many address-event senders and receivers, particularly senders like so-called retina chips, are constructed as 2D arrays of address-event sender elements, for example, pixels. In this case, the address space used by the device is usually organized such that a certain number of address bits are used to indicate the row and a certain number of address bits are used to indicate the column in which an AE is generated. (For example a 128 × 128 pixel square array might produce AEs with addresses of the form y6y5y4y3y2y1y0x6x5x4x3x2x1x0, where each yi and xi represent a binary digit.) For every doubling in the size of the array, a further address bit must be generated and transmitted. This may be highly disadvantageous for a chip design: as more wires must be routed across the chip, there may remain less area to do useful computation (the fill factor becomes worse); also more pads, which are usually a scarce resource, are required to communicate to the outside world. It may also become highly inconvenient when the associated AEs enter the realm of conventional digital electronics and software where it is expected that data words have widths of a power of two, for example, an 18-bit AE does not fit conveniently into a 16-bit word.

One way to alleviate this problem would be to always transmit one AE over the course of two separate parts of a bus cycle; in our example above, first the y (row) address and then the x (column) address could be transmitted over a 7-bit bus. This would, however, be to the detriment of the time required to transmit the AE and the utilization of the available bus bandwidth.

As devices become larger, and as the frequency with which the individual elements (pixels) wish to send events gets higher, it becomes increasingly likely that multiple elements on one row are waiting to send an event on the bus simultaneously. This can be exploited by servicing all waiting elements on one row, transmitting the y (row) address just once, followed by the x (column) addresses of all of the waiting elements in that row, before moving on to another row. Such a group of one y address and one or more x addresses forms a burst, similar to the bursts used on memory buses. In a memory-bus burst there is typically one address value followed by multiple data values from consecutive memory locations; in an AE bus burst, however, all words transmitted are partial addresses, and not necessarily from adjacent locations, merely from the same row. One extra signaling wire is required to distinguish the sending of an x address from the sending of a y address, or equivalently, to signal the beginning and end of a burst. This scheme, known as word-serial, achieves a more efficient utilization of the available bus bandwidth (Boahen 2004a, 2004b, 2004c).
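As a concrete illustration, the grouping can be sketched in Python (a hypothetical encoding of ours; real implementations distinguish y words from x words with an extra wire, not a tag):

```python
from itertools import groupby

def word_serial_encode(events):
    """Encode (y, x) address-events as word-serial bursts: one Y word per
    row, followed by the X words of all events waiting on that row."""
    words = []
    # Sort by row for simplicity; on chip, row order is set by the arbiter.
    for y, row in groupby(sorted(events), key=lambda e: e[0]):
        words.append(("Y", y))
        words.extend(("X", x) for _, x in row)
    return words

# Three events, two of them on row 3:
burst = word_serial_encode([(3, 1), (5, 2), (3, 7)])
# five words instead of the six that plain (y, x) addressing would need
```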

2.4.5   Serial Differential Signaling

With the speeds that AER chips and systems have recently reached, the parallel (including so-called word-serial) AER approach to board-to-board communication has become a limiting factor at the system level. With parallel AER frequencies of the order of tens of megahertz, the corresponding wavelengths have shrunk to roughly the order of magnitude of the lengths involved in the experimental setups, or only slightly larger. One rule of thumb in electrical engineering says that if the signal wavelength is not at least one to two orders of magnitude greater than the physical size of the system, then the radio frequency (RF) properties of the signals have to be taken into account: wires can no longer be assumed to be perfect conductors with the same potential at every point, but have to be treated as transmission lines. If these problems are not taken into account, issues such as RF sensitivity, cross talk, and ground bounce arise, especially in parallel AER links using ribbon cables. These issues can best be solved by resorting to serial differential signaling.

These issues with the parallel approach have also played a major role in industrial and consumer electronics in general. The solution has been to use even faster, but differential, links, and to carefully control the line impedance at every point between the sender and receiver. In such a differential signaling scheme there is always a pair of wires that carry signals of opposite sense. The absolute values of the voltages on the signal wires do not have any meaning; only the voltage difference between the two wires of the pair does. These so-called differential pairs are then usually shielded, thus avoiding the problems of RF sensitivity and cross talk to other signal wires. Differential signaling also solves the ground bounce problem: a differential driver always pushes as much charge into one wire as it pulls from the other, so the net charge flow is always zero.

Parallel to Serial and Serial to Parallel

The data rates that can be achieved using differential signaling are orders of magnitude higher than with traditional single-ended signaling. Therefore fewer (but better) wires are nowadays used to achieve the same or better bandwidth than with the many parallel wires in traditional bus links.

For example, IDE/parallel ATA can achieve up to 1 Gbps using 16 single-ended data signals, but only in one direction at a time (half-duplex). Serial ATA (SATA) uses two differential pairs (and thus four signal wires), one pair to send and one to receive (SATA n.d.). Each pair can transmit up to 3 Gbps.

An AER communication infrastructure using serial differential signaling over SATA cables for inter-board communication was implemented by Fasnacht et al. (2008) following an approach similar to the one proposed in (Berge and Häfliger 2007).

In order to take advantage of the low latency and high bandwidth available by using such high-speed serial links, the notion of using a handshake must be abandoned, as there is no time for an acknowledge signal to return to a sender before the next AE can be sent. It is, however, possible and advantageous to implement a back channel for a flow control signal; see Section 13.2.5 and Fasnacht et al. (2008) for details.

2.5   Discussion

We have seen that communication in neuromorphic systems is typically implemented with a form of time-multiplexed, packet-switched communication known as AER, in which time represents itself. This communication relies on the observations that neurons operate orders of magnitude slower than digital electronics, and that the small amounts of delay and jitter imposed by passing spikes through digital electronics are negligible in terms of the time constants in neurons.

AER communication requires multiplexing and encoder circuits at the sender, almost invariably some form of arbitration, and demultiplexing or decoding circuits at the receiver. We return to look at the design of these on-chip circuits in Chapter 12. As soon as there are multiple endpoints possible in a system, the issues of address mapping and routing arise. These issues are addressed further in Chapters 13 and 16.

We have looked at how in designing an AER link, there are five performance criteria to consider: capacity, throughput, latency, integrity, and dispersion. We have seen how these criteria are interrelated, and that there are trade-offs to be made between various features of a link. At the end of the Introduction to this chapter (Section 2.1) and the beginning of Section 2.2 we made the observation that the relevant timescales for the operation of neurons are very slow in comparison to the speed of the electronic circuits used to implement communication links. Formalizing these observations leads to the conclusion that arbitered channels are much more efficient for AER communication than unfettered (nonarbitered) ones.

We have also seen how the electrical and physical characteristics of communication links must be defined and preferably standardized in order to achieve practical communication between AER-based devices.

Parallel AER generally uses a form of four-phase handshake. With the move to faster serial differential signaling, handshaking must be abandoned, since there is no time for an acknowledge signal to be returned to the sender.

Over time there has been an increasing trend toward off-the-shelf interconnection technology, from the original simple use of IDC (insulation displacement connectors), through the then readily available ATA/133 cables, to the SATA cables used in newer serial links.

The hardware infrastructure for implementing AER communication between AER chips will be examined further in Chapter 13. Before then, the intervening chapters discuss the individual building blocks of neuromorphic systems.

References

Abusland AA, Lande TS, and Høvin M. 1996. A VLSI communication architecture for stochastically pulse-encoded analog signals. Proc. IEEE Int. Symp. Circuits Syst. (ISCAS) III, pp. 401–404.

AER. 1993. The address-event representation communcation protocol [sic], AER 0.02.

Baker B and Whatley AM. 1997. Silicon cortex daughter board 1, http://www.ini.uzh.ch/amw/scx/daughter.html#SCXDB1 (accessed July 28, 2014).

Berge HKO and Häfliger P. 2007. High-speed serial AER on FPGA. Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), pp. 857–860.

Binzegger T, Douglas RJ, and Martin KAC. 2004. A quantitative map of the circuit of cat primary visual cortex. J. Neurosci. 24(39), 8441–8453.

Boahen KA. 2000. Point-to-point connectivity between neuromorphic chips using address-events. IEEE Trans. Circuits and Syst. II 47(5), 416–434.

Boahen KA. 2004a. A burst-mode word-serial address-event link—I: transmitter design. IEEE Trans. Circuits Syst. I, Reg. Papers 51(7), 1269–1280.

Boahen KA. 2004b. A burst-mode word-serial address-event link—II: receiver design. IEEE Trans. Circuits Syst. I, Reg. Papers 51(7), 1281–1291.

Boahen KA. 2004c. A burst-mode word-serial address-event link—III: analysis and test results. IEEE Trans. Circuits Syst. I, Reg. Papers 51(7), 1292–1300.

Braitenberg V and Schüz A. 1991. Anatomy of the Cortex. Springer, Berlin.

Dally WJ and Towles B. 2004. Principles and Practices of Interconnection Networks. Morgan Kaufmann.

Dante V. 2004. PCI-AER Adapter board User Manual, 1.1 edn. Istituto Superiore di Sanità, Rome, Italy, http://www.ini.uzh.ch/~amw/pciaer/user_manual.pdf (accessed July 28, 2014).

Deiss SR. 1994. Address-event asynchronous local broadcast protocol. Applied Neurodynamics (ANdt), 062894 2e, http://appliedneuro.com/ (accessed July 28, 2014).

Deiss SR, Douglas RJ, and Whatley AM. 1999. A pulse-coded communications infrastructure for neuromorphic systems [Chapter 6]. In: Pulsed Neural Networks (eds Maass W and Bishop CM). MIT Press, Cambridge, MA. pp. 157–178.

Fasnacht DB, Whatley AM, and Indiveri G. 2008. A serial communication infrastructure for multi-chip address event systems. Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), pp. 648–651.

Georgiou J and Andreou A. 2006. High-speed, address-encoding arbiter architecture. Electron. Lett. 42(3), 170–171.

Gerstner W and Kistler WM. 2002. Spiking Neuron Models: Single Neurons, Populations, Plasticity. Cambridge University Press.

Häfliger P. 2003. CAVIAR Hardware Interface Standards, Version 2.0, Deliverable D_WP7.1b, http://www.imsecnm.csic.es/caviar/download/ConsortiumStandards.pdf (accessed July 28, 2014).

IEEE 802.3 working group. 1985. IEEE Std 802.3-1985, Carrier Sense Multiple Access with Collision Detection (CSMA/CD) Access Method and Physical Layer Specifications. IEEE Computer Society, New York.

Imam N and Manohar R. 2011. Address-event communication using token-ring mutual exclusion. Proc. 17th IEEE Int. Symp. on Asynchronous Circuits and Syst. (ASYNC), pp. 99–108.

Lazzaro J, Wawrzynek J, Mahowald M, Sivilotti M, and Gillespie D. 1993. Silicon auditory processors as computer peripherals. IEEE Trans. Neural Netw. 4(3), 523–528.

Lazzaro JP and Wawrzynek J. 1995. A multi-sender asynchronous extension to the address-event protocol. In: Proceedings of the 16th Conference on Advanced Research in VLSI (eds Dally WJ, Poulton JW, and Ishii AT). IEEE Computer Society. pp. 158–169.

Lin J, Merolla P, Arthur J, and Boahen K. 2006. Programmable connections in neuromorphic grids. Proc. 49th IEEE Int. Midwest Symp. Circuits Syst. 1, pp. 80–84.

Mahowald M. 1992. VLSI analogs of neural visual processing: a synthesis of form and function. PhD thesis. California Institute of Technology, Pasadena, CA.

Mahowald M. 1994. An Analog VLSI System for Stereoscopic Vision. Kluwer Academic, Boston, MA.

Mortara A and Vittoz EA. 1994. A communication architecture tailored for analog VLSI artificial neural networks: intrinsic performance and limitations. IEEE Trans. Neural Netw. 5(3), 459–466.

Mortara A, Vittoz EA, and Venier P. 1995. A communication scheme for analog VLSI perceptive systems. IEEE J. Solid-State Circuits 30(6), 660–669.

SATA. n.d. SATA – Serial ATA, http://www.sata-io.org/ (accessed July 28, 2014).

Schwartz M. 1987. Telecommunication Networks: Protocols, Modeling, and Analysis. Addison-Wesley, Reading, MA.

Serrano-Gotarredona R, Oster M, Lichtsteiner P, Linares-Barranco A, Paz-Vicente R, Gomez-Rodriguez F, Camunas-Mesa L, Berner R, Rivas M, Delbrück T, Liu SC, Douglas R, Häfliger P, Jimenez-Moreno G, Civit A, Serrano-Gotarredona T, Acosta-Jimenez A, and Linares-Barranco B. 2009. CAVIAR: A 45 K-neuron, 5 M-synapse, 12 G-connects/sec AER hardware sensory-processing-learning-actuating system for high speed visual object recognition and tracking. IEEE Trans. Neural Netw. 20(9), 1417–1438.

Sivilotti M. 1991. Wiring considerations in analog VLSI systems with application to field-programmable networks. PhD thesis. California Institute of Technology, Pasadena, CA.

__________

1 Most of the text in this section is © 2000 IEEE. Reprinted, with permission, from Boahen (2000).

2 A large part of the text in this section is adapted from AER (1993).

3 A large part of the text in this section is © 2008 IEEE. Reprinted, with permission, from Fasnacht et al. (2008).
