
16

Towards Large-Scale Neuromorphic Systems


This chapter describes four example neuromorphic systems: SpiNNaker, HiAER, Neurogrid and FACETS. These systems combine subsets of the principles outlined in previous chapters to build large-scale hardware platforms for neuromorphic system engineering. Each of them represents a different point in the design space, and so it is instructive to examine the choices made in each system.

16.1  Introduction

Multichip AER can be used to construct neuromorphic systems that scale up to millions of neurons. Part of the reason for this scalability comes from biology: since biological systems contain billions of neurons, biological computational principles and communication structures must inherently scale to systems of that size. Because neuromorphic systems emulate biological systems, we can expect that emulating biologically plausible networks is a task that scales with the size of the network.

16.2  Large-Scale System Examples

This chapter discusses four case studies in implementing large-scale neuromorphic systems: SpiNNaker, a system from the University of Manchester; HiAER, a system from the University of California at San Diego; Neurogrid, a system designed at Stanford University; and FACETS, an EU project led by the University of Heidelberg. The four systems take very different approaches to the problem of implementing neurons, ranging from using generic digital microprocessors to wafer-scale integrated custom analog electronics. Each system uses a different approach to the design of the communication network for spikes, as well as the implementation of synapses.

16.2.1  Spiking Neural Network Architecture

Of current developments, the SpiNNaker project at the University of Manchester has taken the most ‘general-purpose’ approach to the design of large-scale neuromorphic systems.

System Architecture

The core hardware element for the neuromorphic system is a custom-designed ASIC called the SpiNNaker chip. This chip includes 18 ARM processor nodes (the ARM968 core available from ARM Ltd., one of the project’s industrial partners), and a specially designed router for communication between SpiNNaker chips. One of the ARM968 cores is designated as the Monitor Processor, and is responsible for system management tasks. Sixteen of the other cores are used for neuromorphic computation, while the extra core is a spare and is available to improve the manufacturing yield. Each SpiNNaker die is mated with a 128 MB SDRAM die, and the two dice together are packaged as a single chip (Furber et al. 2013).

Each chip can communicate with six nearest neighbors. Each chip participates in a horizontal ring (utilizing two links), a vertical ring (utilizing two links), and a diagonal ring (utilizing two links). This can also be viewed as a two-dimensional torus network (the horizontal and vertical links) with additional diagonal links. The overall network topology is shown in Figure 16.1. The overall system is envisioned to contain over a million ARM cores.

Neurons and Synapses

The low-power ARM cores implement all neuronal and synaptic computation in software. Because the core compute element is a general-purpose microprocessor, this provides maximum flexibility: the precise details, such as the neuron equation, the governing dynamics, and the amount of biological detail present in the neuron and synapse models, are all under user control.

Each ARM core is fast enough to be able to model ~10³ point neurons, where each neuron has ~10³ synapses that are modeled as programmable weights. A million-core system would be able to implement ~10⁹ neurons and ~10¹² synapses, with the entire system operating in (biological) real time. Since the computation for a neuron is dominated by synaptic modeling, a more accurate way to view the computation power per core is to say that each core can model ~10⁶ synapses, which corresponds to 10³ neurons per core on average.
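These figures can be checked with a quick back-of-envelope calculation; the constants below are the order-of-magnitude estimates quoted above, not measured values:

```python
# Capacity estimates for a full-scale SpiNNaker machine, using the
# per-core figures quoted in the text (~10^6 synapses per core,
# ~10^3 synapses per neuron).
CORES = 10**6                 # envisioned machine size, in ARM cores
SYNAPSES_PER_CORE = 10**6     # synaptic updates dominate the compute budget
SYNAPSES_PER_NEURON = 10**3

neurons_per_core = SYNAPSES_PER_CORE // SYNAPSES_PER_NEURON  # 10^3
total_neurons = CORES * neurons_per_core                     # 10^9
total_synapses = CORES * SYNAPSES_PER_CORE                   # 10^12
```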


Figure 16.1  Chip-to-chip communication topology in the SpiNNaker system, showing a 16 node system. Each node in the system includes neuron resources (shown in dark gray) and routing resources (shown in light gray), and the arrows show communication links between SpiNNaker chips

Communication

The communication network is responsible for delivering spikes to their appropriate destinations. Since a neuron has ~10³ synaptic inputs on average, the average output fan-out for a spike generated by a neuron is also ~10³. SpiNNaker has a custom-designed communication architecture for spike delivery that handles both routing and fan-out.

When a neuron produces an output spike, the spike is tagged with a source routing key and handed to the communication fabric for delivery to a set of destination neurons. Instead of a central routing table, each SpiNNaker router contains enough information to make local decisions per spike. When a spike arrives on one of the six incoming ports, the source key is matched against a 1024-entry ternary content-addressable memory (TCAM) with 32 bits (the width of the source key) per entry. A TCAM entry specifies a 0, 1, or X value for each bit, and a source key matches an entry if it agrees with the entry at every non-X location. If there is a match, a 24-bit vector corresponding to the matched entry is retrieved. The bits of this vector represent each of the on-chip ARM cores (18 bits) and the six output ports of the communication network (6 bits). Routing is then straightforward: the spike is propagated to every location for which the corresponding bit is set. If there is no match in the TCAM, the default route is a ‘straight line’ – a packet arriving from the left is propagated to the right, and so on for all six incoming directions.
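The TCAM lookup described above can be sketched in software. The data layout and entry values below are hypothetical, but the match rule (agree on every non-X bit) and the straight-line default route follow the description in the text:

```python
# SpiNNaker-style router lookup (hypothetical data layout). A TCAM
# entry is stored as (value, care_mask, route_vector): bits where
# care_mask is 0 are "X" (don't care). The 24-bit route vector selects
# on-chip cores (bits 0-17) and the six output links (bits 18-23).
def route(key, tcam, default_route):
    for value, care_mask, route_vector in tcam:
        if (key & care_mask) == (value & care_mask):   # agree on non-X bits
            return route_vector
    return default_route       # no match: pass the spike straight through

# Example: all keys 0x400-0x4FF (low 8 bits are X) go to core 2 and link 0.
tcam = [(0x400, 0xFFFFFF00, (1 << 2) | (1 << 18))]
assert route(0x4A7, tcam, default_route=0) == (1 << 2) | (1 << 18)
assert route(0x500, tcam, default_route=1 << 21) == 1 << 21  # default route
```

Because each entry can wildcard low-order key bits, one entry serves a whole population of source neurons, which is how a 1024-entry table can route spikes from thousands of neurons.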

The communication infrastructure also has support for a variety of packet types (point-to-point, nearest-neighbor), as well as support for fault detection, isolation, and recovery. Also, the design of the large-scale system is simplified by not requiring global clock synchronization – the communication network is entirely asynchronous, which makes SpiNNaker a ‘globally asynchronous, locally synchronous’ (or GALS) system.

Since the SpiNNaker system is designed to operate in real time, the hardware does not provide precise guarantees about spike delivery. For example, the time taken for a spike to travel through a router may be affected by unrelated spikes using the shared router resource, so the exact spike arrival time is not a deterministic quantity. However, this is not viewed as a concern, because the biological systems being modeled are expected to be robust to small variations in spike arrival times relative to the biological time scale.

Programming

The flexibility provided by the SpiNNaker system is both an attraction and a drawback: it is easy to change the details of neuron and synaptic modeling, but every detail of the models must be specified. A significant amount of work has gone into developing models that are easily accessible to the neuromorphic systems community.

The SpiNNaker team has developed mechanisms to map neural network models in PyNN to their hardware. This approach makes the system readily accessible to users of PyNN. Conceptually this can be viewed as ‘compiling’ PyNN descriptions into the appropriate software and router configuration information that implement the same model on the SpiNNaker hardware. Significant work is currently underway to expand the set of models that can be mapped to the hardware.

16.2.2  Hierarchical AER

Another proposal for the design of large-scale neuromorphic systems is being advocated by the University of California at San Diego (UCSD) as a way to extend their IFAT architecture, described in Section 13.3.1, to systems with many more neurons. The HiAER (Hierarchical AER) architecture combines custom neuron and synapse modeling hardware with programmable routing.

System Architecture

The architecture of the system consists of two types of components: the IFAT board and the routing chip. The IFAT board is the core computational element; each IFAT implements an array of neurons and their associated synapses. Each IFAT node has a local routing resource associated with it that is responsible for AER-based spike communication. Sets of IFAT/router nodes are connected in a linear array, and the edge of each array is responsible for communication to other arrays. These edge nodes are themselves organized in linear arrays, so the entire system can be viewed as a hierarchy of arrays, as shown in Figure 16.2.


Figure 16.2  Chip-to-chip communication topology in the HiAER system, showing a 32 node system. Each node in the system includes neuron resources (shown in dark gray) and routing resources (shown in light gray), and the arrows show communication links between individual nodes. This example shows three levels of hierarchy, but the approach can be extended to an arbitrary number of levels

Neurons and Synapses

The custom analog VLSI chips used in IFAT can model 2400 identical integrate-and-fire neurons each, and the IFAT board contains two of these chips along with support logic implemented using an FPGA. Switched-capacitor analog neurons on the chip implement a discrete time single-compartment model with multiple conductance-based synapses and a static leak. Neurons on the IFAT chip can be individually addressed, and only maintain an analog membrane voltage. Weights and reversal potentials are externally supplied and can therefore be configured on a per-neuron basis. An external FPGA and digital-to-analog converter (DAC) provide these parameters along with the neuron address to the IFAT chip, and this results in the specified neuron updating its membrane state (Vogelstein et al. 2004). Any spike generated by the neuron is also received by the FPGA that can then use connectivity information to transmit the spike to the appropriate set of destination neurons.
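The event-driven style of this update can be illustrated with a small sketch. The update rule and constants below are illustrative stand-ins, not the exact Vogelstein et al. (2004) circuit: on each incoming spike, the externally supplied weight and reversal potential pull the addressed neuron's membrane voltage a fraction of the way toward the reversal potential, mimicking switched-capacitor charge sharing:

```python
# Event-driven membrane update in the spirit of IFAT's switched-capacitor
# neurons (illustrative constants and update rule, not the exact circuit).
# Each incoming event carries an externally supplied weight w (0..1) and
# reversal potential e_rev; charge sharing moves v a fraction w of the
# way toward e_rev.
V_THRESH, V_RESET = 1.0, 0.0

def synaptic_event(v, w, e_rev):
    v = v + w * (e_rev - v)           # conductance-like pull toward e_rev
    if v >= V_THRESH:
        return V_RESET, True          # threshold crossed: emit a spike
    return v, False

v, spiked1 = synaptic_event(0.0, 0.3, 2.0)   # excitatory: v moves to 0.6
v, spiked2 = synaptic_event(v, 0.3, 2.0)     # crosses 1.0, fires and resets
assert (spiked1, spiked2, v) == (False, True, 0.0)
```

Note that the neuron itself stores only the membrane voltage; the weight and reversal potential arrive with each event, which is what allows per-neuron parameters to live in external FPGA/RAM tables.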

Communication

Spikes that are local to an individual IFAT array are looped back internally without traveling through the routing network. Nonlocal routing in the HiAER system is tag-based. When a nonlocal spike is generated by an IFAT neuron, it is transmitted to the router immediately adjacent to the IFAT system. This first-level router broadcasts (via individual nearest-neighbor communication) the spike to the local row, and each router in the row matches the tag against a local routing table. If there is a local match, the spike is communicated to the local IFAT array, which uses the tag as an index into a local routing table to identify the set of synapses for spike delivery. Since the IFAT system has a programmable FPGA and a local RAM for extra storage, the mechanism for local spike delivery can be changed as necessary.

One of the first-level routers does not have an associated IFAT array (see Figure 16.2). This router is responsible for communication beyond the local linear array. The router behaves in the same manner as the first-level router – when it receives a spike, it matches the tag against a local routing table to see if the spike should propagate to the next level of routing. If so, this spike is transmitted to the next level of routing, where a tag matching operation is performed in a manner similar to the first-level routing. The difference is that each second-level router is connected to a collection of IFAT systems and routers, not just one IFAT. This process can be repeated in a hierarchical manner to build a large-scale neuromorphic system (Joshi et al. 2010).
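A minimal sketch of this tag-based delivery within one row is shown below; the table layout, node numbering, and synapse names are hypothetical:

```python
# Tag-based delivery within one row of IFAT/router nodes (hypothetical
# table layout). Each router's table maps a tag to the local synapses
# that should receive the spike; `uplink_tags` lists tags that must also
# climb to the next level of the routing hierarchy.
def deliver(tag, row_tables, uplink_tags):
    hits = [(node, table[tag])                  # routers with a local match
            for node, table in row_tables.items() if tag in table]
    return hits, tag in uplink_tags             # (local hits, climb higher?)

row = {0: {7: ["n3.s0", "n9.s2"]},              # router 0 matches tag 7
       1: {},                                   # router 1 ignores tag 7
       2: {7: ["n1.s5"]}}                       # router 2 also matches
hits, climb = deliver(7, row, uplink_tags={7, 12})
assert climb and dict(hits) == {0: ["n3.s0", "n9.s2"], 2: ["n1.s5"]}
```

The same `deliver` step is simply repeated at each level of the hierarchy, with the uplink router's table deciding which tags propagate upward.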

Each tag can be viewed as a ‘destination pattern’ – two spikes with the same tag are delivered to the same set of destination synapses. However, because there is locality in communication, parts of the global network that have nonoverlapping spike communication can use the same tags to refer to different destination synapses. Therefore, the number of bits used to represent a tag does not directly limit the total number of neurons or destination patterns in the system.

Programming

While the neuron and synapse models themselves cannot be changed, since they are implemented in dedicated hardware, the parameters for each neuron and synapse are externally supplied. Therefore, programming the system corresponds to selecting the model parameters for each neuron, selecting tag values for spikes, and providing configuration information for all the routing tables in the communication network.

16.2.3  Neurogrid

The Neurogrid project at Stanford University is a system composed almost entirely of custom hardware for modeling biological neurons and synapses.

System Architecture

The core hardware element in Neurogrid is the Neurocore chip, which is a custom ASIC that uses analog VLSI to implement neurons and synapses, and digital asynchronous VLSI to implement spike-based communication (Merolla et al. 2014a). The chip was fabricated in a 180 nm process technology, and contains a 256 × 256 array of neurons, whose core functionality is implemented with analog circuits.

Each Neurocore chip can communicate with three other chips, with the routing network organized in a tree fashion (see Figure 16.3). The Neurogrid system is able to model one million neurons by assembling a tree of 16 Neurocore chips. Finally, it is possible to enhance the flexibility of the Neurogrid system by connecting the root of the tree to an external device (e.g., an FPGA) to perform arbitrary spike processing or routing operations.


Figure 16.3  The Neurogrid system showing its tree topology between individual Neurocore chips. Each Neurocore chip can model neurons and has support for a multicast-based tree router. The third connection from the root of the tree is not shown

Neurons and Synapses

Each Neurocore chip contains 256 × 256 neurons. Each neuron is implemented with custom analog circuitry that directly realizes a continuous-time differential equation model of neuron behavior. The neurons implement a quadratic integrate-and-fire (QIF) model, which is combined with four types of synapse circuits (Benjamin et al. 2012). The synapse circuits are superposable, allowing a single circuit to model an arbitrary number of synapses of a given type. This approach enables the Neurogrid system to model a very large number of synapses with minimal overhead – the state required corresponds to representing connectivity between neurons rather than connectivity to specific individual synapses.

Communication

The Neurogrid architecture uses three separate modes for communication: (i) point-to-point spike delivery; (ii) multicast tree routing; and (iii) analog fan-out. The Neurocore chips are organized in a tree, and spikes traverse up the tree to an intermediate node, and then traverse down the tree to the destination Neurocore chip. Spikes are source-routed: each spike contains the path taken by the packet through the network (Merolla et al. 2014a). For point-to-point routing, the packet simply specifies the routing information for each hop.

Multicast routing is supported by permitting packet flooding when the packet travels down the routing tree. In flooding mode, all Neurocore chips in the subtree receive a copy of the spike. Because this mode can only be used for packets traversing down the tree, the network is deadlock free, since there are no cycles in the routing graph.
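The two digital delivery modes – point-to-point source routing and down-tree flooding – can be sketched as follows. The tree encoding and node names are hypothetical, but as described above, a spike climbs the tree and then either descends along an explicit path or is copied to every chip in a subtree:

```python
# Source-routed delivery on a (hypothetical) tree of Neurocore-like
# nodes. A spike's route climbs `up_hops` toward the root, then either
# descends along an explicit child-index path (point-to-point) or
# floods the entire subtree (multicast).
class Node:
    def __init__(self, name, children=()):
        self.name, self.children, self.parent = name, list(children), None
        for c in self.children:
            c.parent = self

def deliver(src, up_hops, down):
    node = src
    for _ in range(up_hops):          # ascend: spike carries the hop count
        node = node.parent
    if down == "flood":               # descend in flooding mode
        out, stack = [], [node]
        while stack:
            n = stack.pop()
            out.append(n.name)
            stack.extend(n.children)
        return sorted(out)            # every chip in the subtree gets a copy
    for i in down:                    # descend hop by hop (point-to-point)
        node = node.children[i]
    return [node.name]

leaf_a, leaf_b = Node("A"), Node("B")
mid = Node("mid", [leaf_a, leaf_b])
root = Node("root", [mid, Node("C")])
assert deliver(leaf_a, 1, [1]) == ["B"]              # point-to-point sibling
assert deliver(leaf_a, 2, "flood") == ["A", "B", "C", "mid", "root"]
```

Because packets only ascend first and then descend (flooding is restricted to the downward phase), the routing graph has no cycles, which is the deadlock-freedom argument made above.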

A final mechanism for spike delivery supported in Neurogrid is the notion of arbors. When a spike arrives at a destination neuron, it is delivered to a programmable neighborhood of the neuron using an analog diffusor network. This means that a single spike effectively delivers input to a population of neurons, further increasing the number of neurons receiving the spike.

Programming

While the neuron and synapse models themselves cannot be changed, since they are implemented in dedicated hardware, the parameters for both models are externally supplied. Therefore, programming the system corresponds to selecting the model parameters for the neurons and synapse types in a chip and providing configuration information for all the routing tables in the communication network.

16.2.4  High Input Count Analog Neural Network System

The High Input Count Analog Neural Network (HICANN) system in the FACETS project has very different goals compared to the previously described designs. While the previous systems were designed with real-time operation at biological time scales in mind, the HICANN system takes the approach of providing a platform that enables accelerated modeling of neural network dynamics, while supporting neurons with 256–16,000 inputs. Supporting acceleration factors of up to 10⁵ was a design consideration in this project.

System Architecture

Instead of building individual chips connected by a communication network, the FACETS project uses wafer-scale integration. A wafer contains repeated instances of the same reticle, and each reticle contains eight HICANN chips, so the system is a large array of identical HICANN chips. The choice of wafer-scale integration makes fault tolerance a requirement in the FACETS design.

Neurons and Synapses

Each HICANN chip contains two Analog Neural Network Core (ANNCORE) arrays that contain analog circuits for modeling neurons and synapses. The analog circuits can implement two neuron models: a conductance-based integrate-and-fire model, and an adaptive exponential integrate-and-fire model.

Each ANNCORE contains 128 K synapse circuits and 512 neuron membrane equation circuits organized in two groups. Each group has a 256 × 256 synapse array and 256 membrane circuits, together with programmable connectivity to enable a set of synapses to be grouped with each membrane circuit. If all the neuron circuits are used, each neuron can have 256 synapses. However, other configurations with fewer neurons per ANNCORE and more synapses per neuron are possible, up to a maximum of 16 K synapses per neuron (Schemmel et al. 2008).
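The neuron/synapse grouping trade-off can be made concrete with a little arithmetic: the 128 K synapse circuits per ANNCORE are fixed, so reducing the number of membrane circuits in use increases the synapses available per neuron. This is a sketch of the budget only; the hardware supports particular groupings, not arbitrary divisions:

```python
# Synapse circuits per ANNCORE are fixed; grouping trades neuron count
# against inputs per neuron, up to the 16K-input maximum.
TOTAL_SYNAPSES = 2 * 256 * 256   # two 256 x 256 arrays = 128K circuits
MAX_NEURONS = 512                # membrane circuits per ANNCORE

def synapses_per_neuron(neurons):
    assert 1 <= neurons <= MAX_NEURONS and TOTAL_SYNAPSES % neurons == 0
    return TOTAL_SYNAPSES // neurons

assert synapses_per_neuron(512) == 256     # all membrane circuits in use
assert synapses_per_neuron(8) == 16384     # the 16K-synapse maximum
```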

Neuron and synapse parameters can be set via a digital interface, and on-chip DACs convert them into the appropriate analog input for the ANNCORE circuits.


Figure 16.4  A wafer-scale FACETS system, with neuron and synapse resources (shown in dark gray), and communication resources (shown in light gray). The communication resources are statically set using programmable switches, rather than packet-based routing. Bus lanes are shown running horizontally and vertically, and the connectivity between the lanes is controlled by programmable static switches

Communication

The goal of highly accelerated modeling imposes significant stress on communication resources. By adopting a wafer-scale approach, the FACETS project leverages the higher on-chip bandwidth available in modern CMOS processes. If we view the chips on a wafer as a grid, the FACETS routing architecture uses multiple parallel lanes both in the horizontal (64 lanes) and vertical (256 lanes) directions between individual ANNCORE arrays (Figure 16.4). Groups of 64 neurons time-multiplex their spikes onto a single lane. The lanes use low-voltage differential serial signaling to reduce power consumption, and address bits are transmitted in a bit-serial fashion.

Rather than using a packet-based routing approach, routing resources are statically allocated using programmable switches similar to the way an FPGA routing fabric is designed. Connections between lanes at chip boundaries can be shifted by one position, to increase the flexibility of the routing network.

Each bus can be ‘tapped’ by a HICANN chip that internally contains address matching logic. When a neuron address is transmitted on a lane that is tapped by a HICANN chip, the lane address is compared to the locally stored addresses. On a match, a spike is delivered to the appropriate synapse (Fieres et al. 2008).

There are many restrictions on connectivity, and the final choices made by the FACETS design balance flexibility in routing against area and power constraints. A hardware efficiency of ~40% was demonstrated in some of the experiments conducted by the FACETS team (Fieres et al. 2008).

Programming

Programming the system corresponds to selecting neuron and synapse groupings, parameters, and mapping the connectivity to the configurable routing resources in a manner that is analogous to the place-and-route flow in an FPGA.

16.3  Discussion

Each large-scale neuromorphic system has made very different choices when it comes to key design decisions: neuron model, synapse model, and communication architecture. A summary of the key differences is shown in Table 16.1.

The different choices made lead to systems that have different strengths. SpiNNaker is clearly the most flexible, but therefore also has the highest overhead for modeling neural systems. HiAER and Neurogrid use low-power analog neurons and synapses with digital communication and are optimized for different types of networks. In particular, Neurogrid is architected to be particularly efficient at implementing the columnar structure found in cortex, whereas HiAER has been designed to support more general connectivity. The HICANN system has been optimized for speed, and therefore cannot use the same level of multiplexing of communication links as Neurogrid or HiAER.

There are other large-scale system design projects underway, probably the most notable being the ‘TrueNorth’ (TN) effort being led by IBM to develop a low-power cognitive computing platform (Imam et al. 2012; Merolla et al. 2014b). The building block for their architecture is a ‘neurosynaptic core’, consisting of an array of digital neurons combined with synapses organized in a cross-bar configuration with a limited number of synaptic weights. The approach is fully digital, with a combination of asynchronous circuits and a synchronization clock that governs the timing precision of the neurons. The TN architects also made the decision to ensure deterministic computation; the TN neurons faithfully replicate a simulation exactly, thereby allowing easier design of a system while giving up some low-level efficiencies used in biological computation, such as analog dendritic state and probabilistic synaptic activation. This approach represents a different design point compared to the other four platforms discussed in this chapter.

Table 16.1  Key differences in choices for neuron and synapse models, and communication architecture in large-scale neuromorphic systems


The architecture of large-scale neuromorphic systems will continue to evolve. New discoveries will be made and incorporated into architectural choices. Different researchers will take their own unique approach to the problem of modeling neuroscience. Ultimately these decisions will be validated by real-world application and market success, but it is too early to say which choices will prevail.

References

Benjamin BV, Arthur JV, Gao P, Merolla P, and Boahen K. 2012. A superposable silicon synapse with programmable reversal potential. Proc. 34th Annual Int. Conf. IEEE Eng. Med. Biol. Society (EMBC), pp. 771–774.

Fieres J, Schemmel J, and Meier K. 2008. Realizing biological spiking network models in a configurable wafer-scale hardware system. Proc. IEEE Int. Joint Conf. Neural Networks (IJCNN), pp. 969–976.

Furber S, Lester D, Plana L, Garside J, Painkras E, Temple S, and Brown A. 2013. Overview of the SpiNNaker system architecture. IEEE Trans. Comput. 62(12), 2454–2467.

Imam N, Akopyan F, Merolla P, Arthur J, Manohar R, and Modha D. 2012. A digital neurosynaptic core using event-driven QDI circuits. Proc. 18th IEEE Int. Symp. Asynchronous Circuits Syst. (ASYNC), pp. 25–32.

Joshi S, Deiss S, Arnold M, Park J, Yu T, and Cauwenberghs G. 2010. Scalable event routing in hierarchical neural array architecture with global synaptic connectivity. 12th International Workshop on Cellular Nanoscale Networks and Their Applications (CNNA), pp. 1–6.

Merolla P, Arthur J, Alvarez R, Bussat JM, and Boahen K. 2014a. A multicast tree router for multichip neuromorphic systems. IEEE Trans. Circuits Syst. I 61(3), 820–833.

Merolla PA, Arthur JV, Alvarez-Icaza R, Cassidy AS, Sawada J, Akopyan F, Jackson BL, Imam N, Guo C, Nakamura Y, Brezzo B, Vo I, Esser SK, Appuswamy R, Taba B, Amir A, Flickner MD, Risk WP, Manohar R, and Modha DS. 2014b. A million spiking-neuron integrated circuit with a scalable communication network and interface. Science 345(6197), 668–673.

Schemmel J, Fieres J, and Meier K. 2008. Wafer-scale integration of analog neural networks. Proc. IEEE Int. Joint Conf. Neural Networks (IJCNN), pp. 431–438.

Vogelstein RJ, Mallik U, and Cauwenberghs G. 2004. Silicon spike-based synaptic array and address-event transceiver. Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), vol. V, pp. 385–388.

__________

Event-Based Neuromorphic Systems, First Edition.
Edited by Shih-Chii Liu, Tobi Delbruck, Giacomo Indiveri, Adrian Whatley, and Rodney Douglas. © 2015 John Wiley & Sons, Ltd. Published 2015 by John Wiley & Sons, Ltd.
