3

Basic and Advanced On-Chip Busses and Interconnects

This chapter begins with an overview of the buses and interconnects used within an SoC. We will then introduce the concepts of data sharing and coherency and how to solve their associated challenges, which will give you a good introduction to the Advanced Microcontroller Bus Architecture (AMBA) and the Open Core Protocol (OCP). Finally, we will cover data movement within an SoC and how to use DMA engines.

In this chapter, we’re going to cover the following main topics:

  • On-chip buses and interconnects overview
  • ARM AMBA interconnect protocols suite
  • OCP interconnect protocol
  • DMA engines and data movements
  • Data sharing and coherency challenges

On-chip buses and interconnects overview

FPGA and ASIC-based SoCs are built from multiple components: modules provided as macros by the FPGA vendor, modules designed in-house (usually in RTL), and third-party modules that require some form of licensing to use. These modules are commonly referred to as intellectual properties (IPs). These IPs are connected in a topology specified by the SoC hardware architecture using buses and interconnects. They collaborate, which means they need to interact at runtime to implement a specific set of tasks as part of the system’s overall functionality. A given interconnect can support many levels of functional complexity and features, based on a bus protocol specification such as ARM AMBA, OCP, or IBM CoreConnect, to mention a few. In this chapter, we will focus on ARM AMBA, which is a collection of bus protocols grouped under a specific AMBA standard revision. However, we will also cover the OCP bus protocol as it is also common in ASIC SoC designs, and you may need to design OCP-to-AMBA bridges to integrate third-party IPs whose interfaces are based on the OCP protocol specification, or port an ASIC-based SoC to an FPGA-based SoC for prototyping and/or production. Covering OCP will also help us compare it to ARM AMBA in terms of functional features and complexity.

On-chip bus overview

The communication bus is the medium over which two interfaces from different IPs communicate. A bus can be as simple as a point-to-point, single-transaction, single-threaded connection lane, or as complex as a many-to-many, multi-threaded, and multi-layer transaction protocol. At one end of the bus there is a transaction initiator, or master interface, while at the other end there is a target, or slave interface. The initiator puts a data access request on the bus, and the target responds to it by consuming the write data provided by the initiator or by providing the read data requested by the initiator. A simple bus is usually composed of the following lanes (a conceptual sketch in C follows the list):

  • A data lane (or lanes for cases where the read path is different than the write path).
  • An address lane to specify the exact storage location of the data in the completer or target’s address space.
  • A control lane to qualify the transaction as a read or write and provide synchronization information as to when a given control signal is valid, ready, and so on.
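
To make these lanes concrete, the following is a minimal conceptual sketch in C of how a single point-to-point transaction could be modeled. The field names and widths are illustrative assumptions only and do not correspond to any specific bus standard.

    #include <stdbool.h>
    #include <stdint.h>

    /* Conceptual model of one transaction on a simple point-to-point bus.
       Field names and widths are illustrative only. */
    typedef struct {
        uint32_t address;     /* address lane: target location in the completer's address space */
        uint32_t write_data;  /* data lane driven by the initiator for writes */
        uint32_t read_data;   /* data lane driven by the target for reads */
        bool     is_write;    /* control lane: read/write qualifier */
        bool     valid;       /* control lane: initiator signals that the request is valid */
        bool     ready;       /* control lane: target signals that it has accepted/completed */
    } simple_bus_txn;

    /* Target-side model: consume a write or return read data from word-addressed storage. */
    static void target_respond(simple_bus_txn *txn, uint32_t *mem)
    {
        if (!txn->valid) return;
        if (txn->is_write)
            mem[txn->address >> 2] = txn->write_data;
        else
            txn->read_data = mem[txn->address >> 2];
        txn->ready = true;  /* the transaction completes */
    }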

The following diagram illustrates the concept of a simple bus:

Figure 3.1 – Simple point-to-point bus

On-chip interconnects

Interconnects add many functional features to the point-to-point bus, which make them able to connect many buses in a single shared system address space. This usually requires central switching, bus protocol conversion, buffering, and an arbitration agent that can dynamically allow a requesting initiator and a target to establish a connection and exchange data, even when they don’t belong to the same bus protocol standard. Interconnect switches range in complexity from a simple crossbar switch to a Network-on-Chip (NoC) with protocol layer conversion, data coherency, advanced routing, multi-transaction, and multi-threading capabilities. The following diagram illustrates the concept of a crossbar switch being used to interconnect multiple IPs using simple buses, each with the same simple features as in the point-to-point bus case:

Figure 3.2 – Simple switching matrix-based interconnect

In an NoC, the initiator can be multi-threaded, meaning that multiple hardware threads can share the same bus interface in the NoC. Here, the interconnect can use techniques to track many parallel transactions toward targets that originate from the same hardware bus interface but from different hardware threads. Every bus protocol that supports this feature has mechanics that allow the NoC, the initiators, and the targets to share the bus medium and fulfill these types of transactions. There is also support for concurrent transactions for both read and write since the bus uses separate paths for the request phase, the command response phase, and the data phases. In some bus protocols, such as AMBA AXI, there are multiple data paths – one for the read and one for the write – allowing concurrent read and write transactions. An initiator can also issue many of these transactions without waiting for the ongoing ones to complete first, though this is quantitatively limited by the transaction issuing capability of the initiator, the NoC, and the target. The read and write transaction types are also varied and aligned with many kinds of masters, such as DMA engines, processors, and processor caches. The following diagram illustrates the concept of an NoC:

Figure 3.3 – Network-on-chip-based interconnect

ARM AMBA interconnect protocols suite

AMBA is a collection of bus and interconnect specifications provided by ARM for use in SoCs to attach initiators to targets with different levels of features and protocol complexity. It is an open standard and free to use in SoCs. The specification can be accessed from the ARM website at https://developer.arm.com/architectures/system-architectures/amba/specifications.

The current revision is revision 5. Historically, each newer revision kept backward compatibility with the previous ones but also added new bus protocols with newer features that were developed to keep pace with modern SoC complexities and higher performance demands. AMBA had to evolve to support multi-core and multi-cluster CPU topologies, which also use some sort of accelerator, with all the challenges they present, while maintaining support for simpler, lower-performance SoC topologies. The more complex a hardware system becomes, the more power hungry it becomes. ARM has been providing features that balance all the system Key Performance Indicators (KPIs) to still meet the needs of building complex SoCs.

ARM AMBA standard historical overview

The first AMBA standard, or AMBA specification revision 1.0 (AMBA1), was made available by ARM in 1996. It included two bus protocols: the Advanced System Bus (ASB) and the Advanced Peripheral Bus (APB). Only the APB bus protocol is still in use in today’s SoCs, mostly for register access and as a configuration interface for IPs. The second revision of the AMBA specification (AMBA2) was released in 1999 and added the Advanced High-performance Bus (AHB) to the existing ASB and APB bus protocols from revision 1.0.

Revision 3.0 (AMBA3) was released in 2003 and added the Advanced eXtensible Interface (AXI-3) for high-performance data exchanges and the ARM Trace Bus (ATB) for trace data encoding and transfer between elements of ARM CoreSight (ARM’s SoC debug and tracing technology).

AMBA revision 4.0 (AMBA4) was first released in 2010, introducing the AXI-4 bus protocol as an upgrade to AXI-3. In 2011, ARM added coherency support, which resulted in the introduction of the AMBA AXI-4 Coherency Extension (ACE) bus protocol. AMBA4 also defines the AXI Streaming protocol, which is used in point-to-point connectivity between IPs for high-throughput data transfers. It also added the low-power interface specifications for clock management and power control, known as the Q-Channel and P-Channel interfaces, respectively.

The latest AMBA revision is 5.0 (AMBA5), which was released in 2013. It defines the Coherent Hub Interface (CHI) protocol as an interface capable of sustaining high-performance data exchanges while interconnecting processors in a cache-coherent way. AMBA5 also upgraded the AXI bus to AXI-5 and, over time, added support for many newer bus protocols, such as the Credited eXtensible Stream (CXS) protocol for point-to-point data exchanges between IPs. AMBA5 introduced the AMBA Adaptive Traffic Profile (ATP) specification, which isn’t a bus interface by itself but rather a qualitative specification associated mainly with AXI master interfaces, their real-time transactions, and their timing. AMBA5 includes the AMBA Generic Flash Bus (GFB) specification, which is specifically designed to support non-volatile memories, such as flash devices, and their transaction types. AMBA5 also includes the AMBA Distributed Translation Interface (DTI), which defines the protocol used by elements of the system memory management units for system-level address translation services within the SoC. The following diagram visualizes the historical evolution of the AMBA standard and summarizes the specific bus and interconnect protocols within each revision of the standard:

Figure 3.4 – AMBA interconnect standards and bus protocol evolution

APB bus protocol overview

This section will explore the APB bus protocol, its evolution throughout the different AMBA standard revisions, and its added features and mechanisms. We will gain an understanding of this bus’s supported transactions, signaling, and application use cases. We will provide an example system implementation using the APB bus.

APB bus protocol evolution

The APB bus is the simplest interface protocol included in the ARM AMBA standard. It has evolved since its inception in 1996. Every revision added new features and improvements to the protocol while keeping backward compatibility with the previous revisions. APB revision 1.0 is now obsolete and only APB2, APB3, APB4, and APB5 are still active protocols in the industry.

APB2 is considered the base APB protocol; it defines the signal interfaces, the supported read and write transactions, and the two available APB components, namely the APB bridge and the APB slave.

The APB3 protocol added support for wait states and transaction error reporting to the base APB2 protocol. The PREADY and PSLVERR signals were added to support these features.

The APB4 protocol added transaction protection and sparse data transfer features. The PPROT signal implements secure transaction access, so it is used to distinguish between secure and non-secure transactions over the APB bus, while the PSTRB signal implements the write strobes that enable sparse data write transactions between an APB master and an APB slave.

The APB5 revision extends the APB4 protocol with support for wake-up signaling, user signaling, and parity protection and check signals. These features can help reduce the SoC’s power consumption, extend the APB protocol through custom sideband signaling, and improve the system’s reliability.

APB bus characteristics

The APB protocol is a simple interface that requires minimal silicon resources for its implementation in an SoC compared to other AMBA bus protocols, thus making it a low-power bus. Data transfers on the APB bus require a minimum of two clock cycles. The APB bus is designed to be a side or secondary bus through which the CPU can implement a control path, thus avoiding any interference with the main data bus, which is implemented using one of the high-performance AMBA buses, such as AXI or AHB. At runtime, the CPU can use the APB bus to set up an IP’s register file, read the status of IP transactions that have completed, or perform any other control path-related tasks that software can easily split from the data path to free it for high-throughput accesses, such as I/O packet data or memory data structures through the CPU caches. The IP APB ports are usually grouped as a tree of up to 16 ports that hang off an APB bridge from the main SoC interconnect. The APB bridge performs the transaction protocol conversion and mapping. The APB bridge acts as the transaction initiator in this topology, while the APB slave behaves as the target that responds to these transactions. The APB specification also refers to the APB bridge as the Requester and the APB slave or target as the Completer.
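
From the software side, this control path is just memory-mapped register access; the interconnect and the bridge perform the protocol conversion transparently. The C sketch below illustrates this, where the peripheral base address, register offsets, and bit meanings are hypothetical assumptions for illustration only.

    #include <stdint.h>

    /* Hypothetical APB-mapped peripheral; base address and offsets are assumptions. */
    #define MY_IP_BASE        0x43C00000u
    #define MY_IP_CTRL_OFFSET 0x00u   /* control register */
    #define MY_IP_STAT_OFFSET 0x04u   /* status register  */

    #define REG32(addr) (*(volatile uint32_t *)(addr))

    static void my_ip_start(void)
    {
        /* This plain store becomes a write from the CPU on the main bus, converted
           by the bridge into an APB write transaction to the Completer's register. */
        REG32(MY_IP_BASE + MY_IP_CTRL_OFFSET) = 0x1u;  /* hypothetical "start" bit */
    }

    static int my_ip_is_done(void)
    {
        /* A read transaction over the same control path. */
        return (REG32(MY_IP_BASE + MY_IP_STAT_OFFSET) & 0x1u) != 0;
    }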

APB bus interface signals

The following diagram illustrates the connectivity between an APB Requester and an APB Completer, where all the signals defined by APB5 are present:

Figure 3.5 – APB bus interface signals

As shown in the preceding diagram, the APB bus signals are split into three categories: the Requester-driven data lane and signals, the Completer-driven data lane and signals, and the SoC bus clock and reset signals. The following table provides a short description of the Requester signals:

Signal | Width | Description
PADDR | 32 | Address lane, driven by the Requester.
PPROT | 3 | Protection type control signal. It indicates whether this is a normal, privileged, or secure operation, and whether it is a data or an instruction access.
PSELx | 1 | Select signal. Indicates to the Completer that it has been selected by the Requester to respond to an incoming transaction.
PENABLE | 1 | Enable signal. It is set to indicate the start of the second cycle of an APB transaction.
PWRITE | 1 | Transaction direction. When HIGH, it is a write; when LOW, it is a read.
PWDATA | DW | Write data lane of DW width. The Requester uses this lane to place the write data that targets the Completer. DW is usually 8, 16, or 32 bits.
PSTRB | DW/8 | Write strobe signals. Each bit indicates that the corresponding data byte is valid.
PWAKEUP | 1 | Wake-up signal. Used by the Requester to indicate activity associated with this APB interface.
PAUSER | URW | User-defined request attribute lane. It can be used to extend the APB protocol and customize it to support features and attributes not defined by APB5. It represents the request phase of a custom-defined transaction.
PWUSER | UDW | User-defined write attribute lane. It can be used to extend the APB protocol and customize it to support features and attributes not defined by APB5. It can carry the write data from the Requester to the Completer of a custom-defined write transaction.

Table 3.1 – APB bus interface signals description

The following table provides a short description of the Completer signals:

Signal | Width | Description
PREADY | 1 | Ready signal. Used by the Completer to qualify or extend the APB transaction on the bus.
PRDATA | DW | Read data lane. Used by the addressed Completer to provide the requested read data. It is usually 8, 16, or 32 bits wide.
PSLVERR | 1 | Transfer error signal. Used by the Completer to indicate the transaction completion status to the Requester.
PRUSER | UDW | User-defined read attribute lane. It can be used to extend the APB protocol and customize it to support features and attributes not defined by APB5. It can carry the read data returned by the Completer to the Requester of a custom-defined read transaction.
PBUSER | URW | User-defined response attribute lane. It can be used to extend the APB protocol and customize it to support features and attributes not defined by APB5. It represents the response phase of a custom-defined transaction.

Table 3.2 – APB bus interface Completer signals description

The following table lists the SoC signals:

Signal | Width | Description
PCLK | 1 | Clock signal. A common signal provided by the SoC to both the Requester and the Completers.
PRESETn | 1 | Reset signal. A common active-LOW signal provided by the SoC to both the Requester and the Completers.

Table 3.3 – APB bus interface SoC signals description

APB bus-supported transactions

The APB bus’s latest protocol – that is, APB5 – defines the following transaction types:

  • Write transactions without the Wait state
  • Write transactions with Wait state insertion
  • Read transactions without the Wait state
  • Read transactions with Wait state insertion
  • Write transactions with Write strobes
  • Error response
  • Secure and non-secure transactions
  • Wake-up transaction signaling
  • User signaling

This section only attempts to provide a functional overview of the preceding supported transactions in APB5. You are encouraged to study the AMBA5 standard that defines the APB5 bus protocol for further implementation details. The APB5 bus protocol can be found at https://developer.arm.com/documentation/ihi0024/latest/.

Write and read transactions are simple back-to-back transactions driven by the Requester in a predefined sequence so that the Completer can decode the transaction accordingly, accept the data from the write lane, or provide the data on the read lane to the Requester. As mentioned previously, the APB bus is a simple transaction bus with no pipelining or burst support. The protection support is worth highlighting as it adds an orthogonal qualification to the transaction using the PPROT bits, as follows (a small decode sketch in C follows the list):

  • PPROT[0]: This indicates if this is a Normal or Privileged transaction and usually reflects the status of the running state of the Requester. It can easily be mapped by the CPU to its execution state (kernel mode or user mode, for example).
  • PPROT[1]: Secure or non-secure. This signal can be used to implement hardware security by allowing access to certain registers when the transaction is classified as secure – that is, PPROT[1] is LOW. When PPROT[1] is set to HIGH by the Requester, access to secure registers, for example, is prohibited by the Completer.
  • PPROT[2]: Data or instruction. This signal provides a hint regarding the type of data exchanged by the transaction.
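
As an illustration, the following C sketch decodes the PPROT bits according to the meanings listed above. It is a conceptual model of a check a Completer might perform, not code from the APB specification.

    #include <stdbool.h>
    #include <stdint.h>

    /* PPROT[2:0] decode, following the bit meanings listed above. */
    static bool pprot_is_privileged(uint8_t pprot)  { return (pprot & 0x1u) != 0; } /* PPROT[0] */
    static bool pprot_is_secure(uint8_t pprot)      { return (pprot & 0x2u) == 0; } /* PPROT[1] LOW = secure */
    static bool pprot_is_instruction(uint8_t pprot) { return (pprot & 0x4u) != 0; } /* PPROT[2] */

    /* Hypothetical Completer-side check: secure-only registers may only be
       accessed when the transaction is marked as secure. */
    static bool completer_allows_access(uint8_t pprot, bool reg_is_secure_only)
    {
        return !reg_is_secure_only || pprot_is_secure(pprot);
    }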

APB bus example system implementation

As mentioned previously in this section, the APB Requester is an APB bridge that translates an upstream protocol, such as AHB or AXI, into an APB bus protocol to provide a control path to a CPU through which it can set up IP peripheral registers and check the runtime status following an interrupt event, for example. The following diagram provides an overview of such an implementation, where the APB bus is connected to the SoC interconnect through an AXI to APB bridge:

Figure 3.6 – APB bus example system implementation

AXI bus protocol overview

This section will explore the AXI bus protocol, its evolution throughout the different AMBA standard revisions, and its added features and mechanisms. We will gain an understanding of this bus’s supported transactions, signaling, and application use cases. We will provide an example system implementation using the AXI bus. This section tries to provide a detailed enough overview of the AXI-3 and AXI-4 bus protocols as these are used in the Zynq-7000 SoC and Zynq UltraScale+ SoC FPGAs. AXI-5 is mentioned for completeness only, but you are still encouraged to study it using the previously mentioned AMBA5 standard specification from ARM.

AXI bus protocol evolution

As illustrated in Figure 3.4, the AXI bus protocol first appeared in the AMBA3 standard as a high-performance bus interface to become the ARM-based SoC main data path interconnect.

AMBA4 added more features, supporting channels, and signals to AXI to produce AXI-4, a richer bus protocol with backward compatibility and interoperability with AXI-3. AXI-4 extended the burst length from 16 to 256 beats, removed the write interleaving and locked transaction support introduced in AXI-3, and added the notion of transaction Quality of Service (QoS) support. As in APB5, AXI-4 added support for user-defined signaling using the AxUSER (address), WUSER (write data), RUSER (read data), and BUSER (response channel) lanes. These can be used to extend the AXI-4 protocol and customize it to support features and attributes not defined by the base AXI-4 protocol. AXI-4 also added support for regions using the AWREGION and ARREGION vectors, which can encode up to 16 regions. Consequently, the same AXI-4 slave port can implement multiple logical interfaces, each mapped to a different region in the system address map.

AMBA4 also defined the AXI Streaming protocol, which allows high throughput point-to-point data exchanges between two connected IPs.

AMBA4 also added AXI Lite, which is a simplified version of the full AXI bus protocol.

AMBA5 defined AXI-5, the latest AXI revision, which provides even higher performance, scalability, and a wider feature choice for system IP data and signaling exchanges in an SoC when using AXI as an interconnect between masters and slaves. AXI-5 extends support for atomic transactions, cache stashing and de-allocation, and data protection and poisoning signaling. It also added support for persistent cache maintenance operations (CMOs), among many other features and options.

AXI bus characteristics

The AMBA AXI protocol is a high-performance and high-speed interconnect for modern SoC designs. It provides a high-throughput and low-latency system bus for highly demanding CPU clusters. It can be easily interconnected via simple bridges in the SoC interconnect with existing AHB and APB-based IPs. The AXI interface relies on multiple lanes with separate address/control and data phases to initiate a transaction. It supports unaligned data transfers by using byte strobes. AXI transactions use bursts to transfer data, which only require a single start address. AXI is a true full-duplex bus that uses two separate channel sets – one for write transactions and one for read transactions. It also supports multiple concurrent transactions to different addresses with an out-of-order completion capability.

AXI bus interface signals

The AXI bus uses five channels, as follows:

  • The write address
  • The write data
  • The write response
  • The read address
  • The read data

It also has a low-power interface. There are also global signals, namely clock and reset, which are driven by the SoC. The following diagram depicts the AXI bus multi-channel topology:

Figure 3.7 – AXI bus multi-channel topology

AXI bus global signals

The following diagram illustrates the connectivity of the global signals:

Figure 3.8 – AXI bus global signals connectivity

AXI write address channel signals

The following diagram illustrates the connectivity of the AXI write address channel signals:

Figure 3.9 – AXI bus write address channel signals connectivity

Signals marked with an asterisk (*) differ between AXI-3 and AXI-4.

AXI write data channel signals

The following diagram illustrates the connectivity of the AXI write data channel signals:

Figure 3.10 – AXI bus write data channel signals connectivity

Signals marked with an asterisk (*) differ between AXI-3 and AXI-4.

AXI write response channel signals

The following diagram illustrates the connectivity of the AXI write response channel signals:

Figure 3.11 – AXI bus write response channel signals connectivity

Signals marked with an asterisk (*) differ between AXI-3 and AXI-4.

AXI read address channel signals

The following diagram illustrates the connectivity of the AXI read address channel signals:

Figure 3.12 – AXI read address channel signals connectivity

Signals marked with an asterisk (*) differ between AXI-3 and AXI-4.

AXI read data channel signals

The following diagram illustrates the connectivity of the AXI read data channel signals:

Figure 3.13 – AXI bus read data channel signals connectivity

Signals marked with an asterisk (*) differ between AXI-3 and AXI-4.

AXI low-power interface signals

The following diagram illustrates the connectivity of the AXI low power interface signals:

Figure 3.14 – AXI low-power interface signals connectivity

AXI bus-supported transactions

An AXI transaction is initiated by an AXI master toward an AXI slave and performs either a read operation, a write operation, or a user-defined operation. The payload data is moved over the AXI bus using an AXI burst, which is composed of AXI beats. Three types of burst transfers are supported: FIXED (0b00), INCR (0b01), and WRAP (0b10), as specified by the AxBURST[1:0] vector (x being R for read transactions and W for write transactions). AxLEN[7:0] (x being R for read and W for write) specifies the burst length, which can be 1 to 16 transfers in AXI-3 for all burst types. In AXI-4, it is 1 to 16 transfers for the FIXED and WRAP bursts, while the INCR burst can be from 1 to 256 transfers. In addition to the size and layout, some attributes are associated with transactions, as defined by the AxCACHE[3:0] vector (x being R for read and W for write). This vector qualifies the memory and peripheral slave support and, as such, how the system handles the transaction while it progresses through interconnects and bridges. It also specifies how the system-level caches deal with the transaction when they are involved.
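
The C sketch below shows how the address of each beat in a burst can be derived from the start address, AxSIZE, AxLEN, and AxBURST. It is an illustrative model of the addressing rules, not code from the AXI specification, and it ignores the special handling of unaligned start addresses.

    #include <stdint.h>

    enum axburst { AXI_FIXED = 0, AXI_INCR = 1, AXI_WRAP = 2 };

    /* Address of beat 'n' (0-based) of a burst.
       axsize encodes the bytes per beat as a power of two (2^axsize bytes);
       axlen encodes the burst length minus one (AxLEN + 1 beats). */
    static uint64_t beat_address(uint64_t start, uint8_t axsize, uint8_t axlen,
                                 enum axburst burst, uint32_t n)
    {
        uint64_t bytes_per_beat = 1ull << axsize;
        uint64_t total_bytes    = bytes_per_beat * (uint64_t)(axlen + 1u);

        switch (burst) {
        case AXI_FIXED:  /* same address for every beat (FIFO-style target) */
            return start;
        case AXI_INCR:   /* address increments by the beat size */
            return start + n * bytes_per_beat;
        case AXI_WRAP: { /* address wraps at a total_bytes-aligned boundary */
            uint64_t wrap_lo = (start / total_bytes) * total_bytes;
            uint64_t addr    = start + n * bytes_per_beat;
            if (addr >= wrap_lo + total_bytes)
                addr -= total_bytes;
            return addr;
        }
        default:
            return start;
        }
    }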

The AXI protocol also supports protection via the AxPROT[2:0] vector (x being R for read and W for write) in a similar mapping to the PPROT[2:0] in the APB protocol:

  • AxPROT[0]: Indicates if this is a normal or privileged transaction and usually reflects the status of the running state of the Requester. It can easily be mapped by the CPU to its execution state (kernel mode or user mode, for example).
  • AxPROT[1]: Secure or non-secure. This signal can be used to implement hardware security by allowing access to certain registers only when the transaction is classified as secure – that is, AxPROT[1] is LOW. When AxPROT[1] is set to HIGH by the Requester, access to secure registers, for example, is prohibited by the Completer.
  • AxPROT[2]: Data or instruction. This signal provides a hint regarding the type of data exchanged by the transaction.

The AXI protocol allows transactions to be processed and to progress concurrently if they are not tagged with the same AXI ID. All transactions from the same master that have been tagged with the same AXI ID must be processed by the slave and returned to the issuing master in the same order in which they arrived. However, transactions from the same master toward the same slave with different AXI IDs can be processed and completed out of their arrival order. This property allows for better system performance when progress can be made outside of the in-order processing rule.
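
This ordering rule can be summarized with the small C model below: a slave (or interconnect) keeps one queue of outstanding transactions per AXI ID and may only respond to the transaction at the head of a queue, while different IDs progress independently. It is a conceptual sketch with illustrative limits, not code from the specification.

    #include <stdbool.h>
    #include <stdint.h>

    #define MAX_IDS          16   /* illustrative limits, assumed never exceeded */
    #define MAX_OUTSTANDING  8

    /* One FIFO of outstanding transaction tags per AXI ID. */
    typedef struct {
        uint32_t txn[MAX_OUTSTANDING];
        int head, count;
    } id_queue;

    static id_queue queues[MAX_IDS];

    static void issue(uint8_t id, uint32_t txn_tag)
    {
        id_queue *q = &queues[id];
        q->txn[(q->head + q->count) % MAX_OUTSTANDING] = txn_tag;
        q->count++;
    }

    /* A response for 'txn_tag' on 'id' is legal only if it is the oldest
       outstanding transaction with that ID; other IDs are unconstrained. */
    static bool may_respond(uint8_t id, uint32_t txn_tag)
    {
        id_queue *q = &queues[id];
        return q->count > 0 && q->txn[q->head] == txn_tag;
    }

    static void retire(uint8_t id)
    {
        id_queue *q = &queues[id];
        q->head = (q->head + 1) % MAX_OUTSTANDING;
        q->count--;
    }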

More Information

For a detailed description of the AXI transaction characteristics, you are encouraged to study the AXI-4 and AXI-3 protocol specifications, which are available at https://developer.arm.com/documentation/ihi0022/hc/?lang=en.

AXI bus example system implementation

A good example of using AXI-3 as the main bus protocol of an SoC can be seen in the following Zynq-7000 SoC interconnect diagram. The control path uses the APB bus to control peripheral registers and check the IP status at runtime. Memory and high throughput peripheral paths are implemented using the AXI-3 bus and its interconnects:

Figure 3.15 – AXI interconnect-based system example – the Zynq-7000 SoC

AXI Stream bus protocol overview

The AXI Stream bus protocol was first introduced by ARM in AMBA4. It is a unidirectional point-to-point link that connects two IPs for data and control information exchange. This section will explore the AXI Stream bus protocol, its evolution from the AMBA4 to the AMBA5 standard, and its added features and mechanisms. We will understand this bus’s supported transactions, signaling, and application use cases. We will provide an example system implementation using the AXI Stream bus.

AXI Stream bus protocol evolution

At the time of writing, AXI Stream in the AMBA5 standard is at its second revision. The first revision appeared in AMBA4 as a way to connect two IPs to exchange data in a single direction – that is, one interface acting as a master and the other interface acting as a slave. AXI Stream in AMBA5 added two main features: wake-up signaling and parity protection. To build bidirectional data exchange mechanisms between two IPs, two AXI Stream buses can be used by deploying one in each direction. An AXI Stream bus can cross an interconnect, but often only for data width adaptation, such as upsizing or downsizing. It can also perform clock domain crossing when the master and the slave clocks belong to two different clock domains.

AXI Stream bus characteristics

The AXI Stream protocol defines three types of bytes: a data byte, which is the information to be transferred from a master interface to a slave interface; a position byte, which indicates the relative position of data bytes in a stream of data; and a null byte, which is neither a data byte nor a position byte. These bytes are qualified by the TKEEP and TSTRB signals on the AXI Stream bus, as indicated by Table 3.4, and the slave can discard null bytes and use position bytes to format the data transfer sequences or implement a higher-level custom protocol transparently on top of the underlying AXI Stream bus protocol.
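
The byte type is determined by the combination of the corresponding TKEEP and TSTRB bits. The following C sketch captures that mapping under the assumption of the standard TKEEP/TSTRB encoding (the fourth combination, TKEEP low with TSTRB high, is reserved); it is an illustrative decode, not code from the specification.

    typedef enum { BYTE_DATA, BYTE_POSITION, BYTE_NULL, BYTE_RESERVED } axis_byte_type;

    /* Classify byte lane 'i' of a transfer from its TKEEP and TSTRB bits. */
    static axis_byte_type classify_byte(unsigned tkeep, unsigned tstrb, unsigned i)
    {
        unsigned keep = (tkeep >> i) & 1u;
        unsigned strb = (tstrb >> i) & 1u;

        if (keep && strb)   return BYTE_DATA;      /* valid payload byte                      */
        if (keep && !strb)  return BYTE_POSITION;  /* placeholder that keeps relative position */
        if (!keep && !strb) return BYTE_NULL;      /* carries no information, may be discarded */
        return BYTE_RESERVED;                      /* TKEEP=0 with TSTRB=1 must not be used    */
    }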

AXI Stream also defines a transfer as the unit of data exchange through the AXI Stream interface; each transfer is completed with a TVALID and TREADY handshake between the master and slave interfaces. A packet contains several data bytes that belong to the same transfer or transfers, similar to a burst transaction in the AXI-4 bus protocol. A frame is a logical grouping of multiple packets, each assembling several data bytes. A data stream is used to transfer data from a master to a slave either as many individual byte transfers or as a grouping of packets. The AXI Stream protocol supports three types of handshake sequences between the master and the slave (a small cycle-level sketch follows the list):

  • Handshake with TVALID, as asserted by the master before TREADY is asserted by the slave.
  • Handshake with TVALID, as asserted by the master after TREADY is asserted by the slave.
  • Handshake with TVALID and TREADY, asserted concurrently.
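
Whatever the order in which TVALID and TREADY are asserted, a transfer only occurs on a rising clock edge where both are high. The minimal C model below sketches this cycle-level behavior; it is a conceptual simulation, not code from the specification.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* One simulated ACLK rising edge: returns true when a transfer takes place. */
    static bool clock_edge(bool tvalid, bool tready, uint32_t tdata, uint32_t *sink)
    {
        if (tvalid && tready) {
            *sink = tdata;   /* the slave samples TDATA on this edge */
            return true;
        }
        return false;        /* one side is still waiting for the other */
    }

    int main(void)
    {
        uint32_t sink = 0;
        /* TVALID leads TREADY: no transfer on the first edge, a transfer on the second. */
        printf("%d\n", clock_edge(true, false, 0xA5A5A5A5u, &sink)); /* prints 0 */
        printf("%d\n", clock_edge(true, true,  0xA5A5A5A5u, &sink)); /* prints 1 */
        return 0;
    }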

AXI Stream bus interface signals

This section summarizes the signals that form the AXI Stream bus interface for both the master and the slave. Taking a closer look at these signals will provide a good overview of the protocol mechanics to move data from source to destination. However, you are encouraged to examine the protocol specification from ARM at https://developer.arm.com/documentation/ihi0051/b/?lang=en.

The following table lists the AXI Stream bus interface signals:

Signal | Width | Description
ACLK | 1 | Interface clock signal, used to sample all the bus signals on the rising edge.
ARESETn | 1 | Interface global reset, active LOW.
TVALID | 1 | Master interface signal indicating that valid data is present on the TDATA lane.
TREADY | 1 | Slave interface signal indicating that the slave is ready to accept the data on the TDATA lane.
TDATA | TDW | Bus data lane. Its width (TDW) can be 8, 16, 32, 64, 128, 256, 512, or 1,024 bits.
TSTRB | TDW/8 | Byte qualifier indicating whether the associated byte in the TDATA lane is a data byte or a position byte.
TKEEP | TDW/8 | Byte qualifier indicating whether the associated byte in the TDATA lane is a data/position byte or a null byte – that is, whether the associated byte needs processing at all.
TLAST | 1 | Packet delimiter.
TID | TIDW | Used as a data stream ID, with TIDW <= 8.
TDEST | TDESW | Helps route the data stream through AXI Stream topologies that have multiple hops in their paths, with TDESW <= 8.
TUSER | TUW | Sideband information used by the user to extend the AXI Stream protocol.
TWAKEUP | 1 | Part of the AXI-5 Stream protocol, used to indicate that there is activity on the AXI-5 Stream bus.

Table 3.4 – AXI Stream bus interface signals description

AXI Stream bus-supported transactions

The AXI Stream protocol defines four styles of data streams: the byte stream, the continuous aligned stream, the continuous unaligned stream, and the sparse stream. Let’s look at each in detail.

Byte stream style

In a byte stream transfer style, all the byte types can be used. The following diagram illustrates a byte-stream transfer, where the interface is 4 bytes wide:

Figure 3.16 – AXI Stream protocol byte stream example

In the preceding diagram, Word 0 is the first four bytes to be transferred across the bus from the master to the slave and it has B00, B01, and two null bytes forming it. This is followed by Word 1, Word 2, and so on.

Continuous aligned stream style

In a continuous aligned stream, only the data byte type can be used. The following diagram shows a continuous aligned stream transfer, where the interface is 4 bytes wide:

Figure 3.17 – AXI Stream protocol continuous aligned stream style example

In the preceding diagram, Word 0 is the first four bytes to be transferred across the bus from the master to the slave and it has B00, B01, B02, and B03 bytes in it. This is followed by Word 1, Word 2, and so on.

Continuous unaligned stream style

In a continuous unaligned stream, only the data byte and position byte types can be used, with the condition that position bytes can only be used at the beginning of a data packet, at the end of a data packet, or at both. The following diagram shows a continuous unaligned stream transfer, where the interface is 4 bytes wide:

Figure 3.18 – AXI Stream protocol continuous unaligned stream style example

In the preceding diagram, Word 0 is the first four bytes to be transferred across the bus from the master to the slave and it has B00, B01, B02, and B03 bytes in it. This is followed by Word 1, Word 2, and so on. The first packet has two position bytes at the end, whereas the second packet has position bytes both at the beginning and the end of the packet.

Sparse stream style

In a sparse stream, there are data byte and position byte types. The pattern of the data bytes and position bytes should be the same in all packets. The following diagram shows a sparse stream transfer, where the interface is 4 bytes wide:

Figure 3.19 – AXI Stream protocol sparse stream style example

AXI Stream bus example system implementation

The AXI Stream bus is used in many designs to solve the issue of having to go through the main system interconnect to exchange data or control information between two specific IPs. For example, the ARM GIC-600 interrupt controller uses the AXI Stream bus to distribute interrupts from the GIC-600 to the CPU cluster, such as an ARM Cortex-A73 cluster. The following diagram illustrates this connectivity:

Figure 3.20 – System example of using the AXI stream bus – the ARM GIC-600

ACE bus protocol overview

The ACE bus protocol, also known as AXI with Coherency Extensions, was first introduced by ARM in AMBA4. It is an AXI-4 protocol with support for cache coherency that’s implemented in hardware between different masters, including caches. This section provides an overview of the system cache coherency concepts that will be covered in more detail in the last section of this chapter. We will also explore the ACE protocol, its evolution from AMBA4 to AMBA5 standards, and its added features and mechanisms. We will understand this bus’s supported transactions, signaling, and application use cases. We will provide an example system implementation that uses ACE as a bus interface.

ACE bus protocol evolution

The ACE bus protocol is in its second revision as part of the AMBA5 standard. It was first released in AMBA4 as a means to connect masters that include caches and allow data exchanges to be cache coherent without any software cache maintenance operations. Having the hardware that interconnects these masters perform the cache maintenance operations provides an advantage in terms of processing performance and flexibility in parallel processing software architectures. This is because it makes lock-free programming easier to implement in multi-core and multiprocessor SoCs. In the ACE protocol, a cache line can be in one of five states, and each state has an associated action when the line is accessed under ACE. This extra information and these actions require an extra set of signals and channels that extend the AXI-4 protocol into the ACE protocol. ACE also adds support for barrier transactions to enforce transaction completion ordering in the interconnect, as well as support for Distributed Virtual Memory (DVM) to manage virtual memory addressing at the interconnect level, extending it beyond a specific CPU cluster. The AMBA5 standard reorganized the ACE protocol by defining three Lite versions: ACE5-Lite, ACE5-LiteDVM, and ACE5-LiteACP. Barrier transaction support was removed from ACE5 and ACE5-Lite. AMBA5 also removed all references to the low-power interface found in prior standards, as it is now grouped under another specification, known as the AMBA Low Power Interface specification. In this book, we will focus on the ACE base protocol, ACE-4, as it is used within the Xilinx Zynq UltraScale+ SoC devices. These are part of the SoCs we will build projects for in the third part of this book.

ACE bus characteristics

As mentioned previously, the ACE protocol uses AXI-4 as the base of its specification and provides hardware cache coherency support between masters, covering their caches and the memories shared among them. ACE-4 allows two different masters to write to the same memory address while guaranteeing that these writes are observed in the same order by all the observing masters that share that memory location. The ACE-4 protocol implements system-level coherency, where the design can specify the ranges of coherent memories, the memory controllers that support the coherency extensions, and how software can communicate between the elements of the system coherency domain. ACE-4 also provides an abstraction that allows coherency to be implemented between caches that use different coherency protocols. We will introduce coherency protocols in detail in the last section of this chapter. It also allows IPs with different cache line characteristics, such as granularity, to interoperate coherently. The following diagram provides a simple, abstracted example of an ACE-4-based SoC, which should help us study its characteristics:

Figure 3.21 – ACE-4 based interconnect simple system example

Cached copies of data from the shared system memory are mapped into all four masters’ address spaces as cacheable regions. These copies can reside in any of the five states in the local cache of one or more master IPs participating in the SoC coherency domain. ACE-4 guarantees that all the masters in the SoC have visibility of the latest value of the data stored in a shared memory location. This is done by enforcing that only one copy exists (across all the local masters’ caches and the system memory) whenever a write operation targets that shared memory location. Once the write operation to the shared memory location is complete, all the participating masters can obtain the latest copy of the data held in the memory location that was targeted by the last write operation. This mechanism allows coherent copies of the data stored in a shared system memory location to exist in multiple caches and in the system memory at the same time.

As in any efficient coherency protocol, ACE-4 does not require an automatic and immediate write-back to the system memory of recently updated data in a given cache. This is only necessary when a memory location is no longer stored in a shareable cache, either as a result of an explicit software cache maintenance operation or because of a hardware cache operation that elects its line as a victim under the applied cache replacement algorithm. The ACE-4 protocol allows a master component to find out whether a cache line is unique to its cache or whether it also exists in another master’s cache. If it is unique, the master can change the content of its cache line without broadcasting the change to all the other participating masters. If it isn’t unique, then the master must notify them.

The following table lists all the cache states defined in the ACE-4 protocol:

Cache State | Description
Invalid (I) | The cache line is not present in this cache.
UniqueClean (UC) | The cache line is only present in this cache and still matches the content of the memory location. This allows the owning master to write to this cache line without broadcasting the change to the other masters in the system.
UniqueDirty (UD) | The cache line is only present in this cache and has been modified vis-à-vis the memory location. The owning master can again write to this cache line without broadcasting the change to the other masters in the system.
SharedClean (SC) | The cache line might be present in other caches, and it is not known whether it still matches the system memory. This master is not responsible for updating the system memory, but it must notify the other masters before writing to this cache line while in this state.
SharedDirty (SD) | The cache line might be present in other caches and has been modified vis-à-vis the system memory. This master is responsible for updating the system memory and must notify the other masters before writing to this cache line while in this state.

Table 3.5 – ACE-4 protocol cache line states description
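
The five states and the permissions they imply can be summarized with the following C sketch. It is a conceptual model derived from Table 3.5, not code from the ACE specification.

    #include <stdbool.h>

    /* ACE-4 cache line states, as described in Table 3.5. */
    typedef enum {
        LINE_INVALID,        /* I  : not present in this cache                   */
        LINE_UNIQUE_CLEAN,   /* UC : only copy, matches system memory            */
        LINE_UNIQUE_DIRTY,   /* UD : only copy, modified vs. system memory       */
        LINE_SHARED_CLEAN,   /* SC : possibly shared, not responsible for memory */
        LINE_SHARED_DIRTY    /* SD : possibly shared, responsible for write-back */
    } ace_line_state;

    /* A unique line may be written without notifying the other masters. */
    static bool can_write_silently(ace_line_state s)
    {
        return s == LINE_UNIQUE_CLEAN || s == LINE_UNIQUE_DIRTY;
    }

    /* A dirty line holds data that must eventually be written back to memory. */
    static bool must_write_back(ace_line_state s)
    {
        return s == LINE_UNIQUE_DIRTY || s == LINE_SHARED_DIRTY;
    }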

The ACE-4 protocol states the following rules regarding the cache line states:

  • A cache line that is in a unique state can only be resident in one cache in the entire SoC.
  • A cache line must be held in a shared state as soon as it is present in more than one cache in the SoC.
  • A cache that obtains a new copy of a cache line causes any other cache holding the same cache line in a unique state to move it to a shared state.
  • When a cache evicts a cache line, it does not need to inform the other holders of that cache line. This can result in a cache line in a shared state being held in only one cache.
  • A cache line that has been modified vis-à-vis the system memory must be held in a dirty state.
  • A cache line that has been modified vis-à-vis the system memory and is present in more than one cache in the SoC must be held in a dirty state in only a single cache.

ACE bus interface signals

ACE-4 modifies the read address, write address, and read data channels of AXI-4 by adding the signals listed in the following table. The write data and write response channels are the same in both AXI-4 and ACE-4:

AXI-4 Channel | Added Signal | Description
Read address | ARDOMAIN[1:0] | Driven by the master interface, this indicates the shareability domain of a read transaction as being either Non-Shareable (0b00), Inner Shareable (0b01), Outer Shareable (0b10), or System (0b11).
Read address | ARSNOOP[3:0] | Driven by the master interface, this indicates the read transaction type. There are five groups of transaction types: Non-snooping, Coherent, Cache maintenance, Barrier, and DVM. Each group defines many read transaction types.
Read address | ARBAR[1:0] | Driven by the master interface, this defines the barrier transactions for the read address channel as Normal access, respecting barriers (0b00), Memory barrier (0b01), Normal access, ignoring barriers (0b10), or Synchronization barrier (0b11).
Write address | AWDOMAIN[1:0] | Driven by the master interface, this indicates the shareability domain of a write transaction as being either Non-Shareable, Inner Shareable, Outer Shareable, or System.
Write address | AWSNOOP[2:0] | Driven by the master interface, this indicates the write transaction type. There are four groups of transaction types: Non-snooping, Coherent, Cache maintenance, and Barrier. Each group defines many write transaction types.
Write address | AWBAR[1:0] | Driven by the master interface, this defines the barrier transactions for the write address channel as Normal access, respecting barriers (0b00), Memory barrier (0b01), Normal access, ignoring barriers (0b10), or Synchronization barrier (0b11).
Write address | AWUNIQUE | Driven by the master interface, this is only required by IPs that support WriteEvict-type transactions.
Read data | RRESP[3:2] | Driven by the slave interface, this extends the AXI-4 read response by two bits to provide the Shareable read transaction completion information. RRESP[2] is defined as the PassDirty bit, while RRESP[3] is defined as IsShared.

Table 3.6 – ACE-4 channels based on the AXI-4 channels description

ACE-4 adds three extra channels to the AXI-4 channels, which are the snoop address channel, the snoop response channel, and the snoop data channel. The snoop address channel provides the value of the snooped address, as well as the associated control information. The snoop response channel is a way for the snooped master to complete the snoop address request and it indicates if a data transfer is going to follow on the snoop data channel. The following table provides an overview of the snoop channel topology:

ACE-4 Channel | Added Signal | Description
Snoop address | ACVALID | Driven by the slave interface, this indicates that the values on the snoop address and control signals are valid.
Snoop address | ACREADY | Driven by the master interface, this indicates that the snoop address and control signals can be accepted within the current clock cycle.
Snoop address | ACADDR[ac-1:0] | Driven by the slave interface, this is the snooped address. Its width should be the same as that of the read and write address channels.
Snoop address | ACSNOOP[3:0] | Driven by the slave interface, this indicates the snoop transaction type.
Snoop address | ACPROT[2:0] | Driven by the slave interface, this indicates the security level of the snoop transaction. Only ACPROT[1] has a meaning in the snoop channel.
Snoop response | CRVALID | Driven by the master interface, this indicates that the snoop response from the master is valid.
Snoop response | CRREADY | Driven by the slave interface, this indicates that the interconnect is ready to accept the snoop response from the master.
Snoop response | CRRESP[4:0] | Driven by the master interface, this indicates the snoop transaction’s response.
Snoop data | CDVALID | Driven by the master interface, this indicates the validity of the snoop data returned by the master.
Snoop data | CDREADY | Driven by the slave interface, this indicates that the interconnect slave interface is ready to accept the response data from the master.
Snoop data | CDDATA[cd-1:0] | Driven by the master interface, this carries the snoop data returned from the master to the interconnect. “cd” is the width of the snoop data bus.
Snoop data | CDLAST | Driven by the master interface, this indicates the last transfer of the snoop data transaction.

Table 3.7 – ACE-4 specific snoop channels description

ACE bus-supported transactions

ACE-4 protocol defines seven types of transactions:

  • Non-snooping transactions: These are used to access locations in the address space that are defined as non-shareable or device – that is, they can’t be in another master’s cache. To gain access to these locations, the master can use ReadNoSnoop or WriteNoSnoop.
  • Coherent transactions: These are used to perform coherent accesses vis-à-vis the other masters’ caches – that is, the accessed locations are shareable locations that can potentially reside in the cache of another coherent master in the SoC. To read data from a coherent cache line, the master can issue ReadClean, ReadNotSharedDirty, and ReadShared. To perform a write transaction to shareable locations, a coherent master can use ReadUnique, CleanUnique, and MakeUnique. To write to shareable locations when no cached copy is required, the master can use ReadOnce, WriteUnique, and WriteLineUnique.
  • Memory update transactions: To update the system memory, the master can use the WriteBack transaction to write back a dirty line of the cache to memory and free it. However, to retain a copy of the cache line, the master should issue a WriteClean transaction. A WriteEvict transaction can be used in a multi-layer cache hierarchy to free a cache line and write it into a lower cache level without updating the system memory. Evict is used to broadcast the address in a cache line of a master to other masters without writing any data back to the system memory.
  • Cache maintenance transactions: These are used by masters in a coherency domain to broadcast cache operations to access and maintain the caches of other coherent masters in the coherency domain. The cache maintenance transactions are CleanShared, CleanInvalid, and MakeInvalid.
  • Snoop transactions: These use the snoop address, snoop response, and snoop data channels. Snoop transactions are a subset of coherent transactions and cache maintenance transactions.
  • Barrier transactions: These are used to implement the transactions ordering rule. ACE-4 supports the memory barrier and synchronization barrier types.
  • Distributed virtual memory transactions: These are used to maintain the system’s virtual memory and share information between the elements of the SoC participating in its implementation. It usually includes a System Memory Management Unit (SMMU).

The ACE-4 coherency extension is a complex topic on its own. This section has only provided an overview of it and how its addition makes coherency possible between different participating masters. You are encouraged to consult the ARM ACE-4 specification for more details: https://developer.arm.com/documentation/ihi0022/e?_ga=2.67820049.1631882347.1556009271-151447318.1544783517.

ACE bus example system implementation

A good example of using the ACE-4 protocol as the main bus protocol of an SoC is illustrated by the Zynq UltraScale+ SoC interconnect of the APU unit. It includes a Cache Coherent Interconnect (CCI) and an SMMU:

Figure 3.22 – ACE-4 based interconnect system implementation example

OCP interconnect protocol

OCP is an industry standard that provides the specification of a socket-based bus interface that can be used in an interconnect for modern SoCs. It is developed and maintained by the OCP Working Group. The OCP specification is available from Accellera for free and requires the user to accept the Accellera OCP Specification License. Other materials that are provided with the OCP protocol can be found under the Apache 2.0 license. To download the OCP specification, go to https://www.accellera.org/downloads/standards/ocp/ocp-license-agreement.

This section only provides a brief introduction to the OCP protocol so that you can gain a high-level understanding of its mechanics and characteristics, as well as compare it to AMBA-based bus protocols such as AXI-4 or ACE-4. However, you are encouraged to study the relevant sections of the specification to further your knowledge of the protocol and its implementation details.

OCP protocol overview

The OCP protocol is a high-throughput, high-speed IP interface. It can be implemented in systems that use an SoC interconnect that is not based on the OCP protocol, so long as bridging is performed correctly to and from the OCP interfaces. It helps in overcoming the time-to-market challenges of SoC design by providing an industry-standard bus protocol that can cohabit easily with other SoC bus protocols and system interconnects. The OCP protocol is flexible as it can be configured to suit the system’s design needs: the user can choose the options that are needed and discard the features that aren’t required by their design. This makes it an extensible bus protocol in concept, which also helps in reducing the silicon area cost when it is used as an IP bus interfacing protocol. In addition to supporting the Virtual Component Interface (VCI), the OCP protocol includes sideband control signals and test signals. The VCI is specified by the Virtual Socket Interface Alliance (VSIA).

At the time of writing, the current revision of the OCP protocol is revision 3.0, which added coherence extensions to the base protocol, new sideband signals to control the IP connection state of the interface, and an advanced high-speed profile.

OCP bus characteristics

The OCP protocol is a point-to-point connection between two IP cores, with one acting as a master and the other as a slave. Communication is initiated by the master and completed by the slave, which accepts the write data from the master or responds with the read data requested by the master in the command. For a data exchange to occur between the two communicating IPs, the master puts the command, the control information, and, in a write scenario, the data on the OCP bus. This goes through an interface block that acts as a slave on the SoC interconnect, which may bridge it to another protocol if the SoC interconnect uses a different one. The end target then receives the initial command, the associated information, and any write data. The OCP protocol is flexible and provides synchronous handshaking signals that allow a pipelined or multi-cycle access model to be used between the master and the slave. It is also worth noting that the OCP protocol is connection-oriented, which is why it’s called a socket interface. Here, an IP can connect to and disconnect from the bus infrastructure if the SoC interconnect supports this feature.

OCP bus interface signals

OCP interface signals are divided into three categories: dataflow signals, sideband signals, and test signals.

There are three functional sets within the dataflow signals: the request signals set, the response signals set, and the data handshake set. The dataflow signals can also be divided into five groups: basic signals, simple extensions, burst extensions, tag extensions, and thread extensions.

Only a few basic dataflow group signals are mandatory for an OCP interface configuration. The optional signals can be used to support the need for the communicating OCP interface. The sideband and test signals are all optional. The OCP has a single clock to which all the interface signals are synchronized. The following diagram illustrates the OCP interface signals grouping:

Figure 3.23 – OCP interface signals grouping

For details of all the signals that form the OCP bus, please consult the OCP specification at https://www.accellera.org/downloads/standards/ocp/ocp-license-agreement.

OCP bus-supported transactions

The following diagram shows the OCP protocol’s layout and how it is implemented using the predefined groups and sets of signals:

Figure 3.24 – OCP protocol elements hierarchical representation

The transfer commands indicate the types of OCP transactions. These are encoded in the MCmd[2:0] signal, which is part of the request group of signals, as follows:

  • 0b000: Idle.
  • 0b001: Write transfers the data from the master to the slave.
  • 0b010: Read requests the data by the master from the slave.
  • 0b011: ReadEx is paired with a Write or WriteNonPost and has a blocking effect.
  • 0b100: ReadLinked is used with WriteConditional and has no blocking effect.
  • 0b101: WriteNonPost is used to request the slave not to post a Write command.
  • 0b110: WriteConditional is used to write only when a reservation is set on the write address; otherwise, it fails.
  • 0b111: Broadcast is used when the addressed location may map to more than one slave.
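
The MCmd encoding listed above can be captured in a small C enumeration, as sketched below. The enumerator names are illustrative; only the encodings and command meanings come from the list.

    /* OCP MCmd[2:0] transfer command encoding (values as listed above). */
    typedef enum {
        OCP_MCMD_IDLE         = 0, /* 0b000 */
        OCP_MCMD_WRITE        = 1, /* 0b001 */
        OCP_MCMD_READ         = 2, /* 0b010 */
        OCP_MCMD_READEX       = 3, /* 0b011: blocking, paired with Write/WriteNonPost */
        OCP_MCMD_READLINKED   = 4, /* 0b100: non-blocking, used with WriteConditional */
        OCP_MCMD_WRITENONPOST = 5, /* 0b101: requests that the slave not post the write */
        OCP_MCMD_WRITECOND    = 6, /* 0b110: succeeds only if a reservation is set */
        OCP_MCMD_BROADCAST    = 7  /* 0b111: address may map to more than one slave */
    } ocp_mcmd;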

ReadEx, ReadLinked, and WriteConditional mirror the exclusive-access and load-linked/store-conditional semantics that processors use to synchronize multiple CPU cores in an SoC.
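
As a quick reference, the MCmd encodings listed above can be captured in a simple C enumeration. This is only an illustrative sketch; the enumerator names are descriptive assumptions and are not taken from any official OCP header:

/* Illustrative sketch of the OCP MCmd[2:0] transfer command encodings.
 * The enumerator names are assumptions for readability only. */
typedef enum {
    OCP_MCMD_IDLE         = 0x0, /* 0b000: no transfer                    */
    OCP_MCMD_WRITE        = 0x1, /* 0b001: write from master to slave     */
    OCP_MCMD_READ         = 0x2, /* 0b010: read                           */
    OCP_MCMD_READEX       = 0x3, /* 0b011: read exclusive (blocking)      */
    OCP_MCMD_READLINKED   = 0x4, /* 0b100: read linked (sets reservation) */
    OCP_MCMD_WRITENONPOST = 0x5, /* 0b101: non-posted write               */
    OCP_MCMD_WRITECOND    = 0x6, /* 0b110: write conditional              */
    OCP_MCMD_BROADCAST    = 0x7  /* 0b111: broadcast write                */
} ocp_mcmd_t;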

The OCP protocol also supports burst operations to achieve high system throughput with minimum overhead. OCP burst operations may carry addressing information with each transfer in the burst or provide a single address for the entire burst.

The OCP protocol uses tags to relax the ordering of operations issued by the same master: operations carrying different tags may be completed by the slave in a relaxed order.

DMA engines and data movements

Modern SoCs and high-performance IPs include DMA engines to offload data movement from the CPUs and perform it as desired by the software, with the option to notify the CPU once the operation is completed. There are two categories of DMA engines: DMA engines that are included within an IP, and central DMAs that are standalone and connected to the SoC interconnect like any other SoC IP.

IP-integrated DMA engines overview

IP-integrated DMA engines act as data movers on behalf of the IP and the CPU, so the CPU doesn’t have to copy data from system memory to the IP’s local storage or vice versa. Rather, once programmed and armed, the DMA engine autonomously performs the data transfer from source to destination as soon as the data is needed or received. Its control hardware then notifies the CPU when the operation has finished executing, usually via an interrupt. The following diagram illustrates an IP-integrated DMA engine:

Figure 3.25 – IP-integrated DMA engine example

IP-integrated DMA engines topology and operations

Most high-speed IPs include a multi-channel DMA engine. These engines are also full-duplex so that data movement isn’t the SoC bottleneck when it comes to meeting system performance requirements. The DMA engine is organized into many channels capable of reading and writing simultaneously so that it can service the data movement needs of the IP itself. High-performance communication IPs, such as PCIe controllers and multi-gigabit Ethernet controllers, move large quantities of data in both the transmit and receive directions. They usually service many software threads running on different CPU cores. These use case scenarios require efficient, low-overhead data movement within the SoC between the system memory and these controllers, and the DMA engines integrated within these controllers are the appropriate solution to meet these needs.

Integrated DMA engines, in contrast to central DMA engines, have the advantage of moving the data directly between system memory and the controller. Consequently, the data reaches its destination in roughly half the time it would need with a central DMA engine, where the data must first be copied from the source into the central DMA engine’s buffer and then written out to the destination.

IP-integrated DMA engines can move data from/to regular memory, where addressing is sequential, and from/to peripherals, where a FIFO is usually used as storage and the source or destination address stays fixed. To support peripherals with integrated FIFOs (when the FIFO isn’t managed by the IP peripheral itself behind a bus interface), some integrated DMA engines use handshaking signals to manage the communication directly with the FIFO. These DMA engines aren’t generic and are usually designed to support a specific hardware interface. Moving data from/to regular memory is generic in the sense that the DMA engine interfaces to the system interconnect via, for example, one of the AMBA or OCP bus protocols covered earlier and performs its operations using those protocols.

The data movements to be performed are seen as operations that are executed by the DMA engine. These operations are performed by the DMA hardware state machine using operands that are specified via registers local to the DMA engine. This set of operands is called the DMA operation descriptor. A DMA descriptor for a read/write from/to system memory, where the data resides in (or is to be written to) a contiguous physical buffer, typically includes the following operands:

  • The source/destination address of the data
  • The size of the data transfer
  • Control information
  • Start/stop

The DMA engine then stores the results of the operation, such as a progress watermark, error status, and completion status, in a set of registers that software can read when appropriate.
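
To make this concrete, the following C sketch shows how software might program such a register-based descriptor and then check the result registers for a hypothetical IP-integrated DMA channel. The register layout, field names, and bit assignments are assumptions for illustration only, not any real controller’s programming model:

#include <stdint.h>

/* Hypothetical register view of one IP-integrated DMA channel.
 * Layout, names, and bit positions are illustrative assumptions. */
typedef struct {
    volatile uint64_t buf_addr;   /* source/destination address in system memory */
    volatile uint32_t length;     /* size of the data transfer in bytes           */
    volatile uint32_t control;    /* control information (direction, IRQ enable)  */
    volatile uint32_t start_stop; /* write 1 to start, 0 to stop                  */
    volatile uint32_t status;     /* completion/error status, progress watermark  */
} dma_chan_regs_t;

#define DMA_CTRL_DIR_TO_IP (1u << 0) /* assumed: 1 = memory-to-IP, 0 = IP-to-memory */
#define DMA_CTRL_IRQ_EN    (1u << 1) /* assumed interrupt-enable bit                */
#define DMA_STAT_DONE      (1u << 0) /* assumed completion bit                      */
#define DMA_STAT_ERROR     (1u << 1) /* assumed error bit                           */

/* Program the descriptor operands, arm the channel, and start it. */
static void dma_arm(dma_chan_regs_t *ch, uint64_t buf, uint32_t len, int to_ip)
{
    ch->buf_addr   = buf;
    ch->length     = len;
    ch->control    = (to_ip ? DMA_CTRL_DIR_TO_IP : 0) | DMA_CTRL_IRQ_EN;
    ch->start_stop = 1; /* the engine now moves the data autonomously */
}

/* Software later reads the result registers (here by polling; in practice,
 * the completion interrupt would normally trigger this read). */
static int dma_check_result(dma_chan_regs_t *ch)
{
    if ((ch->status & DMA_STAT_DONE) == 0)
        return 1;  /* still in progress */
    return (ch->status & DMA_STAT_ERROR) ? -1 : 0;
}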

There are also DMA engines capable of transferring complex data structures from the system memory to their local storage without any data rearrangements. These complex data structures are usually data buffers scattered around physical buffers of different sizes in the system memory. Some DMA engines can perform such transfers using a linked list of the aforementioned DMA descriptors, which the DMA engine reads sequentially during execution. Once the executed list reaches the last entry, that entry points to the next list. A list is usually a contiguous array of descriptors; however, the last element of the list is just a pointer to the next list of descriptors. This results in the software needing to define the linked list of descriptors in the system memory, and then provide the address of the first element of the linked list to the DMA engine to load before launching the DMA operation. The process of loading the linked list of DMA descriptors is often called pre-fetching as it is performed while a DMA operation from a previous sequence is ongoing. The operands for such DMA operations are as follows:

  • The type of the descriptor: a data buffer descriptor or a pointer to the first element of the next linked list.
  • The source address of the data: either a data buffer address or the location of the first element of the next list.
  • The size of the data to transfer (if the descriptor isn’t a pointer).
  • Control information.
  • Start/stop.
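
The following C sketch illustrates, under assumed field names and encodings, how software could lay out such a linked list of descriptors in system memory, with the last element acting only as a pointer to the next list:

#include <stdint.h>

/* Illustrative in-memory layout of a linked-list DMA descriptor.
 * Field names, widths, and type encodings are assumptions, not any
 * particular controller's format. */
#define DESC_TYPE_BUFFER   0u /* descriptor describes a data buffer */
#define DESC_TYPE_NEXTLIST 1u /* descriptor points to the next list */

typedef struct {
    uint32_t type;     /* DESC_TYPE_BUFFER or DESC_TYPE_NEXTLIST       */
    uint32_t control;  /* control information (IRQ on completion, ...) */
    uint64_t address;  /* buffer address, or address of the next list  */
    uint32_t length;   /* transfer size in bytes (unused for NEXTLIST) */
} dma_desc_t;

/* Build one list of n buffer descriptors followed by a pointer entry.
 * The list array must have room for n + 1 elements. */
static void dma_build_list(dma_desc_t *list, const uint64_t bufs[],
                           const uint32_t lens[], int n, uint64_t next_list_pa)
{
    for (int i = 0; i < n; i++) {
        list[i].type    = DESC_TYPE_BUFFER;
        list[i].control = 0;
        list[i].address = bufs[i];
        list[i].length  = lens[i];
    }
    /* Last element is just a pointer to the next list of descriptors. */
    list[n].type    = DESC_TYPE_NEXTLIST;
    list[n].control = 0;
    list[n].address = next_list_pa;
    list[n].length  = 0;
}

In a real driver, the physical address of the first element would then be written to the engine’s descriptor pointer register before the channel is started.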

Every IP-integrated DMA engine may have a specific way of implementing this to interface with the software, but the general concept is pretty much the same. The following diagram illustrates the DMA descriptors linked list concept:

Figure 3.26 – IP integrated DMA descriptors linked list example

Standalone DMA engines overview

Standalone DMA engines are often called central DMAs as they sit centrally on the SoC interconnect and can move data from anywhere in the SoC-mapped address space to anywhere else, as long as they are allowed to do so by the SoC interconnect and any security policy that may restrict access to some regions of the address space.

Like the IP-integrated DMA engine, a central DMA IP autonomously performs the data transfer from source to destination. Its control hardware then notifies the CPU via an interrupt once the required transfer operation has finished. The following diagram illustrates the notion of central DMA engines:

Figure 3.27 – Central DMA engine example

Central DMA engines topology and operations

Most modern SoCs include a multi-channel, full-duplex central DMA engine to accelerate data movement operations on behalf of the CPU and of IP peripherals that have no integrated DMA engine. Central DMA engine topologies vary in design complexity, features, and silicon area cost. However, central DMA engines usually include multiple channels capable of concurrent read and write transfers from different sources and destinations, either by sharing a single master interface or by implementing multiple masters.

Central DMA engines, as already mentioned and in contrast to IP-integrated DMA engines, need to copy the data from the source to the central DMA buffer, then copy the data from the central DMA buffer to the destination. Consequently, the data reaches its destination in roughly twice the time it would take with an IP-integrated DMA engine, where data is transferred directly between the IP and the destination.

A central DMA engine can move data between different types of sources and destinations, as follows:

  • Memory to memory
  • Memory to FIFO
  • FIFO to memory
  • FIFO to FIFO
  • Scatter Gather (SG)

A Scatter Gather DMA operation is a specific type of memory-to-memory transfer where the source of the data and/or its destination is formed by non-contiguous buffers in the physical memory address space.

Similar to an integrated DMA engine, the data movements to perform are seen as operations that are executed by the central DMA engine. These operations are performed by the DMA hardware state machine using operands that are specified via registers local to the DMA engine. This set of operands is called the DMA operation descriptor. A DMA descriptor for a read/write from/to system memory, where the data resides in (or is to be written to) a contiguous physical buffer, typically includes the following operands:

  • The source address of the data
  • The destination address of the data
  • The size of the data transfer
  • Control information
  • Start/stop

Then, the central DMA engine stores the results of the operation, such as a progress watermark, error status, or completion status, in a set of registers that software can read when appropriate.
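
As a point of comparison with the IP-integrated case, the following C sketch programs a hypothetical central DMA channel for a memory-to-memory copy and then polls its status registers; again, the register map, names, and bits are illustrative assumptions, not a specific device’s interface:

#include <stdint.h>

/* Hypothetical register block of one central DMA channel; layout,
 * names, and bit positions are assumptions for illustration only. */
typedef struct {
    volatile uint64_t src_addr;   /* source address of the data      */
    volatile uint64_t dst_addr;   /* destination address of the data */
    volatile uint32_t length;     /* size of the data transfer       */
    volatile uint32_t control;    /* control information             */
    volatile uint32_t start_stop; /* start/stop                      */
    volatile uint32_t status;     /* completion/error, watermark     */
} cdma_chan_regs_t;

#define CDMA_STAT_DONE  (1u << 0) /* assumed completion bit */
#define CDMA_STAT_ERROR (1u << 1) /* assumed error bit      */

/* Memory-to-memory copy offloaded to the central DMA engine. */
static int cdma_memcpy(cdma_chan_regs_t *ch, uint64_t dst, uint64_t src,
                       uint32_t len)
{
    ch->src_addr   = src;
    ch->dst_addr   = dst;
    ch->length     = len;
    ch->control    = 0;
    ch->start_stop = 1;

    /* Poll the result registers; a real driver would usually block on
     * the completion interrupt instead. */
    while ((ch->status & (CDMA_STAT_DONE | CDMA_STAT_ERROR)) == 0)
        ;
    return (ch->status & CDMA_STAT_ERROR) ? -1 : 0;
}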

Like the IP-integrated DMA, central DMA engines are also capable of moving complex data structures from source to destination using the linked list of DMA descriptors concept explained in the previous section.

Data sharing and coherency challenges

Modern SoCs are constructed using multiple compute engines such as CPUs, GPUs, custom hardware accelerators, and high-performance IP peripherals with integrated DMA engines. They process data that is shared laterally and passed from layer to layer as these different processing engines cooperate. These complex topologies make the system design more challenging: the data must remain safe as it is accessed, used, and updated, and it must stay coherent when these processing engines optimize data access, for example, by using integrated caches. Accessing data concurrently and safely means accessing it atomically and coherently without paying a high penalty in terms of software management or resorting to prohibitive locking mechanisms. We want the system architecture to be lock-free and to rely on the hardware to provide optimal mechanisms for implementing these data protection schemes.

Data access atomicity

To make sure that data is accessed atomically in an SoC with multiple compute nodes sharing it, the SoC system architecture should provide exclusive access primitives that cooperating compute elements can use to protect access to shared data. The atom of data access is specific to the application and can be a simple data type, such as an integer, or a custom-defined data structure that the application software wants to be accessible, fully or partially, to a single requester at a time. Traditionally, hardware-based primitives such as hardware mutexes were used, and these can still be used in today’s SoCs. ARM also defines a whole end-to-end paradigm – that is, from the software layer to the local exclusive monitors within the cluster and the global exclusive monitors within the system memory controllers. These monitors are hardware-based state machines implemented at the bus interface level of the CPU cluster for the local exclusive monitors or at the bus-side interface of the memory controller for the global exclusive monitors. They observe access sequences to the memory locations where the semaphore-based primitives are located. This means that software threads can use these semaphores as a protective mechanism for data shared between software threads running on the same CPU core, between software processes running on different CPU cores within the same CPU cluster, and between hardware or software threads running outside of a specific CPU cluster. Choosing between hardware-based access protection primitives and exclusive (local and/or global) monitors is a decision that should be based on the system’s sharing and performance requirements. System profiling should help with making the optimal choice, though a mixture of both is usually the solution.
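
As a software-level illustration, the following C sketch builds a simple spinlock around shared data using the C11 <stdatomic.h> primitives, which ARM compilers typically lower to exclusive load/store instructions (LDREX/STREX or LDXR/STXR) tracked by the local and global exclusive monitors. This is a minimal sketch under those assumptions, not production locking code:

#include <stdatomic.h>

/* A minimal spinlock protecting shared data between cooperating CPU
 * cores. On ARM, the atomic test-and-set below is typically built
 * from exclusive load/store pairs observed by the exclusive monitors. */
typedef struct {
    atomic_flag locked;
    int shared_counter; /* example of the protected shared "atom" of data */
} spin_protected_t;

static void sp_init(spin_protected_t *p)
{
    atomic_flag_clear(&p->locked);
    p->shared_counter = 0;
}

static void sp_lock(spin_protected_t *p)
{
    /* Spin until we atomically acquire the flag. */
    while (atomic_flag_test_and_set_explicit(&p->locked, memory_order_acquire))
        ; /* a real implementation might yield or use WFE here */
}

static void sp_unlock(spin_protected_t *p)
{
    atomic_flag_clear_explicit(&p->locked, memory_order_release);
}

/* Atomic update of the shared data from any core. */
static void sp_increment(spin_protected_t *p)
{
    sp_lock(p);
    p->shared_counter++;
    sp_unlock(p);
}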

Information

If you wish to learn more about the global exclusive monitors topic, you are invited to study the ARM material at https://developer.arm.com/documentation/dht0008/a/arm-synchronization-primitives/exclusive-accesses/exclusive-monitors?lang=en.

Cache coherency overview

Cache coherency isn’t a new topic in the SoC design since it is the basis of building a multi-core cache-coherent CPU cluster. However, it now extends beyond the CPU cluster and reaches the SoC level via bus protocols such as ACE and OCP.

Cache coherency protocols overview

The following diagram illustrates the concept of two CPUs with integrated caches sharing data that is cached from system memory. A cache coherency protocol is needed within the CPU cluster, along with a bus interface with coherency support, such as ACE-4, and an interconnect with coherency support, to allow these CPUs to share data coherently without requiring costly (in terms of performance) software cache management:

Figure 3.28 – Cache coherency support infrastructure

If CPU1 modifies the content of the cache line that includes the 0xBABA_BABA location from system memory, then the Snoop Control Unit (SCU) will make sure that this access is done coherently since it has all the system coherency infrastructure to do so. Any CPU core from a CPU cluster can access shared data that is also cached coherently without the need to perform any software cache maintenance operations. This is because the hardware will take care of that autonomously.

Implementing cache coherency in CPUs

In a CPU cluster that’s integrating multiple cores, each with level 1 caches and at least another higher-level cache such as a common L2 cache, there is usually an SCU that orchestrates data accesses and makes sure that data is shared efficiently between the different CPU cores within the cluster, as well as shared coherently. The SCU implements a coherency protocol, which is usually a derivative of the MESI cache coherence protocol. Let’s look at what the MESI cache coherency protocol stands for:

  • M: This stands for Modified and means that the cache line has been modified vis-à-vis the system memory.
  • E: This stands for Exclusive and means that the cache line is exclusively present in the current cache.
  • S: This stands for Shared and means that the cache line is present in at least one other cache in addition to this one.
  • I: This stands for Invalid and means that the cache line is invalid.

Modified, Exclusive, Shared, and Invalid are the four states in which, at any time, a given cache line of any CPU core can be held. Modern multi-core CPU clusters add at least one more state to the MESI protocol, such as Owned, to support more efficient cache coherency protocols; the resulting protocol is called MOESI. Owned means that the cache holding the line in the Owned state has the up-to-date copy, may share it with other caches, and is responsible for updating the next-level cache or main memory.

With these five states, most of the cache controllers can, alongside the SCU, implement efficient cache coherency protocols and share data between the caches of the CPU cores within the cluster without having to go via memory. This reduces the data sharing latency, which, in turn, reduces power consumption by avoiding heavy bus and external memory traffic, thus augmenting system performance.
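
As a simplified illustration of these states, the following C sketch models one representative MOESI transition: the new state of a locally cached line when a read from another master is snooped. A real SCU handles many more events (local reads and writes, invalidations, evictions) and implementation-specific details, so this is only a conceptual sketch:

/* The five MOESI cache line states. */
typedef enum { INVALID, SHARED, EXCLUSIVE, OWNED, MODIFIED } moesi_state_t;

/* Simplified: new state of a local cache line when another master's
 * read of that line is snooped (a dirty line supplies the data). */
static moesi_state_t on_snooped_read(moesi_state_t s)
{
    switch (s) {
    case MODIFIED:  return OWNED;   /* keep the dirty copy, now shared */
    case EXCLUSIVE: return SHARED;  /* another cache now holds it too  */
    case OWNED:                     /* already owned and shared        */
    case SHARED:    return s;
    case INVALID:
    default:        return INVALID; /* nothing cached locally          */
    }
}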

Extending cache coherency at the SoC level using the accelerators port

Some CPUs, such as certain ARM Cortex-R and Cortex-A cores, have an Accelerator Coherency Port (ACP) that extends cache coherency to external masters. The ACP helps in implementing an I/O-coherent topology that allows hardware accelerators and the CPU to share data coherently without software cache management assistance. However, this is only true in the I/O accelerator direction: if the hardware accelerator integrates a cache, the CPU can’t coherently access system memory that’s been cached within the hardware accelerator. Therefore, it is important to understand how the ACP port operates, both in the Cortex-A9 CPU cluster within the Zynq-7000 SoC and in the Cortex-A53 CPU cluster of the Zynq UltraScale+ SoC.

When using the ACP port to extend I/O coherency with an accelerator, for example, care must be taken to ensure that the data access patterns and transaction types used by the accelerator master are supported by the ACP port. Not all transactions supported by the bus protocol are permitted on the ACP port, and there are limitations that the user needs to be aware of. These can be found in the CPU Technical Reference Manual (TRM).

When access is gained through the ACP, if there is a hit in a CPU core cache, the transaction is usually performed from the cache (be it a read or a write). However, if there is a cache miss, the transaction is fulfilled from system memory, so there is a price to pay in terms of added latency cycles and in sharing the CPU cluster’s throughput to system memory. The ACP port in certain CPUs allows allocation on a miss, which is useful when a CPU core in that cluster will need that data again soon. However, at the same time, cache allocation from the ACP may evict warm data that the CPU may need sooner and more critically. Consequently, when using the ACP and setting its behavior on misses, profiling with this information should help you make the appropriate decisions.

Extending cache coherency at the SoC level using the ACE-4 and CCI

As shown in Figure 3.28, coherency can be extended beyond the CPU cluster by using a bus protocol such as ACE-4 and an interconnect that supports the ACE-4 coherency protocol, such as the ARM CCI-400. The SCU within each CPU cluster performs the necessary cache line snooping and broadcasts the cache line management information to make sure that a memory region cached by more than one CPU cluster with integrated caches is shared coherently between all the CPU cores in this SoC example.

The following diagram focuses on the Zynq UltraScale+ SoC coherency features, including the Cortex-A53 cluster-integrated coherent caches, along with the SCU, its ACP port, and its ACE-4 interface. These can be used to access some of the system ports via the cache-coherent interconnect (CCI):

Figure 3.29 – SoC level coherency implementation example

Some FPGA masters can also access the PS block coherently via the S-AXI ACE-4 port.

Summary

In this chapter, we introduced buses and interconnects and their crucial roles in putting modern SoCs together. We explored the defining functional features of buses and interconnects and the background behind them. We also looked at the ARM AMBA standards by revisiting their historical evolution and how SoC design complexity required an evolving standard that can accommodate and help with designing higher-performance and feature-rich SoCs. We explored all the relevant bus protocols that make up the AMBA standard, their features and characteristics, and what makes them suitable for a specific connectivity need. We also looked at example implementations to get a feel for how these buses are used in modern SoCs. Then, we covered the OCP standard, its bus characteristics and features, and what makes it appealing for many application domains in the SoC design space. We compared the AMBA standard and the OCP bus protocol and how they can be bridged to accommodate mixed-standard complex interconnects. Continuing to look at the interconnect and data exchange SoC functionality, we looked at DMA engines, the supported data exchanges and transaction types, and how a data movement from source to destination can be defined, launched, and executed by a DMA engine on behalf of a requesting master on the SoC. We also covered DMA engine software-to-hardware interfacing. We concluded this chapter with an overview of the data-sharing challenges in modern complex SoCs, including data coherency and atomic access, and how these challenges can be dealt with efficiently so that data sharing in high-performance SoCs doesn’t become a system performance bottleneck.

The next chapter will continue to address the SoC design and its architecture fundamentals by looking at the SoC’s off-chip connectivity to other high-speed devices using mediums such as PCIe and Ethernet.

Questions

Answer the following questions to test your knowledge of this chapter:

  1. Describe the differences between a simple bus and a Network-On-Chip. What is a multi-threaded master?
  2. Which bus protocols are included in AMBA4? Describe the main characteristics of each.
  3. Which states are part of the ACE-4 coherency protocol? Describe each.
  4. What are the differences between the OCP bus and the AXI-4 bus? What are the similarities between them?
  5. How many types of DMA engines are there? Describe the main differences between them.
  6. What is a DMA descriptor? What are its fields?
  7. What is a Scatter Gather DMA operation?
  8. What is a linked list of descriptors? What is the key element in it?
  9. How is cache coherency implemented at the SoC system level?
  10. What is atomic data access? How can we implement this between two different CPU clusters?