Preface

The continuing advance of semiconductor technology has led to dramatic increases in transistor densities. Designing ever more complex single-core processors sharply increases power consumption while yielding diminishing performance returns. Instead, computer architects are actively pursuing multicore designs to efficiently use the billions of transistors on a single chip. Current processors have already integrated tens or hundreds of cores; the community is entering the many-core era. Although great breakthroughs have already been made in the design of many-core processors, there are still many critical challenges to solve, ranging from high-level parallel programming paradigms to low-level logic implementations. The correctness, performance, and efficiency of many-core processors rely heavily on communication mechanisms; addressing these critical challenges requires communication-centric cross-layer optimizations.

Traditional on-chip bus communication mechanisms face several limitations, such as low bandwidth, high latency, high power consumption, and poor scalability. To address these limitations, the network-on-chip (NoC) has been proposed as an efficient and scalable communication structure for many-core platforms. Owing to its many desirable properties, the NoC has rapidly crystallized into a significant research domain of the computer architecture community. Although the NoC has some similarities with off-chip networks, it differs fundamentally in its latency, power, and area characteristics. The NoC competes with processing cores for the scarce on-chip power and area, so it can leverage only limited resources. To support high performance within tight power and area budgets, more attention should be paid to NoC optimizations, including low-level logic implementations, network-level routing and flow control, and support for high-level programming paradigms.

Zhiying Wang's research group at the National University of Defense Technology has been studying the frontier subjects of computer architecture. His group has been conducting NoC research for about 10 years, and has presented or published tens of peer-reviewed papers at several prestigious and influential conferences and in several journals. This book reviews and summarizes the research progress and outcomes reported in these publications, including three papers from the top-tier ISCA-2011 and HPCA-2012 conferences, three papers from the prestigious DAC-2008, ICCD-2011, and ASAP-2012 conferences, four papers from the flagship IEEE Transactions on Computers, IEEE Transactions on Parallel and Distributed Systems, and ACM Transactions on Architecture and Code Optimization journals, and two papers from the influential Microprocessors and Microsystems journal.

This book is not a simple overview or collection of research ideas and design experiences. The editor, Zhiying Wang, and the authors wrote this volume with the purpose of exploring the NoC design space in a coherent, uniform, and bottom-up fashion, from low-level router, buffer, and topology implementations, to network-level routing and flow control designs, to co-optimizations of the NoC and high-level programming paradigms. The book is composed of five parts. Parts I and V introduce and conclude this book respectively. Part V also presents future work.

Part II covers low-level logic implementations, and consists of Chapters 2-4. Chapter 2 discusses a wing-channel-based single-cycle router architecture, which significantly reduces communication delays with low hardware costs. Chapter 3 studies two dynamic virtual channel structures with congestion awareness. The designs dynamically share buffers among virtual channels or ports to reduce buffer amount requirements, while maintaining or improving performance. Chapter 4 introduces an NoC topology enhanced with virtual bus structures. The topology efficiently supports unicast as well as multicast/broadcast communications, and the latter is critical for parallel application executions.

Part III investigates routing and flow control, and includes Chapters 5-7. Chapter 5 delves into routing algorithms for workload consolidation. The routing algorithms provide high adaptivity and dynamic isolation for concurrent applications. Chapter 6 proposes efficient flow control for fully adaptive routing. It maximizes the utilization of limited buffer resources to scale the performance without inducing network deadlock. Chapter 7 explores deadlock-free flow control theories and designs for torus NoCs. Conventional deadlock avoidance designs for torus NoCs either negatively affect the router complexity and frequency, or reduce the buffer utilization. The proposed flit bubble flow control achieves low complexity, high frequency, and efficient buffer utilization for high performance.

Part IV explores co-optimizations of the NoC and programming paradigms. It contains Chapters 8-10. Chapter 8 addresses the NoC optimization for shared memory paradigms. It provides hardware implementations for cache-coherent collective communications to prevent these communications from becoming system bottlenecks. Chapter 9 customizes NoC designs for message passing paradigms. The NoC presented offers special and low-cost hardware for basic message passing interface (MPI) primitives, upon which other MPI functions can be built, to effectively boost the performance of MPI functions. Chapter 10 studies an adaptive MPI communication protocol, which combines the buffered and synchronous communication modes to provide robust and high communication performance.

Overall, on the basis of the communication-centric cross-layer optimization methods, the work presented in this book has significantly advanced the state of the art in the NoC design and the many-core processor research domain. Based on a bottom-up and thorough exploration of the NoC design space, the research presented here has addressed a multitude of pressing concerns spanning a wide spectrum of design topics. It not only greatly improves the performance and reduces the overhead for the communication layer of many-core processors, but also significantly mitigates the challenges in the logic implementation layer and the parallel programming layer.

The research described in this book has been supported by several grants from various organizations, including the 863 Program of China (2012AA010905), the National Natural Science Foundation of China (61272144, 61303065, 61103016, 61202481, 61202123, 61202122), the Hunan Provincial Natural Science Foundation of China (12JJ4070, 14JJ3002), the Doctoral Fund of the Ministry of Education of China (20134307120028, 20114307120010), and the Research Project of the National University of Defense Technology (JC12-06-01, JC13-06-02). Sheng Ma's research has also been supported by the University of Toronto and the Natural Sciences and Engineering Research Council of Canada when he visited the University of Toronto.
