Chapter 14

Summary

In this book, we worked to demystify the Intel® Xeon Phi™ coprocessor and provide as much foundational information as would fit in one book. The most challenging part of using such a highly parallel device is parallel programming itself. That is a topic unto itself, one we have only touched on in this book. The better one understands parallel programming, the more effective one can be at eliminating nonparallel computation from an application.

If you are seeking to learn more about parallel programming, we recommend Structured Parallel Programming, coauthored by Michael McCool, Arch Robison, and James Reinders. In that book, the authors explain common algorithms, or patterns, and how to implement them in parallel. Perhaps most important, they do so without requiring a deep study of computer architecture. The book's examples all use shared-memory programming, so the important topic of MPI is not discussed. The reasoning is that parallelism is a fundamental notion that needs to become intuitive, and the models one uses to express it should come second.

However, in the real world, you may be most interested in retrofitting an existing program for parallelism, in which case a solid understanding of MPI and a threading model may be a better starting point. Threading models include OpenMP, Intel® Threading Building Blocks (TBB), and Intel® Cilk™ Plus. There are good books on MPI, OpenMP, and TBB, and we recommend learning at least MPI and one threading model. Structured Parallel Programming explains and uses both TBB and Cilk Plus well.

Advice

In this book we covered many topics; here is a summary of how to put them all to work:

• Focus on adding effective parallelism to your program, which will serve both processors and coprocessors. Using standard languages, portable parallelism models, and standard tools, so as to avoid hardware-specific coding, will generally lead to this.

• Think about libraries first, such as the Intel® Math Kernel Library (MKL). Chapter 11 covers this; a minimal MKL sketch appears after this list.

• Pay attention to exposing vector operations so as to get good vectorization. Chapter 5 covers this.

• Pay attention to exposing tasks so as to get scaling through use of the many cores and hardware threads. Chapter 6 covers this.

• Consider offload and/or message passing.

– Offload using Intel’s extensions, or the forthcoming OpenMP directives for offload. Chapter 7 covers this; see the offload sketch after this list.

– Message passing, using MPI to connect ranks spread across a system. Chapter 12 covers this; see the MPI sketch after this list.

• Use tuning tools to better understand bottlenecks and address them. Chapter 13 covers this.
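
To make the “libraries first” advice concrete, here is a minimal sketch that calls MKL through its standard CBLAS interface. The matrix size and values are our own illustration. Because MKL supports automatic offload (enabled, for example, by setting MKL_MIC_ENABLE=1 in the environment), a sufficiently large call like this can run on the coprocessor with no source changes.

#include <mkl.h>   /* Intel MKL: CBLAS interface, mkl_malloc/mkl_free */

int main(void) {
    const int n = 4096;   /* illustrative size; automatic offload favors large problems */
    double *A = (double *)mkl_malloc((size_t)n * n * sizeof(double), 64);
    double *B = (double *)mkl_malloc((size_t)n * n * sizeof(double), 64);
    double *C = (double *)mkl_malloc((size_t)n * n * sizeof(double), 64);
    for (int i = 0; i < n * n; i++) { A[i] = 1.0; B[i] = 2.0; C[i] = 0.0; }

    /* C = 1.0*A*B + 0.0*C, row-major, no transposes */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n, n, n, 1.0, A, n, B, n, 0.0, C, n);

    mkl_free(A); mkl_free(B); mkl_free(C);
    return 0;
}

Link against MKL when building (for example, with the Intel compiler’s -mkl option).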
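
The offload sketch below, our own variation on the style Chapter 7 describes, ties three of the bullets together: Intel’s offload pragma moves a simple loop to the coprocessor, the OpenMP pragma spreads its iterations across the many cores and hardware threads, and the unit-stride loop body is the kind of code the compiler can vectorize. The array names and size are illustrative.

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    const int n = 1000000;
    float *a = (float *)malloc(n * sizeof(float));
    float *b = (float *)malloc(n * sizeof(float));
    float *c = (float *)malloc(n * sizeof(float));
    for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

    /* Run on the coprocessor: in() copies a and b over,
       out() copies c back when the offload region completes. */
    #pragma offload target(mic) in(a, b : length(n)) out(c : length(n))
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        c[i] = 2.0f * a[i] + b[i];   /* unit-stride body the compiler can vectorize */

    printf("c[0] = %f\n", c[0]);
    free(a); free(b); free(c);
    return 0;
}

Built with the Intel compiler, the loop runs on the coprocessor; a compiler that does not recognize the offload pragma will simply ignore it and run the same loop on the host.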
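
For the message-passing route, the canonical minimal MPI program is shown below; nothing in it is specific to the Intel Xeon Phi coprocessor. Whether its ranks run on processors, coprocessors, or both is decided at launch time, as Chapter 12 describes.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;

    MPI_Init(&argc, &argv);                 /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this rank's id        */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of ranks */

    /* Each rank reports in; ranks may live on the host, on the
       coprocessor, or on both, depending on how the job is launched. */
    printf("Hello from rank %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}

Compile with an MPI wrapper such as mpicc and launch with mpirun; Chapter 12 shows how to place ranks on the coprocessor.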

Additional resources

• Intel Cilk Plus Language Specification and Intel Cilk Plus Application Binary Interface Specification documents, available from http://cilkplus.org.

• Intel® Threading Building Blocks, http://threadingbuildingblocks.org

• Intel tools, http://intel.com/software/products

• Online information from Intel, http://intel.com/software/mic

• OpenMP, http://openmp.org

• Structured Parallel Programming: Patterns for Efficient Computation, Michael McCool, Arch Robison, and James Reinders; Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2012.

Another book coming?

We could not fit everything in this book. As we finish this volume, we are considering a “Volume 2” perhaps with a title like “Experts Only.” If you’ve read this book, you’ll be plenty expert enough. We hope to dive into some topics more deeply and discuss additional opportunities. Look for something by late 2013.

Feedback appreciated

We would enjoy receiving feedback. We encourage you to join us at http://lotsofcores.com to find out how to direct feedback to us most effectively. Thank you, in advance, for anything you share with us!
