NYCCS/Computer Science Seminar

Xiaotong Zhuang
Georgia Tech/IBM

Monday, April 26, 2010 2:15PM

Location: Computer Science Building, Room 2311

Code Optimizations in the Heterogeneous Multicore Era

Abstract:

As processor performance starts to plateau due to limiting
factors like power and temperature, heterogeneous multicore
processors are gaining popularity. In this new era,
we face two major challenges. First, extracting parallelism
from complicated applications is still a daunting task; however,
we can now afford to trade higher computational cost for better
performance. Second, although specially designed cores offer better
performance for domain-specific applications, they also impose
many restrictions that must be addressed properly.

In this presentation, I will first talk about my recent project at
IBM. It aims to parallelize large scientific applications with
complicated control flow and data access patterns, targeting future
generations of processors with many cores on a die. Due to the dynamic
nature of dependences, such applications are notoriously difficult to
parallelize. To get to the last mile of parallelism, we generate a
stripped-down version of the code so that idle cores can be used to
compute dependences on the fly, while bounding the worst-case
performance by running a sequential version of the code on
the main thread. This technique has been successfully applied
to several important benchmarks provided by Lawrence Livermore
National Laboratory.

Next, I will present a solution to the context switch dilemma faced by
several domain-specific accelerators. Through clever code optimizations,
we were able to share some of the registers across context switches to
greatly reduce the number of memory accesses, achieving a significant
speedup. Part of the optimization was subsequently adopted by
Intel.