Heterogeneous multiprocessors are increasingly important in the multi-core era because of their potential for high performance and energy efficiency. For software to fully realize this potential, the step that maps computations to processing elements must be as automated as possible. However, the state-of-the-art approach relies on the programmer to specify this mapping manually and statically. This approach is not only labor-intensive but also unable to adapt to changes in the runtime environment, such as problem size and hardware/software configuration. In this study, we propose adaptive mapping, a fully automatic technique for mapping computations to processing elements on a CPU+GPU machine. We have implemented it in Qilin, our experimental heterogeneous programming system. Our results show that, by judiciously distributing work over the CPU and GPU, automatic adaptive mapping reduces execution time by 25% and energy consumption by 20% on average relative to static mappings, for a set of important computation benchmarks. We also demonstrate that our technique is able to adapt to changes in input problem size and system configuration.
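The abstract does not say how the CPU/GPU work split is chosen. One plausible way to "judiciously distribute work" is to fit execution-time models for each device from a few training runs, then give the CPU the fraction of the work that makes both devices finish at about the same time. The sketch below assumes simple linear time-versus-size models; `LinearModel`, `fit_linear`, `cpu_fraction` and `predicted_time` are hypothetical names for illustration, not Qilin's API or necessarily its exact method.

```python
from dataclasses import dataclass


@dataclass
class LinearModel:
    """Assumed execution-time model: t(n) = intercept + slope * n (seconds)."""
    intercept: float
    slope: float

    def predict(self, n: float) -> float:
        return self.intercept + self.slope * n


def fit_linear(sizes, times):
    """Ordinary least-squares fit of measured time against problem size."""
    k = len(sizes)
    if k < 2:
        raise ValueError("need at least two training points")
    mean_n = sum(sizes) / k
    mean_t = sum(times) / k
    sxx = sum((n - mean_n) ** 2 for n in sizes)
    if sxx == 0:
        raise ValueError("training sizes must differ")
    sxy = sum((n - mean_n) * (t - mean_t) for n, t in zip(sizes, times))
    slope = sxy / sxx
    return LinearModel(mean_t - slope * mean_n, slope)


def cpu_fraction(cpu: LinearModel, gpu: LinearModel, n: float) -> float:
    """Fraction beta of the n work items to run on the CPU so that
    cpu.predict(beta * n) == gpu.predict((1 - beta) * n), clamped to [0, 1]."""
    denom = (cpu.slope + gpu.slope) * n
    if denom <= 0:
        return 0.5  # degenerate models: fall back to an even split (assumption)
    beta = (gpu.intercept - cpu.intercept + gpu.slope * n) / denom
    return min(1.0, max(0.0, beta))


def predicted_time(cpu: LinearModel, gpu: LinearModel, n: float, beta: float) -> float:
    """Devices run concurrently, so the overall time is that of the slower one.
    A device that receives no work is assumed to take no time."""
    t_cpu = cpu.predict(beta * n) if beta > 0 else 0.0
    t_gpu = gpu.predict((1 - beta) * n) if beta < 1 else 0.0
    return max(t_cpu, t_gpu)


if __name__ == "__main__":
    # Hypothetical training measurements (sizes in items, times in seconds).
    cpu = fit_linear([1000, 2000, 3000], [0.002, 0.004, 0.006])
    gpu = fit_linear([1000, 2000, 3000], [0.002, 0.003, 0.004])
    n = 1_000_000
    beta = cpu_fraction(cpu, gpu, n)
    print(f"CPU share: {beta:.3f}, predicted time: {predicted_time(cpu, gpu, n, beta):.3f} s")
```

Balancing the two predicted finish times minimizes the overall (maximum) time when both models increase with problem size; if the balance point falls outside [0, 1], clamping sends all the work to the faster device. Because the split is recomputed from the models for each problem size `n`, it changes automatically as the input grows, which is the kind of adaptivity a static mapping lacks.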
Chi-Keung (CK) Luk (www.ckluk.org/ck) is a Senior Staff Engineer at Intel, where he conducts research and advanced development in parallel programming, compilers, and program-analysis tools. He is also a Research Affiliate at the Massachusetts Institute of Technology. CK obtained his Ph.D. from the University of Toronto and was a visiting scholar at Carnegie Mellon University. He has more than 30 publications and two issued patents, with several others pending. He received an Intel Achievement Award and a nomination for the ACM Doctoral Dissertation Award.