Tsung-Chuan Huang, Po-Hsueh Hsu and Tze-Nan Sheng
Department of Electrical Engineering
National Sun Yat-Sen University
Kaohsiung, Taiwan, R.O.C.
For loops with irregular, non-linear, or run-time determined accesses to array elements, many scientific and engineering programs rely on aggressive run-time techniques to exploit the potential parallelism of these loops. In this paper, we propose an efficient run-time technique to find an optimal parallel execution schedule for partially parallel loops in which synchronization between certain iterations is required to ensure correct program semantics. For efficiency, the inspector phase and scheduler phase are combined into a single parallel scheduler. The scheduler partitions loop iterations into several chunks and then schedules the iterations in one chunk into wavefronts concurrently at run-time. Our scheme not only runs much efficiently, but also obtains an optimal schedule. We make use of an atomic bitwise-OR instruction to avoid global synchronization overhead and ensure that the larger wavefront value is preserved when the wavefront variable of an iteration is simultaneously updated during scheduling.
Keywords: run-time parallelization, partially parallel loop, shared-memory multiprocessors, inspector-executor, wavefront
Received May 1, 1997; revised November 20, 1997.
Communicated by Chung-Ta King.