SHIN-GYU KIM, HYUCK HAN, HYUNGSOO JUNG+,
HYEONSANG EOM AND HEON Y. YEOM
School of Computer Science and Engineering
Seoul National University
Seoul, 151-744, Korea
+School of Information Technologies
University of Sydney
NSW 2006, Australia
The proliferation of data-parallel programming on large clusters has opened a new research
avenue: accommodating numerous types of data-intensive applications with a feasible
plan. Across the many research efforts, we observe that there is a nontrivial
amount of redundant I/O in the execution of data-intensive applications. This redundancy
problem has emerged as an issue in the recent literature because even the locality-aware
scheduling policy in a MapReduce framework is not effective in a cluster environment
where storage nodes cannot provide a computation service. In this article, we
introduce SplitCache for improving the performance of data-intensive OLAP-style applications
by reducing redundant I/O in a MapReduce framework. The key strategy for
achieving this goal is to eliminate such I/O redundancy, especially when different applications
read common input data within overlapping time periods: SplitCache caches the
first input stream in the computing nodes and reuses it for future demands. We also
design a cache-aware task scheduler that plays an important role in achieving high cache
utilization. When executing the TPC-H benchmark, we achieved 64.3% faster execution
and an 83.48% reduction in network traffic on average.
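As a rough illustration of the idea summarized above, the following minimal sketch models a computing-node cache of input splits together with a scheduler that prefers nodes already holding the requested split. It is a toy model under stated assumptions, not the SplitCache implementation: the class name, method names, and the simple "first idle node" fallback are all hypothetical, and cache eviction and capacity limits are ignored.

```python
from collections import defaultdict


class SplitCacheSketch:
    """Toy model (hypothetical names): tracks which node caches which split."""

    def __init__(self):
        # node name -> set of input split ids cached on that node
        self.cached = defaultdict(set)
        # number of times a split had to be fetched from remote storage
        self.remote_reads = 0

    def pick_node(self, split_id, idle_nodes):
        """Prefer an idle node that already caches the split (assumed policy)."""
        for node in idle_nodes:
            if split_id in self.cached[node]:
                return node
        # No idle node holds the split: fall back to the first idle node.
        return idle_nodes[0]

    def run_task(self, split_id, idle_nodes):
        """Assign a task to a node, fetching and caching the split on a miss."""
        node = self.pick_node(split_id, idle_nodes)
        if split_id not in self.cached[node]:
            # Cache miss: the split is read once from the storage nodes
            # and kept on the computing node for later tasks.
            self.remote_reads += 1
            self.cached[node].add(split_id)
        return node
```

In this sketch, a second application that reads the same split is routed to the node that cached it, so the split crosses the network only once; this is the kind of redundant I/O the abstract describes eliminating.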
Received November 30, 2009; revised February 26 & July 1, 2010; accepted August 5, 2010.
Communicated by Chung-Ta King.
* This work was supported by the National Research Foundation (NRF) grant funded by the Korea government
(MEST) (No. 2010-0014387). The ICT at Seoul National University provided research facilities for this study.
+ Corresponding author.