Journal of Information Science and Engineering, Vol. 27 No. 2, pp. 789-804 (March 2011)

Improving MapReduce Performance by Exploiting Input Redundancy*

School of Computer Science and Engineering
Seoul National University
Seoul, 151-744, Korea
+School of Information Technologies
University of Sydney
NSW 2006, Australia

The proliferation of data-parallel programming on large clusters has opened a new research avenue: accommodating numerous types of data-intensive applications with a feasible plan. Across the many research efforts in this area, we observe a nontrivial amount of redundant I/O in the execution of data-intensive applications. This redundancy has emerged as an issue in the recent literature because even the locality-aware scheduling policy of a MapReduce framework is ineffective in a cluster environment where storage nodes cannot provide a computation service. In this article, we introduce SplitCache, which improves the performance of data-intensive OLAP-style applications by reducing redundant I/O in a MapReduce framework. The key strategy is to eliminate I/O redundancy that occurs when different applications read common input data within an overlapping time period: SplitCache caches the first input stream in the computing nodes and reuses it for future demands. We also design a cache-aware task scheduler that plays an important role in achieving high cache utilization. In executing the TPC-H benchmark, we achieved 64.3% faster execution and an 83.48% reduction in network traffic on average.
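The caching and scheduling idea described above can be illustrated with a minimal sketch. It is not the SplitCache implementation, whose details are not given in this abstract. The class name CacheAwareScheduler, its methods, the FIFO task queue, the three-step preference order, and the unbounded per-node cache with no eviction are all assumptions made for illustration.

```python
from collections import defaultdict


class CacheAwareScheduler:
    """Toy scheduler (hypothetical, not the paper's design).

    Assumption: each compute node keeps every input split it has read
    in an unbounded local cache; eviction is not modeled.
    """

    def __init__(self):
        self.pending = []               # FIFO list of (job_id, split_id)
        self.cache = defaultdict(set)   # node -> split ids cached on that node
        self.remote_reads = 0           # splits fetched from remote storage
        self.cache_hits = 0             # splits served from a node-local cache

    def submit(self, job_id, split_ids):
        """Queue one map task per input split of the job."""
        for split_id in split_ids:
            self.pending.append((job_id, split_id))

    def next_task(self, node):
        """Return the task the given node should run next, or None."""
        if not self.pending:
            return None

        # 1. Prefer a task whose split is already cached on this node.
        for i, (_, split_id) in enumerate(self.pending):
            if split_id in self.cache[node]:
                self.cache_hits += 1
                return self.pending.pop(i)

        # 2. Otherwise prefer a split that no node has cached yet, so that
        #    tasks with a cached copy elsewhere stay available to that node.
        cached_anywhere = set().union(*self.cache.values())
        index = 0  # 3. Fall back to the oldest pending task.
        for i, (_, split_id) in enumerate(self.pending):
            if split_id not in cached_anywhere:
                index = i
                break

        task = self.pending.pop(index)
        self.remote_reads += 1           # cache miss: read from storage
        self.cache[node].add(task[1])    # keep the split for later jobs
        return task


if __name__ == "__main__":
    sched = CacheAwareScheduler()
    # Two queries that scan the same two input splits.
    sched.submit("q1", ["s0", "s1"])
    sched.submit("q2", ["s0", "s1"])
    for node in ["n0", "n1", "n0", "n1"]:
        print(node, sched.next_task(node))
    print("remote reads:", sched.remote_reads, "cache hits:", sched.cache_hits)
```

In this toy run the second query is served entirely from the node-local caches, so only two of the four map tasks read from remote storage. A scheduler that ignored cache contents could send the second query's tasks to nodes that do not hold the matching split and read it remotely again.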

Keywords: MapReduce, I/O redundancy, task scheduling, distributed system, cloud computing

Received November 30, 2009; revised February 26 & July 1, 2010; accepted August 5, 2010.
Communicated by Chung-Ta King.
* This work was supported by the National Research Foundation (NRF) grant funded by the Korea government (MEST) (No. 2010-0014387). The ICT at Seoul National University provided research facilities for this study.
+ Corresponding author.