Zhuang Wei, J.M. Qu, et al.
HPCC-ICESS-CSS 2015
With the prevalence of big data, MapReduce has emerged as the most widely deployed computing framework for data analysts. This paper addresses MapReduce job performance optimization, with the goal of reducing system latency. We design a systematic method that optimizes the MapReduce job execution process by maximizing the utilization of computing resources. Through careful analysis of the mechanisms behind Hadoop, we formalize the map-shuffle-reduce workflow in terms of resource supply-demand relations. We develop efficient and effective algorithms that solve the resulting optimization problem using mixed-integer nonlinear programming. Experiments on a ten-node cluster demonstrate that the proposed model consistently improves performance and significantly outperforms the system with default parameter settings.
K. Warren, R. Ambrosio, et al.
IBM J. Res. Dev
Vikram Sharma Mailthody, Ketan Date, et al.
HPEC 2018
Ruchir Puri, Leon Stok, et al.
DAC 2003