JOURNAL ARTICLE

An Efficiency-Aware Scheduling for Data-Intensive Computations on MapReduce Clusters

Hui Zhao, Shuqiang Yang, Hua Fan, Zhikun Chen, Jinghu Xu

Year: 2013
Journal: IEICE Transactions on Information and Systems
Vol: E96.D (12)
Pages: 2654-2662
Publisher: Institute of Electronics, Information and Communication Engineers

Abstract

Scheduling plays a key role in MapReduce systems. In this paper, we explore the efficiency of a MapReduce cluster running many independent, continuously arriving MapReduce jobs. Data locality and load balancing are two important factors in improving the computation efficiency of MapReduce systems for data-intensive computations. Traditional cluster scheduling technologies are not well suited to the MapReduce environment. Several schedulers are in use for the popular open-source Hadoop MapReduce implementation; however, they do not optimize both factors well. Our main objective is to minimize the total flowtime of all jobs. Because this is a strongly NP-hard problem, we adopt effective heuristics to seek a satisfactory solution. We formalize the scheduling problem as a job selection problem and propose a load-balance-aware job selection algorithm. At the task level, we design a strict data-locality scheduling algorithm for map tasks on map machines and a load-balance-aware scheduling algorithm for reduce tasks on reduce machines. Comprehensive experiments have been conducted to compare our scheduling strategy with well-known Hadoop scheduling strategies. The experimental results validate the efficiency of our proposed scheduling strategy.
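The abstract names the objective (total flowtime) and the two task-level ideas (strict data locality for map tasks, load balancing for reduce tasks) but does not give the algorithms themselves. The following is only a minimal sketch of those notions, not the authors' method. It assumes flowtime means completion time minus arrival time, and it substitutes a generic greedy largest-first placement for the reduce side. All names (`Job`, `total_flowtime`, `pick_local_map_task`, `assign_reduce_tasks`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    arrival: float      # time the job enters the queue (assumed definition)
    completion: float   # time its last task finishes


def total_flowtime(jobs):
    """Sum over all jobs of (completion time - arrival time)."""
    return sum(j.completion - j.arrival for j in jobs)


def pick_local_map_task(machine, pending, block_locations):
    """Strict locality: return the first pending map task whose input
    block is stored on `machine`, or None if there is no such task."""
    for task in pending:
        if machine in block_locations[task]:
            return task
    return None


def assign_reduce_tasks(task_costs, machines):
    """Generic greedy balancing (an assumption, not the paper's rule):
    place the costliest task first, always on the least-loaded machine."""
    load = {m: 0.0 for m in machines}
    placement = {}
    for task, cost in sorted(task_costs.items(), key=lambda kv: -kv[1]):
        target = min(machines, key=lambda m: load[m])
        placement[task] = target
        load[target] += cost
    return placement, load
```

In this sketch a map machine with no local block simply receives no task (`None`), which is one reading of "strict" locality. A real scheduler would also need a policy for when such a machine is reconsidered.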

Keywords:
Computer science; Scheduling (production processes); Heuristics; Distributed computing; Locality; Fair-share scheduling; Job scheduler; Computation; Two-level scheduling; Rate-monotonic scheduling; Dynamic priority scheduling; Load balancing (electrical power); Round-robin scheduling; Parallel computing; Mathematical optimization; Cloud computing; Algorithm; Operating system; Computer network; Quality of service

Metrics

Cited By: 2
FWCI (Field Weighted Citation Impact): 0.00
Refs: 22
Citation Normalized Percentile: 0.16

Topics

Cloud Computing and Resource Management
Physical Sciences →  Computer Science →  Information Systems
Distributed and Parallel Computing Systems
Physical Sciences →  Computer Science →  Computer Networks and Communications
IoT and Edge/Fog Computing
Physical Sciences →  Computer Science →  Computer Networks and Communications