JOURNAL ARTICLE

Network-aware scheduling of MapReduce framework on distributed clusters over high speed networks

Abstract

Google's MapReduce has gained significant popularity as a platform for large-scale distributed data processing. Hadoop [1] is an open source implementation of the MapReduce [11] framework. It was originally developed to operate in a single-cluster environment and could not be leveraged for distributed data processing across federated clusters. When multiple federated clusters are connected by high speed networks, computing resources can be provisioned from any cluster in the federation. Placing each map task close to its data split is critical for Hadoop performance. In this work, we add network awareness to Hadoop's scheduling of map tasks over federated clusters. We observe a 12% to 15% reduction in execution time with the FIFO and FAIR schedulers of Hadoop for varying workloads.
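The abstract does not spell out the scheduling algorithm, so the following is only a minimal sketch of what network-aware map task placement could look like, not the authors' method. It assumes that each data split lives in one cluster and that an estimate of inter-cluster bandwidth is available. The names `Node`, `transfer_seconds`, `pick_node` and the `bandwidth_mbps` table are hypothetical and are not part of Hadoop's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A free compute slot; `cluster` names the federated cluster it belongs to."""
    name: str
    cluster: str


def transfer_seconds(split_mb, src_cluster, dst_cluster, bandwidth_mbps):
    """Estimate the time to move a split between clusters.

    Assumption: a split read inside its own cluster costs nothing extra.
    `bandwidth_mbps` maps (src, dst) cluster pairs to megabits per second.
    """
    if src_cluster == dst_cluster:
        return 0.0
    # Convert megabytes to megabits before dividing by the link bandwidth.
    return split_mb * 8 / bandwidth_mbps[(src_cluster, dst_cluster)]


def pick_node(split_mb, split_cluster, free_nodes, bandwidth_mbps):
    """Choose the free node with the lowest estimated transfer time for the split."""
    return min(
        free_nodes,
        key=lambda n: transfer_seconds(split_mb, split_cluster, n.cluster, bandwidth_mbps),
    )


# Hypothetical federation: the split is stored in cluster A.
bandwidth = {("A", "B"): 1000.0, ("A", "C"): 10000.0}
local = Node("a1", "A")
remote_slow = Node("b1", "B")
remote_fast = Node("c1", "C")

# With no local slot free, the node behind the faster link is preferred.
chosen = pick_node(128, "A", [remote_slow, remote_fast], bandwidth)
```

In this sketch a local slot always wins when one is free, and otherwise the choice falls to the remote cluster reachable over the fastest link. A real scheduler would also need to account for link contention and slot fairness, which are left out here.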

Keywords:
Computer science; Scheduling (production processes); Provisioning; Distributed computing; Distributed database; Big data; Cluster (spacecraft); Parallel computing; Operating system; Computer network

Metrics

Cited By: 26
FWCI (Field Weighted Citation Impact): 8.37
Refs: 4
Citation Normalized Percentile: 0.97
Is in top 1%
Is in top 10%

Topics

Cloud Computing and Resource Management
Physical Sciences →  Computer Science →  Information Systems
Data Stream Mining Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
IoT and Edge/Fog Computing
Physical Sciences →  Computer Science →  Computer Networks and Communications