JOURNAL ARTICLE

Probabilistic Network-Aware Task Placement for MapReduce Scheduling

Abstract

Maximizing data locality in task scheduling is critical for the performance of MapReduce job execution. Many existing works on MapReduce scheduling decide the placement of map and reduce tasks at a coarse granularity of locations, measured in terms of machines and racks. They do not explicitly consider the network topology and data transmission cost, which may cause task straggling and degrade job performance. To improve MapReduce job performance, in this paper we consider task placement with the goal of minimizing the overall data transmission cost of a job execution while balancing transmission cost reduction against resource utilization. We propose a probabilistic network-aware scheduling algorithm that, for a given available task slot, selects the task (map task or reduce task) with the minimum transmission cost among the task candidates, and then schedules the selected task on the slot with a probability determined by its transmission cost: a lower expected transmission cost leads to a higher probability, and vice versa. We also propose a method to more accurately estimate the intermediate data size based on the progress of map tasks; this size is needed to calculate the transmission cost of reduce tasks but is unknown at the time of reduce task scheduling. We implement our probabilistic network-aware scheduling algorithm on Apache Hadoop and conduct experiments on a high-performance computing platform. The experimental results show that our scheduling algorithm outperforms previous approaches in terms of job completion time and cluster resource utilization.
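
The two ideas in the abstract, probabilistic placement of the minimum-cost task and progress-based estimation of intermediate data size, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the cost model (bytes times network hops), the exponential mapping from cost to probability, the linear extrapolation of intermediate data, and all names (`Task`, `try_schedule`, `scale`, and so on) are assumptions made only for illustration.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    data_bytes: float  # amount of data the task must fetch
    hops: int          # hypothetical network distance from the data to the slot


def transmission_cost(task: Task) -> float:
    """Assumed cost model: bytes moved multiplied by network hops."""
    return task.data_bytes * task.hops


def placement_probability(cost: float, scale: float) -> float:
    """Assumed mapping: probability decays exponentially as cost grows."""
    return math.exp(-cost / scale)


def try_schedule(candidates, scale, rng=random):
    """Offer one free slot to the candidates.

    Picks the candidate with the minimum transmission cost, then accepts it
    with a probability that decreases with that cost. Returns the task if it
    is placed, or None if the slot is left for a later scheduling round.
    """
    if not candidates:
        return None
    best = min(candidates, key=transmission_cost)
    p = placement_probability(transmission_cost(best), scale)
    return best if rng.random() < p else None


def estimate_intermediate_bytes(output_so_far: float, progress: float) -> float:
    """Assumed estimator: linearly extrapolate map output from map progress."""
    if not 0.0 < progress <= 1.0:
        raise ValueError("progress must be in (0, 1]")
    return output_so_far / progress


if __name__ == "__main__":
    tasks = [Task("remote", 64e6, 4), Task("rack-local", 64e6, 2)]
    placed = try_schedule(tasks, scale=256e6)
    print("placed:", placed.name if placed else None)
    print("estimated bytes:", estimate_intermediate_bytes(50e6, 0.25))
```

In this sketch a task whose data is already local (zero hops) has zero cost and is always accepted, while a costly task is usually declined so that the slot can go to a better-placed task later. The `scale` parameter stands in for the trade-off between transmission cost reduction and resource utilization that the abstract mentions.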

Keywords:
Computer science; Probabilistic logic; Scheduling (production processes); Granularity; Distributed computing; Data transmission; Task analysis; Job scheduler; Task (project management); Real-time computing; Computer network; Mathematical optimization; Artificial intelligence; Operating system; Engineering

Metrics

Cited By: 25
FWCI (Field Weighted Citation Impact): 11.95
Refs: 25
Citation Normalized Percentile: 0.99
Is in top 1%
Is in top 10%

Topics

Cloud Computing and Resource Management
Physical Sciences →  Computer Science →  Information Systems
IoT and Edge/Fog Computing
Physical Sciences →  Computer Science →  Computer Networks and Communications
Stochastic Gradient Optimization Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence