YANG Zhiwei,ZHENG Quan,WANG Song,YANG Jian,ZHOU Lele
Spark is an efficient, memory-based big data processing platform similar to Hadoop MapReduce. However, Spark's default task scheduling strategy does not account for the differing capacities of nodes in a heterogeneous cluster, which degrades system performance. To address this problem, this paper presents an adaptive task scheduling strategy for heterogeneous Spark clusters: it monitors the load and resource utilization of each node and analyzes the collected metrics to dynamically adjust the nodes' task allocation weights. Experimental results show that, on heterogeneous nodes, this strategy outperforms the default scheduling strategy in task completion time, node working state, and resource utilization.
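The abstract describes weighting task allocation by each node's load and resource utilization. The paper's actual formula is not given here, so the following is only a minimal illustrative sketch: a hypothetical `node_weight` that favors nodes with lower CPU and memory utilization, and a proportional task split based on those weights. All names and the weighting formula are assumptions, not the authors' method.

```python
def node_weight(cpu_util, mem_util, alpha=0.5):
    """Hypothetical weight: lower utilization yields a higher weight.

    cpu_util, mem_util are in [0, 1]; alpha balances CPU vs. memory.
    """
    return alpha * (1.0 - cpu_util) + (1.0 - alpha) * (1.0 - mem_util)


def allocate_tasks(nodes, total_tasks):
    """Split total_tasks across nodes proportionally to their weights.

    nodes: list of (name, cpu_util, mem_util) tuples from monitoring.
    Rounding may make the shares sum to slightly more or less than
    total_tasks; a real scheduler would reconcile the remainder.
    """
    weights = {name: node_weight(c, m) for name, c, m in nodes}
    total = sum(weights.values())
    return {name: round(total_tasks * w / total)
            for name, w in weights.items()}
```

For example, a lightly loaded node (20% CPU and memory utilization) would receive four times as many tasks as a heavily loaded one (80% utilization) under this sketch, which captures the adaptive intent even if the paper's exact weighting differs.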