Spark's native scheduling strategy rests on the basic assumption that the cluster is homogeneous. However, as hardware is replaced and high-performance hardware is introduced, clusters are becoming increasingly heterogeneous. As a result, the existing scheduling strategy is inefficient in heterogeneous clusters and suffers from a pronounced weakest-link (short-board) effect, in which the slowest nodes limit overall performance. To address this problem, this paper proposes a new scheduling strategy that optimizes Spark's performance on heterogeneous clusters. The new strategy adopts the idea of hierarchical scheduling and, when scheduling, jointly considers task complexity, node performance, and node resource usage (such as worker CPU usage), yielding a more efficient and fairer task scheduling algorithm. Simulations and experiments on real machines show that the new strategy clearly outperforms the original one.
YANG Zhiwei, ZHENG Quan, WANG Song, YANG Jian, ZHOU Lele
Yu Liang, Yu Tang, Xun Zhu, Xiaoyuan Guo, Chenyao Wu, Di Lin
Xiaoyong Tang, Jun Xie, Wenzheng Liu