Scientific workflows are widely used in high-performance computing (HPC) environments to carry out complex sets of interrelated calculations. Because executing each task of a workflow as an individual job substantially increases the workflow's makespan, tasks are grouped into clusters before execution. Task clustering reduces the makespan of a workflow by lowering job execution overheads, since fewer jobs need to be executed once tasks are clustered. However, clustering tasks with the sole intention of reducing the makespan can lead to serious resource underutilization when tasks with significantly different resource requirements are clustered together. It is therefore critical to balance makespan against resource utilization, so that task clustering reduces the makespan of a workflow while maximizing its resource utilization. This paper introduces a new task clustering algorithm, the Resource Aware Clustering algorithm, which uses a novel metric, the Resource Aware Clustering Coefficient, to maximize both the makespan improvement and the resource utilization of a workflow. The proposed algorithm improves the resource utilization of the considered workflows by more than 34% compared with baseline task clustering algorithms, while achieving a competitive makespan improvement relative to the unclustered workflows.
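To make the makespan/utilization trade-off concrete, the following Python sketch uses a deliberately simplified, hypothetical model: tasks form a chain and run one after another, every job pays a fixed scheduling overhead, and a clustered job reserves the largest memory request among its tasks for its whole run. Utilization is the memory-time the tasks actually use divided by the memory-time reserved. The names `Task`, `job_time`, `makespan`, and `workflow_utilization`, and all numbers, are illustrative assumptions. This is not the paper's Resource Aware Clustering Coefficient, which is not defined in the abstract.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical model for illustration only: tasks form a chain (each depends on
# the previous one), so jobs run one after another on a single node.


@dataclass(frozen=True)
class Task:
    runtime: float  # execution time of the task (seconds)
    memory: float   # memory the task needs (GB)


def job_time(cluster: List[Task], overhead: float) -> float:
    """One job = a fixed per-job overhead plus its tasks run sequentially."""
    return overhead + sum(t.runtime for t in cluster)


def makespan(clusters: List[List[Task]], overhead: float) -> float:
    """Chain of jobs: the makespan is the sum of the individual job times."""
    return sum(job_time(c, overhead) for c in clusters)


def workflow_utilization(clusters: List[List[Task]]) -> float:
    """Memory-time actually used divided by memory-time reserved.

    Assumption: each job reserves the largest memory request among its tasks
    for the entire time its tasks run (the overhead period is ignored).
    """
    used = sum(t.memory * t.runtime for c in clusters for t in c)
    reserved = sum(max(t.memory for t in c) * sum(t.runtime for t in c)
                   for c in clusters)
    return used / reserved


if __name__ == "__main__":
    overhead = 5.0  # hypothetical per-job scheduling overhead (seconds)
    tasks = [Task(10, 1), Task(10, 1), Task(10, 8)]  # two small tasks, one large
    plans = {
        "unclustered": [[t] for t in tasks],
        "one cluster": [tasks],
        "group similar": [tasks[:2], tasks[2:]],
    }
    for name, clusters in plans.items():
        print(f"{name}: makespan={makespan(clusters, overhead):.0f}, "
              f"utilization={workflow_utilization(clusters):.2f}")
```

Running it reports makespans of 45, 35, and 40 with utilizations of 1.00, 0.42, and 1.00. Merging all tasks into one job gives the shortest makespan but leaves most of the reserved memory idle, while grouping only the tasks with similar requirements keeps utilization high at a modest makespan cost.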