Scientific workflows often need to run across multiple collaborating datacenters because they require access to community-wide resources. However, moving initial input data and intermediate data across geo-distributed datacenters hinders the efficient execution of large-scale, data-intensive scientific workflows. In this paper, we propose a novel scheduling approach based on graph partitioning for executing data-intensive scientific workflows in geo-distributed datacenters, with the goal of minimizing the overall data transfer cost. Simulations show that our algorithm significantly reduces overall geo-distributed data transfer, demonstrating its effectiveness.
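The abstract does not describe the partitioning algorithm itself. What follows is a minimal illustrative sketch under stated assumptions, not the authors' method.

The sketch makes these assumptions:

- The workflow is a graph. Its nodes are input datasets and tasks, and each edge weight is the volume of data that flows between the two nodes.
- Input datasets are pinned to the datacenters that host them.
- Each datacenter can run at most a fixed number of tasks. This hypothetical capacity bound stands in for a balance constraint.

Under these assumptions, the overall data transfer cost of a placement is the total weight of the edges cut between datacenters. A simple greedy local search stands in for the paper's graph partition step. It moves one task at a time, in the spirit of Kernighan-Lin refinement. The names `transfer_cost` and `greedy_refine` are hypothetical.

```python
from collections import defaultdict


def transfer_cost(edges, placement):
    """Total data volume on edges whose endpoints sit in different datacenters."""
    return sum(w for (u, v), w in edges.items() if placement[u] != placement[v])


def _local_cost(adj, placement, node, dc):
    """Cut weight incident to `node` if it were placed in datacenter `dc`."""
    return sum(w for nbr, w in adj[node] if placement[nbr] != dc)


def greedy_refine(edges, placement, movable, capacity, max_rounds=10):
    """Hypothetical greedy refinement: move single tasks to cheaper datacenters.

    Only nodes in `movable` (tasks) may move; datasets stay pinned.
    `capacity` maps each datacenter to the maximum number of tasks it may host.
    Every accepted move strictly lowers the total cut cost, so the loop ends,
    but it may stop at a local minimum rather than the optimal partition.
    """
    placement = dict(placement)  # do not mutate the caller's placement
    adj = defaultdict(list)
    for (u, v), w in edges.items():
        adj[u].append((v, w))
        adj[v].append((u, w))
    load = defaultdict(int)
    for t in movable:
        load[placement[t]] += 1

    for _ in range(max_rounds):
        improved = False
        for t in movable:
            current = placement[t]
            best, best_cost = current, _local_cost(adj, placement, t, current)
            for dc in sorted(capacity):
                if dc != current and load[dc] < capacity[dc]:
                    cost = _local_cost(adj, placement, t, dc)
                    if cost < best_cost:
                        best, best_cost = dc, cost
            if best != current:
                placement[t] = best
                load[current] -= 1
                load[best] += 1
                improved = True
        if not improved:
            break
    return placement


if __name__ == "__main__":
    # Toy workflow: d1 is hosted in A, d2 in B; t3 consumes outputs of t1 and t2.
    edges = {("d1", "t1"): 10, ("d2", "t2"): 8, ("t1", "t3"): 5, ("t2", "t3"): 4}
    start = {"d1": "A", "d2": "B", "t1": "B", "t2": "A", "t3": "B"}
    refined = greedy_refine(edges, start, ["t1", "t2", "t3"], {"A": 2, "B": 2})
    print(transfer_cost(edges, start), transfer_cost(edges, refined))
```

On this toy input, the refinement lowers the cut cost from 22 to 4 units. The only remaining cross-datacenter edge is t2-t3. Keeping t2 next to its pinned input d2 costs less than moving t2 to A, which would cut the heavier d2-t2 edge.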
Jinghui Zhang, Jian Chen, Jun Zhan, Jiahui Jin, Aibo Song
Linfeng Xie, Yang Dai, Yongjin Zhu, Xin Li, Xiangbo Li, Zhuzhong Qian
Chien-Chun Hung, Leana Golubchik, Minlan Yu
Xinyue Shu, Quanwang Wu, MengChu Zhou, Junhao Wen