JOURNAL ARTICLE

Data-Aware Scheduling Strategy for Scientific Workflow Applications in IaaS Cloud Computing

Sid Ahmed Makhlouf, Belabbas Yagoubi

Year: 2018
Journal: International Journal of Interactive Multimedia and Artificial Intelligence
Vol: 5 (4)
Pages: 75-75
Publisher: International University of La Rioja

Abstract

Scientific workflows benefit from the cloud computing paradigm, which offers access to virtual resources provisioned on a pay-as-you-go, on-demand basis. Minimizing resource costs to meet the user's budget is very important in a cloud environment. Several optimization approaches have been proposed to improve the performance and cost of Data-intensive Scientific Workflow Scheduling (DiSWS) in cloud computing. However, the majority of DiSWS approaches in the literature rely on heuristics and metaheuristics as the optimization method. Furthermore, the task hierarchy in data-intensive scientific workflows has not been extensively explored in the current literature. In this paper, a data-intensive scientific workflow is represented as a hierarchy that specifies the hierarchical relations between workflow tasks, and an approach for scheduling data-intensive workflow applications is proposed. In this approach, first, the datasets and workflow tasks are modeled as a conditional probability matrix (CPM). Second, several data transformations and hierarchical clustering are applied to the CPM structure to determine the minimum number of virtual machines needed for the workflow execution. The hierarchical clustering is done with respect to the budget imposed by the user. After data transformation and hierarchical clustering, the amount of data transmitted between clusters can be reduced, which can improve the cost and makespan of the workflow by optimizing the use of virtual resources and network bandwidth. The performance and cost are analyzed using an extension of the CloudSim simulation tool and compared with existing multi-objective approaches. The results demonstrate that our approach reduces resource costs with respect to user budgets.
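
The abstract does not define the CPM or the clustering procedure, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that each CPM entry is the fraction of task A's datasets that task B also uses, that the dissimilarity between two tasks is one minus the average of the two conditional probabilities, that clusters are merged with average linkage, and that the budget caps the number of clusters (one virtual machine per cluster). All function names, the sample tasks, and the budget rule are hypothetical.

```python
from itertools import combinations


def conditional_probability_matrix(task_data):
    """Assumed CPM: cpm[a, b] = share of a's datasets that b also uses."""
    cpm = {}
    for a in task_data:
        for b in task_data:
            shared = len(task_data[a] & task_data[b])
            cpm[a, b] = shared / len(task_data[a]) if task_data[a] else 0.0
    return cpm


def affordable_vm_count(budget, cost_per_vm):
    """Hypothetical budget rule: how many VMs the budget can pay for."""
    return int(budget // cost_per_vm)


def cluster_tasks(task_data, max_clusters):
    """Agglomerative (average-linkage) clustering of tasks by data sharing."""
    cpm = conditional_probability_matrix(task_data)
    clusters = [frozenset([t]) for t in sorted(task_data)]

    def distance(c1, c2):
        # Average, over all cross-cluster task pairs, of a symmetric
        # dissimilarity derived from the two conditional probabilities.
        pairs = [(a, b) for a in c1 for b in c2]
        total = sum(1 - (cpm[a, b] + cpm[b, a]) / 2 for a, b in pairs)
        return total / len(pairs)

    # Merge the two closest clusters until the cluster count fits the cap.
    while len(clusters) > max(1, max_clusters):
        c1, c2 = min(combinations(clusters, 2), key=lambda p: distance(*p))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters


# Hypothetical workflow: which datasets each task reads.
tasks = {
    "t1": {"d1", "d2"},
    "t2": {"d1", "d2"},
    "t3": {"d3"},
    "t4": {"d3", "d4"},
}
groups = cluster_tasks(tasks, affordable_vm_count(budget=10.0, cost_per_vm=4.0))
```

Under these assumptions, tasks that share most of their datasets end up in the same cluster, so placing each cluster on one virtual machine keeps shared data local and limits transfers between machines.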

Keywords:
Computer science, Workflow, Cloud computing, Distributed computing, Workflow management system, Workflow technology, Workflow engine, Scheduling (production processes), Provisioning, Cluster analysis, Virtual machine, Data mining, Database, Machine learning, Computer network, Operating system, Mathematical optimization

Metrics

Cited By: 9
FWCI (Field Weighted Citation Impact): 4.02
Refs: 49
Citation Normalized Percentile: 0.94

Topics

Cloud Computing and Resource Management
Physical Sciences → Computer Science → Information Systems
Distributed and Parallel Computing Systems
Physical Sciences → Computer Science → Computer Networks and Communications
Scientific Computing and Data Management
Social Sciences → Decision Sciences → Information Systems and Management