DAG-based workflows scheduling using Actor–Critic Deep Reinforcement Learning

Guilherme Koslovski; Kleiton Pereira; Paulo Roberto Albuquerque

doi:10.1016/j.future.2023.09.018

JOURNAL ARTICLE

Year: 2023 Journal: Future Generation Computer Systems Vol: 150 Pages: 354-363 Publisher: Elsevier BV

Abstract

© 2023 Elsevier B.V.

High-Performance Computing (HPC) is essential to support advances in multiple research and industrial fields. Despite the recent growth in processing and networking power, the resources of HPC Data Centers (DCs) are finite and should be carefully managed to host multiple jobs. Scheduling the tasks that compose a job is a crucial and complex problem, since the effects of the scheduler's decisions are perceptible both to users (e.g., slowdown) and to infrastructure administrators (e.g., resource usage and queue length). In fact, the process of scheduling workflows atop a DC can be modeled as a graph mapping problem: an undirected graph represents the DC, while a Directed Acyclic Graph (DAG) expresses the task dependencies. Each vertex and edge of both graphs can carry weights, denoting the residual capacities of DC resources as well as the computing and networking demands of workflows. Motivated by the combinatorial explosion of this scheduling problem, the integration of Machine Learning (ML) to generate or improve scheduling policies is already a reality; however, proposals in the specialized literature mostly either use simplified models to reduce the search space or are trained for specific scenarios, which leads to policies that may fall short of the expectations of real DCs. Given this challenge, this work applies Actor–Critic (AC) Reinforcement Learning (RL) to schedule DAG-based workflows. Instead of proposing a new policy, AC RL is used to select the appropriate scheduling policy from a pool of consolidated algorithms, guided by the DAG workload and DC usage. The AC RL-based scheduler analyzes the DAG queue and the DC status to determine which algorithms are best suited to improve the overall performance indicators in each scenario instance.
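The core idea, an actor–critic agent that picks one policy from an existing pool based on the observed state instead of producing task placements itself, can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the class `LinearActorCritic`, the two-regime one-hot state encoding, the three-policy pool, and the `TOY_SLOWDOWN` values are all hypothetical, and linear function approximation stands in for the deep networks used in the paper. The reward is assumed to be the negative mean slowdown, one of the user-facing indicators named in the abstract.

```python
import math
import random


class LinearActorCritic:
    """One-step actor-critic that picks an index into a pool of scheduling policies.

    Assumptions (not from the paper): actor and critic are linear in the state
    features rather than deep networks, and every scheduling round is a one-step
    episode, so the TD target is just the observed reward.
    """

    def __init__(self, n_features, n_actions, actor_lr=0.1, critic_lr=0.2, seed=0):
        self.n_actions = n_actions
        # Actor: one preference vector per candidate policy.
        self.theta = [[0.0] * n_features for _ in range(n_actions)]
        # Critic: state-value estimate V(s) = w . s, used as a baseline.
        self.w = [0.0] * n_features
        self.actor_lr = actor_lr
        self.critic_lr = critic_lr
        self.rng = random.Random(seed)

    @staticmethod
    def _dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def probs(self, state):
        """Softmax distribution over the policy pool for this state."""
        prefs = [self._dot(row, state) for row in self.theta]
        top = max(prefs)  # shift by the max for numerical stability
        exps = [math.exp(p - top) for p in prefs]
        total = sum(exps)
        return [e / total for e in exps]

    def act(self, state):
        """Sample a policy index from the actor's current distribution."""
        return self.rng.choices(range(self.n_actions), weights=self.probs(state))[0]

    def update(self, state, action, reward):
        """Apply one actor step and one critic step from a single observed reward."""
        advantage = reward - self._dot(self.w, state)  # TD error of the critic
        p = self.probs(state)
        for k in range(self.n_actions):
            # d log pi(action | s) / d theta_k = (1[k == action] - p_k) * s
            coef = (1.0 if k == action else 0.0) - p[k]
            for i, s in enumerate(state):
                self.theta[k][i] += self.actor_lr * advantage * coef * s
        for i, s in enumerate(state):
            self.w[i] += self.critic_lr * advantage * s


# Hypothetical toy environment: two queue/DC regimes encoded one-hot and an
# invented mean slowdown for each of three candidate policies in each regime.
STATES = {"light_load": [1.0, 0.0], "heavy_load": [0.0, 1.0]}
TOY_SLOWDOWN = {"light_load": [1.0, 1.6, 2.2], "heavy_load": [3.0, 2.4, 1.2]}


def train_selector(rounds=6000, seed=0):
    agent = LinearActorCritic(n_features=2, n_actions=3, seed=seed)
    regimes = list(STATES)
    for t in range(rounds):
        regime = regimes[t % 2]  # alternate between the two regimes
        state = STATES[regime]
        action = agent.act(state)
        reward = -TOY_SLOWDOWN[regime][action]  # lower slowdown, higher reward
        agent.update(state, action, reward)
    return agent


if __name__ == "__main__":
    agent = train_selector()
    for regime, state in STATES.items():
        p = agent.probs(state)
        print(regime, "-> policy", p.index(max(p)))
```

After training, the actor should concentrate its probability on the policy with the lowest toy slowdown in each regime (index 0 under light load, index 2 under heavy load). The critic acts only as a baseline: it reduces the variance of the actor update without changing its expected direction.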
The simulation protocol comprises multiple analyses with distinct workload configurations, numbers of jobs, queue ordering policies, and strategies for selecting the target DC servers. The results demonstrate that the AC RL selects the scheduling policy that fits the current workload and DC status.
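A pool of consolidated algorithms can be illustrated by pairing a queue-ordering policy with a server-selection strategy. The sketch below is hypothetical: FIFO and smallest-job-first ordering, first-fit and best-fit selection, the job format, and the capacities are illustrative choices, not the policies evaluated in the paper. It keeps only the CPU vertex weights of the DC graph. Links and task durations are ignored, so each placed task holds its CPU until the end. Each job's tasks are placed in a topological order of its DAG, and a job is rejected if any of its tasks does not fit. It requires Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical job format: name -> (arrival_time, {task: (cpu_demand, [predecessor tasks])})
JOBS = {
    "A": (0.0, {"a1": (2, []), "a2": (2, ["a1"])}),
    "B": (1.0, {"b1": (6, [])}),
}
# Residual CPU per server (vertex weights of the DC graph; links are ignored here).
CAPACITY = {"s1": 8, "s2": 4}


def fifo(jobs):
    """Queue ordering: earliest arrival first."""
    return sorted(jobs, key=lambda j: jobs[j][0])


def smallest_job_first(jobs):
    """Queue ordering: smallest total CPU demand first."""
    return sorted(jobs, key=lambda j: sum(d for d, _ in jobs[j][1].values()))


def first_fit(residual, demand):
    """Server selection: first server, in dict order, with enough residual CPU."""
    for server, free in residual.items():
        if free >= demand:
            return server
    return None


def best_fit(residual, demand):
    """Server selection: feasible server that is left with the least spare CPU."""
    fits = [(free - demand, server) for server, free in residual.items() if free >= demand]
    return min(fits)[1] if fits else None


def schedule(jobs, capacity, order, select):
    """Admit jobs in queue order; place each job's tasks in a topological order of its DAG.

    Toy model: no timing, so a placed task keeps its CPU for the whole run, and
    a job is rejected as a whole if any of its tasks cannot be placed.
    """
    residual = dict(capacity)
    placement, rejected = {}, []
    for job in order(jobs):
        dag = jobs[job][1]
        graph = {task: preds for task, (_, preds) in dag.items()}  # node -> predecessors
        trial, tasks = dict(residual), {}
        for task in TopologicalSorter(graph).static_order():
            demand = dag[task][0]
            server = select(trial, demand)
            if server is None:  # job does not fit: discard the trial placement
                rejected.append(job)
                break
            trial[server] -= demand
            tasks[task] = server
        else:  # every task was placed: commit the trial
            residual = trial
            placement[job] = tasks
    return placement, rejected


# The pool a selector could index into: every (queue ordering, server selection) pair.
POLICY_POOL = [(o, s) for o in (fifo, smallest_job_first) for s in (first_fit, best_fit)]

if __name__ == "__main__":
    for order, select in POLICY_POOL:
        placement, rejected = schedule(JOBS, CAPACITY, order, select)
        print(order.__name__, select.__name__, placement, "rejected:", rejected)
```

With these toy inputs, FIFO with first-fit packs job A onto `s1` and then has no room left for B. FIFO with best-fit instead puts A on the tighter server `s2` and admits both jobs. A policy selector can exploit this kind of sensitivity of the outcome to the chosen policy pair.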

Keywords:

Computer science; Reinforcement learning; Workflow; Scheduling (production processes); Distributed computing; Directed acyclic graph; Job shop scheduling; Schedule; Artificial intelligence; Theoretical computer science; Algorithm; Mathematical optimization

Metrics

FWCI (Field Weighted Citation Impact): 16.08

Citation Normalized Percentile: 0.99

Topics

Cloud Computing and Resource Management

Physical Sciences → Computer Science → Information Systems

Distributed and Parallel Computing Systems

Physical Sciences → Computer Science → Computer Networks and Communications

Software-Defined Networks and 5G

Physical Sciences → Computer Science → Computer Networks and Communications

Related Documents

Wind Farm Maintenance Scheduling Using Soft Actor-Critic Deep Reinforcement Learning

Actor-Critic Deep Reinforcement Learning for Solving Job Shop Scheduling Problems

Coverage Path Planning Using Actor–Critic Deep Reinforcement Learning

Integrated Actor-Critic for Deep Reinforcement Learning

Deep Reinforcement Learning based Actor-Critic Framework for Decision-Making Actions in Production Scheduling