Global value creation networks have become increasingly volatile and dynamic in recent years, accelerating a trend already evident in the shortening of product and technology cycles. In addition, the manufacturing industry increasingly allows customers to tailor their products at the time of ordering. These changes demand a high degree of flexibility and adaptability not only from cyber-physical systems but also from employees and supervisory production planning. As a result, the development of control and monitoring mechanisms becomes more complex. Moreover, the production process must be adjusted dynamically in response to unforeseen events (disrupted supply chains, machine breakdowns, or staff absences) in order to make the most effective and efficient use of the available production resources. In recent years, reinforcement learning (RL) research has gained increasing popularity in strategic planning owing to its ability to handle uncertainty in dynamic environments in real time. RL has also been extended to multiple agents that cooperate on complex tasks. Despite its potential, the real-world application of multi-agent reinforcement learning (MARL) to manufacturing problems, such as flexible job-shop scheduling, has rarely been approached. The main reason is that applications in this field are frequently subject to specific requirements as well as confidentiality obligations. Consequently, the research community has difficulty gaining access to them, which poses substantial challenges for the implementation of these tools. This paper applies and compares single-agent RL and MARL algorithms for solving the dynamic scheduling problem, cast as an intelligent resource allocation problem, using a model factory as an example; the objective is to reduce the makespan
of the given jobs. To lower the entry barriers for other researchers and to ensure reproducibility, the simulation environment used to perform the experiments of this study is made available to the research community. By including redundant operations, variations in order composition, product variants, setup times, and automated material transport, the environment enables a realistic setting. Moreover, the (composite of) intelligent dispatcher(s) is confronted with variations in operation times and breakdowns. Further, this study investigates the convergence behavior of trained RL and MARL models, as well as their performance in handling unknown and unforeseen scenarios compared with heuristic approaches. The experiments demonstrate that even under significant time constraints, RL, especially in the multi-agent setting with the Proximal Policy Optimization (PPO) algorithm at its core, is able to outperform conventional heuristic methods on the complex problem of production scheduling under uncertainty.
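To make the makespan objective concrete, the following minimal sketch (not the paper's simulation environment; all instance data and names are hypothetical) dispatches a tiny flexible job-shop instance with a greedy earliest-finish rule. Each operation lists several capable machines (redundant operations), and the makespan is the completion time of the last operation; an RL dispatcher would replace the greedy rule with a learned policy.

```python
# Hypothetical flexible job-shop instance: each job is a sequence of
# operations, and each operation maps the machines able to perform it
# (redundant operations) to their processing times.
jobs = {
    "J1": [{"M1": 3, "M2": 4}, {"M2": 2}],
    "J2": [{"M1": 2}, {"M1": 4, "M2": 3}],
}

def greedy_makespan(jobs):
    """Greedy dispatcher: assign each operation to the machine on which it
    finishes earliest, processing jobs sequentially (a simplification)."""
    machine_free = {"M1": 0.0, "M2": 0.0}   # earliest idle time per machine
    job_ready = {j: 0.0 for j in jobs}      # earliest start of a job's next op
    for job, ops in jobs.items():
        for op in ops:
            # Pick the machine that completes this operation earliest.
            best = min(op, key=lambda m: max(machine_free[m], job_ready[job]) + op[m])
            start = max(machine_free[best], job_ready[job])
            end = start + op[best]
            machine_free[best] = end
            job_ready[job] = end
    # Makespan: completion time of the last operation on any machine.
    return max(machine_free.values())

print(greedy_makespan(jobs))  # prints 8.0 for this toy instance
```

Heuristic dispatching rules of this kind serve as the baselines against which learned policies are compared; unlike a fixed rule, a trained (MA)RL dispatcher can adapt its assignments when operation times vary or machines break down.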
Mingjie Bi, Ilya Kovalenko, Dawn M. Tilbury, Kira Barton