Dongping Zhao, Hui Li, Ziyang Wang, Hang Li
To address the challenges of low efficiency, instability, and difficulty in satisfying multiple constraints simultaneously in multi-AGV (Automated Guided Vehicle) task scheduling for intelligent manufacturing and logistics, this paper introduces a scheduling method based on multi-feature constraints and an improved deep reinforcement learning (DRL) approach, Improved Proximal Policy Optimization (IPPO). The method integrates multiple constraints into the scheduling optimization process, including minimizing task completion time, reducing penalty levels, and minimizing scheduling time deviation. Building on the conventional PPO algorithm, several enhancements are introduced: a dynamic penalty mechanism adaptively adjusts constraint weights, a structured reward function boosts learning efficiency, and sampling bias correction combined with global state awareness improves training stability and global coordination. Simulation experiments show that, after 10,000 iterations, the minimum task completion time drops from 98.2 s to 30 s, the penalty level decreases from 130 to 82, and the scheduling time deviation falls from 12 s to 0.5 s, representing improvements of 69.4%, 37%, and 95.8%, respectively, in the same scenario. Compared with genetic algorithms (GAs) and rule-based scheduling methods, the IPPO approach shows significant advantages in average task completion time, total system makespan, and overall throughput, along with faster convergence and better stability. These findings demonstrate that the proposed method enables effective multi-objective collaborative optimization and efficient task scheduling in complex dynamic environments, offering significant value for intelligent manufacturing and logistics systems.
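The abstract describes a dynamic penalty mechanism that adaptively reweights the three objectives (completion time, penalty level, scheduling time deviation) inside a structured reward. The following is a minimal illustrative sketch of that idea, not the authors' implementation: the function names, the linear weighted reward, and the multiplicative weight-update rule are all assumptions introduced here for clarity.

```python
# Hypothetical sketch of a multi-feature reward with a dynamic penalty
# mechanism for AGV scheduling. All names and update rules are illustrative
# assumptions, not the paper's actual reward design.

def scheduling_reward(completion_time, penalty_level, time_deviation, weights):
    """Collapse the three objectives into one scalar reward (higher is better).

    Each objective is to be minimized, so the weighted sum is negated.
    """
    w_time, w_penalty, w_dev = weights
    return -(w_time * completion_time
             + w_penalty * penalty_level
             + w_dev * time_deviation)


def update_weights(weights, violations, lr=0.1):
    """Dynamic penalty step: scale each weight up in proportion to how
    strongly its constraint is currently violated (violation in [0, 1])."""
    return tuple(w * (1.0 + lr * v) for w, v in zip(weights, violations))


if __name__ == "__main__":
    weights = (1.0, 0.5, 2.0)          # initial objective weights (assumed)
    # Reward for the converged metrics reported in the abstract.
    r = scheduling_reward(30.0, 82.0, 0.5, weights)
    print(f"reward = {r}")             # -(1*30 + 0.5*82 + 2*0.5) = -72.0
    # Suppose the penalty-level constraint is 30% violated this episode:
    weights = update_weights(weights, violations=(0.0, 0.3, 0.0))
    print(f"updated weights = {weights}")
```

Under this scheme the agent is pushed hardest on whichever constraint it currently violates most, which is one plausible way to realize the adaptive constraint weighting the abstract attributes to IPPO.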