Jiajia Chen, Bingqing Zhu, Mengyu Zhang, Xiang Ling, Xiaobo Ruan, Yifan Deng, Ning Guo
This study presents the first investigation of the problem of an autonomous vehicle (AV) merging into an existing platoon and proposes a cooperative control framework based on multi-agent deep reinforcement learning (MA-DRL). The MA-DRL architecture enables coordinated learning among multiple autonomous agents, addressing the multi-objective coordination challenge through synchronized control of platoon longitudinal acceleration and of AV steering and acceleration. To improve training efficiency, we develop a dual-layer multi-agent maximum Q-value proximal policy optimization (MAMQPPO) method, which extends the multi-agent PPO algorithm (a policy-gradient method with clipped, stable policy updates) by incorporating maximum Q-value action selection for platoon gap selection and discrete command generation, thereby simplifying training. Furthermore, a partially decoupled reward function (PD-Reward) is designed to properly guide the behaviors of both the AV and the platoon while accelerating network convergence. Comprehensive highway simulation experiments show that the proposed method reduces merging time by 37.69% (12.4 s vs. 19.9 s) and energy consumption by 58% (3.56 kWh vs. 8.47 kWh) compared with an existing baseline combining quintic-polynomial trajectory planning with PID (Proportional–Integral–Derivative) control.
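The abstract gives no implementation details, but the dual-layer structure it describes can be illustrated with a minimal sketch: an upper layer that picks a platoon gap greedily by maximum Q-value, and a lower layer whose PPO-style clipped surrogate objective governs the continuous control policy update. All names, dimensions, and the NumPy-only setup below are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a dual-layer "max-Q gap selection + PPO update" scheme.
# Toy dimensions and function names are hypothetical, for illustration only.
import numpy as np

rng = np.random.default_rng(0)


def select_gap(q_values: np.ndarray) -> int:
    """Upper layer: greedy (maximum-Q-value) choice among candidate gaps."""
    return int(np.argmax(q_values))


def ppo_clip_objective(ratio: np.ndarray, advantage: np.ndarray,
                       eps: float = 0.2) -> float:
    """Lower layer: PPO clipped surrogate, averaged over a batch.

    ratio = pi_new(a|s) / pi_old(a|s); clipping the ratio to [1-eps, 1+eps]
    is what keeps PPO policy updates stable, the property the paper's
    MAMQPPO method builds on.
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return float(np.mean(np.minimum(unclipped, clipped)))


# Toy usage: 3 candidate gaps in the platoon, a batch of 5 transitions.
q_gap = rng.normal(size=3)                     # hypothetical per-gap Q-values
gap = select_gap(q_gap)                        # discrete gap command
ratio = np.exp(rng.normal(scale=0.1, size=5))  # new/old policy probability ratios
adv = rng.normal(size=5)                       # advantage estimates
print(f"chosen gap index: {gap}, "
      f"surrogate objective: {ppo_clip_objective(ratio, adv):.4f}")
```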