To address the multi-UAV multi-target allocation and cooperative navigation problem in complex obstacle environments, this paper proposes a navigation method based on hierarchical reinforcement learning. Specifically, the cooperative navigation problem is decomposed into two stages: a high-level stage that solves the target allocation problem and a low-level stage that solves the path planning problem. The two stages employ independent learning strategies, and multiple reward functions are designed to optimize control performance for different task objectives, such as obstacle avoidance and path optimization. For motion control, the maximum entropy reinforcement learning framework is introduced, with the Soft Actor-Critic (SAC) algorithm serving as the base learning algorithm. Finally, the effectiveness of the proposed method is validated through simulation experiments. The experimental results show that the proposed method achieves good convergence and training efficiency. Moreover, compared with two single-policy learning methods, the hierarchical reinforcement learning approach significantly reduces navigation time while maintaining a high navigation success rate.
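The two-stage decomposition described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the high-level target allocation is stood in for by a greedy nearest-target rule (the paper learns it), and `shaped_reward` shows only two of the multiple reward terms mentioned (goal progress and obstacle proximity); all function names and weights are assumptions.

```python
import numpy as np

def assign_targets(uav_positions, target_positions):
    """High-level stage (illustrative): greedily assign each UAV the nearest
    unclaimed target. The paper learns this allocation with its own policy;
    a greedy rule stands in here as a placeholder."""
    assignment = {}
    free = set(range(len(target_positions)))
    for i, p in enumerate(uav_positions):
        j = min(free, key=lambda t: np.linalg.norm(p - target_positions[t]))
        assignment[i] = j
        free.remove(j)
    return assignment

def shaped_reward(pos, target, obstacles, w_goal=1.0, w_obs=0.5):
    """Low-level stage (illustrative): combine a goal-progress term with an
    obstacle-proximity penalty; weights are assumed, not from the paper."""
    r_goal = -np.linalg.norm(pos - target)          # closer to target is better
    r_obs = -sum(max(0.0, 1.0 - np.linalg.norm(pos - o)) for o in obstacles)
    return w_goal * r_goal + w_obs * r_obs
```

In the full method, the low-level policy trained on such rewards would be a SAC agent, whose maximum-entropy objective encourages exploration during path planning.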
Zixiao Zhu, Lichuan Zhang, Lu Liu, Dongwei Wu, Shuchang Bai, Ranzhen Ren, Wenlong Geng
Kai Kou, Gang Yang, Wenqi Zhang, Chenyi Wang, Yuan Yao, Xingshe Zhou
Yunlong Ding, Minchi Kuang, Heng Shi, Jiazhan Gao
Qiyu Sun, Jiaxin Ji, Jinzhen Mu, Jing Xu, Ljupčo Kocarev, Jürgen Kurths, Yang Tang