We study the joint decoupled uplink (UL)-downlink (DL) association and trajectory design problem for full-duplex multi-UAV networks. A joint optimization problem is formulated with the aim of maximizing the sum-rate of user equipments (UEs) in both the UL and the DL. Since the formulated problem is non-convex and involves a complicated state space, a multi-agent deep reinforcement learning (MADRL) approach is employed to enable each agent (i.e., UAV) to select its policy in a distributed manner. Moreover, to obtain the optimal policy, a clip-and-count based proximal policy optimization (PPO) algorithm is proposed to train the actor-critic neural networks. In particular, a modified clip distribution is designed to handle the hard restriction between the current and old policies, and an intrinsic reward is introduced to enhance exploration capability. Simulation results demonstrate the significant performance improvement of the proposed schemes compared to the benchmarks.
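As a point of reference for the training objective described above, the following is a minimal sketch of a standard PPO clipped surrogate loss combined with a count-based intrinsic reward bonus. The paper's modified clip distribution and exact intrinsic-reward form are not specified in the abstract, so this shows only the vanilla clipped surrogate and an assumed `beta / sqrt(N(s))` visitation bonus, both common baselines.

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Vanilla PPO clipped surrogate loss (negated for minimization).

    ratio: pi_new(a|s) / pi_old(a|s) per sample.
    advantage: estimated advantage per sample.
    eps: clip range; the paper's clip-and-count variant modifies this
    clipping behavior, which is not reproduced here.
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return -np.minimum(unclipped, clipped).mean()

def shaped_reward(extrinsic, visit_counts, beta=0.1):
    """Add a count-based exploration bonus (assumed form, not the paper's).

    extrinsic: environment reward per sample (e.g., sum-rate contribution).
    visit_counts: state visitation counts N(s) per sample.
    """
    return extrinsic + beta / np.sqrt(np.maximum(visit_counts, 1))
```

The clipping keeps the updated policy within a trust region of the old one, while the intrinsic bonus decays with visitation count so exploration pressure fades as states become familiar.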
Rui Tang, Ruizhi Zhang, Yongjun Xu, Chau Yuen,
Feng Xiao, Ru Gao, Fujie Tang, Jianjun Luo