Intelligent resource allocation and power control are regarded as important means of alleviating the problems caused by the sharp increase in the number of users and in operating costs. In this paper, we propose a multi-agent deep reinforcement learning (MADRL)-based algorithm that jointly optimizes resource block (RB) allocation and power control, with the aim of maximizing the average spectrum efficiency (SE) of the system while satisfying quality of service (QoS) constraints. We adopt MADRL because centralized training with distributed execution retains the advantages of centralized training while reducing computation and signaling overhead. In the proposed MADRL model, the Q functions of the individual agents are aggregated through a value decomposition network, which strengthens cooperation among agents and improves the convergence of the algorithm. We also add a reward discount network to the original MADRL framework, which adaptively adjusts the weight given to future rewards according to the agents' performance during training. Simulation experiments show that the proposed algorithm achieves better performance and stability than existing alternatives.
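The aggregation of per-agent Q functions through a value decomposition network can be illustrated with a minimal sketch. It assumes the standard additive form, Q_tot = sum of Q_i, which the abstract does not spell out; the actual network may differ. The function names and the Q-values below are hypothetical, and the reward discount network is not modeled here.

```python
from itertools import product


def vdn_total(q_tables, joint_action):
    """Additive value decomposition (assumed form): Q_tot = sum_i Q_i(a_i)."""
    return sum(q[a] for q, a in zip(q_tables, joint_action))


def decentralized_greedy(q_tables):
    """Each agent picks its own argmax using only its local Q-values."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in q_tables)


def centralized_greedy(q_tables):
    """Brute-force argmax of Q_tot over all joint actions (comparison only)."""
    joint_actions = product(*(range(len(q)) for q in q_tables))
    return max(joint_actions, key=lambda a: vdn_total(q_tables, a))


# Hypothetical per-agent Q-values for one fixed observation. Each index could
# stand for one (RB, power level) choice; the numbers are illustrative only.
q_tables = [
    [0.2, 1.5, 0.7],  # agent 0
    [0.9, 0.1, 0.4],  # agent 1
    [0.3, 0.3, 2.0],  # agent 2
]

local = decentralized_greedy(q_tables)
joint = centralized_greedy(q_tables)
print(local, joint, vdn_total(q_tables, local))
```

Under the additive assumption, the joint action found by each agent acting greedily on its own Q-values coincides with the maximizer of Q_tot, which is what lets training use the aggregated value while execution stays distributed.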
Pengxiang Hu, Yinxiang Zhang, Xiao Yan, Xiaofeng Tao
Wenchao Wu, Sige Liu, Chenguang Lu, Toktam Mahmoodi, A.H. Aghvami, Yansha Deng
Liying Li, Gang Wu, Hongbing Xu, Geoffrey Ye Li, Xin Feng
Trinh Van Chien, Emil Björnson, Erik G. Larsson