As satellite payloads become more flexible and traffic demands more dynamic and heterogeneous, the limited, multidimensional communication resources must be allocated efficiently. To match system throughput to user demand, this paper proposes a multi-agent reinforcement learning algorithm for multi-beam satellite systems that exploits two degrees of freedom: power and frequency. A downlink model is established first. A joint frequency assignment and power allocation optimization problem is then formulated, and a multi-agent deep reinforcement learning algorithm with centralized training and distributed execution is presented, in which each beam is modeled as one agent to avoid the curse of dimensionality in the action space. Moreover, a trick for converting multi-discrete actions to discrete actions (CMD) is introduced to handle the hierarchical multi-discrete actions. Simulation results show that the proposed approach outperforms existing algorithms and can perform frequency assignment and power allocation intelligently in real time.
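To illustrate what converting a multi-discrete action to a single discrete one can look like, the following is a minimal sketch and not the paper's CMD procedure, whose details are not given here. It assumes each beam agent picks one frequency option and one power level, and it flattens that pair into one index with a mixed-radix encoding. The function names `encode_action` and `decode_action` and the sizes (4 frequency options, 5 power levels) are hypothetical, and the sketch ignores any hierarchy between the two choices.

```python
from typing import Sequence, Tuple


def encode_action(choices: Sequence[int], sizes: Sequence[int]) -> int:
    """Flatten one choice per dimension into a single discrete index.

    Hypothetical helper: sizes[i] is the number of options in dimension i,
    e.g. (number of frequency options, number of power levels).
    """
    if len(choices) != len(sizes):
        raise ValueError("choices and sizes must have the same length")
    index = 0
    for choice, size in zip(choices, sizes):
        if not 0 <= choice < size:
            raise ValueError("choice out of range for its dimension")
        # Mixed-radix accumulation: the last dimension varies fastest.
        index = index * size + choice
    return index


def decode_action(index: int, sizes: Sequence[int]) -> Tuple[int, ...]:
    """Recover the per-dimension choices from a flat discrete index."""
    total = 1
    for size in sizes:
        total *= size
    if not 0 <= index < total:
        raise ValueError("index out of range for the given sizes")
    choices = []
    # Peel off dimensions from the fastest-varying one backwards.
    for size in reversed(sizes):
        index, choice = divmod(index, size)
        choices.append(choice)
    return tuple(reversed(choices))


# Assumed sizes for illustration only: 4 frequency options, 5 power levels.
SIZES = (4, 5)
flat = encode_action((2, 3), SIZES)   # frequency option 2, power level 3
pair = decode_action(flat, SIZES)
```

With such a mapping, an agent with a single discrete output head of size 4 × 5 = 20 can still select a frequency and a power level jointly; the cost is that the output size grows with the product of the dimension sizes.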
Yuanzhi He, Biao Sheng, Hao Yin, Di Yan, Yingchao Zhang
Zhiyuan Lin, Zuyao Ni, Linling Kuang, Chunxiao Jiang, Zhen Huang
Shijun Ma, Xin Hu, Xianglai Liao, Weidong Wang