Dynamic Beam Pattern and Bandwidth Allocation Based on Multi-Agent Deep Reinforcement Learning for Beam Hopping Satellite Systems

Zhiyuan Lin; Zuyao Ni; Linling Kuang; Chunxiao Jiang; Zhen Huang

doi:10.1109/tvt.2022.3145848

JOURNAL ARTICLE

Year: 2022
Journal: IEEE Transactions on Vehicular Technology
Vol: 71 (4)
Pages: 3917-3930
Publisher: Institute of Electrical and Electronics Engineers

Abstract

Because ground traffic requests are non-uniformly distributed geographically and vary over time, making full use of limited beam resources to serve users flexibly and efficiently is a new challenge for beam hopping satellite systems. Conventional greedy beam hopping methods do not consider the long-term reward, which makes it difficult for them to handle time-varying traffic demand. Meanwhile, heuristic algorithms such as the genetic algorithm converge slowly and therefore cannot achieve real-time scheduling. Furthermore, existing methods based on deep reinforcement learning (DRL) make decisions only on beam patterns and lack the degree of freedom of bandwidth. This paper proposes a dynamic beam pattern and bandwidth allocation scheme based on DRL that flexibly uses three degrees of freedom: time, space, and frequency. Because jointly allocating bandwidth and beam pattern leads to an explosion of the action space, a cooperative multi-agent deep reinforcement learning (MADRL) framework is presented, in which each agent is responsible only for the illumination allocation or the bandwidth allocation of one beam. The agents learn to collaborate by sharing the same reward to achieve a common goal, namely maximizing the throughput and minimizing the delay fairness between cells. Simulation results demonstrate that the offline-trained MADRL model achieves real-time beam pattern and bandwidth allocation that matches the non-uniform and time-varying traffic requests. Furthermore, the model generalizes well when the traffic demand increases.
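
The action-space explosion mentioned in the abstract can be illustrated with a rough count. The sketch below is not taken from the paper: it assumes, purely for illustration, that a centralized controller picks a set of distinct cells for the beams plus one discrete bandwidth level per beam, while the multi-agent variant uses one illumination agent and one bandwidth agent per beam. The function names and the example sizes are hypothetical.

```python
from math import comb


def joint_action_count(num_cells: int, num_beams: int, num_bw_levels: int) -> int:
    """Size of one centralized action space (illustrative assumption).

    Choose which `num_beams` distinct cells are illuminated, then choose
    one of `num_bw_levels` bandwidth levels for every beam.
    """
    return comb(num_cells, num_beams) * num_bw_levels ** num_beams


def per_agent_action_counts(num_cells: int, num_beams: int, num_bw_levels: int) -> list[int]:
    """Action-space sizes when each agent handles one decision for one beam.

    Assumed layout: one illumination agent per beam (picks a cell) and one
    bandwidth agent per beam (picks a bandwidth level).
    """
    illumination_agents = [num_cells] * num_beams
    bandwidth_agents = [num_bw_levels] * num_beams
    return illumination_agents + bandwidth_agents


if __name__ == "__main__":
    # Hypothetical system size, not a configuration from the paper.
    cells, beams, levels = 37, 10, 4
    print("joint actions:", joint_action_count(cells, beams, levels))
    print("per-agent actions:", per_agent_action_counts(cells, beams, levels))
```

Under these assumptions the centralized count grows combinatorially with the number of beams, whereas each agent's own action space stays at the number of cells or the number of bandwidth levels.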

Keywords:

Reinforcement learning; Dynamic bandwidth allocation; Computer science; Bandwidth (computing); Bandwidth allocation; Q-learning; Channel allocation schemes; Communications satellite; Scheduling (production processes); Greedy algorithm; Beam search; Distributed computing; Real-time computing; Mathematical optimization; Artificial intelligence; Computer network; Satellite; Engineering; Telecommunications; Algorithm; Search algorithm

Metrics

Cited By: 141

FWCI (Field Weighted Citation Impact): 47.01

Refs

Citation Normalized Percentile: 1.00

Is in top 1%

Is in top 10%

Topics

Satellite Communication Systems

Physical Sciences → Engineering → Aerospace Engineering

Age of Information Optimization

Physical Sciences → Computer Science → Computer Networks and Communications

Opportunistic and Delay-Tolerant Networks

Physical Sciences → Computer Science → Computer Networks and Communications

Related Documents

Dynamic Beam Hopping Resource Allocation Algorithm Based on Deep Reinforcement Learning in Multi-Beam Satellite Systems

Deep Reinforcement Learning for Dynamic Bandwidth Allocation in Multi-Beam Satellite Systems

Deep Reinforcement Learning Based Interference Avoidance Beam-Hopping Allocation Algorithm in Multi-beam Satellite Systems

Satellite-Terrestrial Coordinated Multi-Satellite Beam Hopping Scheduling Based on Multi-Agent Deep Reinforcement Learning

Multi-Satellite Beam Hopping Based on Deep Reinforcement Learning for LEO Satellite Systems