Efficient radio resource allocation is a fundamental optimization problem for wireless networks and has been widely studied. However, wireless systems are evolving toward a much larger parameter space, along with a richer set of applications and user requirements, leading to a significant increase in complexity. This paper draws on recent breakthroughs in applying deep reinforcement learning (RL) to control problems with high-dimensional state spaces. In particular, a deep RL approach is explored for the problem of allocating time and frequency resources in OFDMA wireless systems to optimize different objective functions, using per-station channel quality and traffic information as inputs. Such approaches hold the potential for agents to learn resource allocation and scheduling policies directly from experience rather than relying on carefully crafted heuristic algorithms based on models of the environment and stations. Learning directly from experience also means that policies resulting from online learning should be more robust than model-based heuristic approaches to imperfect inputs such as noisy, delayed, or missing information. The results in this paper show promise for a deep RL agent using a policy gradient algorithm to learn policies that approach or exceed the performance of well-known model-based approaches such as max-weight and proportional fair scheduling. The online adaptation algorithms used for the deep RL agent also demonstrate reasonable adaptability and robustness to varying traffic conditions.
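For reference, the baseline policies and the learning rule named above have standard textbook forms; the paper's exact variants may differ. Per scheduling slot $t$, the max-weight and proportional fair rules select the station

$$
i^{*}_{\mathrm{MW}}(t) = \arg\max_{i} \, q_i(t)\, r_i(t),
\qquad
i^{*}_{\mathrm{PF}}(t) = \arg\max_{i} \, \frac{r_i(t)}{\bar{R}_i(t)},
$$

where $q_i(t)$ is station $i$'s queue backlog, $r_i(t)$ its instantaneous achievable rate on the resource being assigned, and $\bar{R}_i(t)$ an exponentially weighted moving average of its past throughput. A policy gradient agent instead parameterizes the allocation policy $\pi_\theta(a \mid s)$ and updates it from experienced returns, e.g. via the standard REINFORCE rule

$$
\theta \leftarrow \theta + \alpha \, G_t \, \nabla_\theta \log \pi_\theta(a_t \mid s_t),
$$

with $G_t$ the observed return and $\alpha$ a learning rate, so no explicit model of the channel or traffic is required.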