Honglin Xiang, Yang Yang, Gang He, Jingfei Huang, Dazhong He
Device-to-device (D2D) communications are envisioned as a critical technology to support future ubiquitous mobile communication applications. However, the requirements of high mobility and low latency severely constrain the performance of D2D communications. In this letter, we investigate a deep reinforcement learning (DRL) based scheme of power and resource allocation for maximizing the throughput of D2D users (DUEs) and cellular users (CUEs). Based on DRL theory, D2D pairs are modeled as distributed multiple agents, which adaptively select transmission power and resources to ease the co-channel interference without any prior information. Furthermore, a priority sampling based dueling double deep Q-network (PS-D3QN) distributed algorithm is proposed to help agents learn the predominant features. Simulation results show that the proposed algorithm achieves higher throughput than existing DRL algorithms under a strict delay constraint. In particular, the probability that an agent selects high power decreases as the remaining transmission time increases, which indicates that the agents effectively learn and dynamically sense the impact of the delay constraint.
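The PS-D3QN algorithm named in the abstract combines three standard DRL ingredients: a dueling network head, a double-DQN target, and prioritized experience sampling. The following is a minimal NumPy sketch of those three building blocks under their textbook formulations; it is not the authors' implementation, and all function names, hyperparameters (alpha, beta), and array shapes are illustrative assumptions.

```python
import numpy as np

def dueling_q(value, advantage):
    # Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
    # value: shape (batch, 1); advantage: shape (batch, n_actions)
    return value + advantage - advantage.mean(axis=-1, keepdims=True)

def double_dqn_target(reward, done, gamma, q_online_next, q_target_next):
    # Double DQN: the online network SELECTS the next action,
    # the target network EVALUATES it, reducing overestimation bias.
    a_star = np.argmax(q_online_next, axis=-1)
    q_eval = q_target_next[np.arange(len(a_star)), a_star]
    return reward + gamma * (1.0 - done) * q_eval

def prioritized_sample(td_errors, batch_size, alpha=0.6, beta=0.4, rng=None):
    # Priority sampling: transitions with larger TD error are replayed
    # more often; importance-sampling weights correct the induced bias.
    rng = rng or np.random.default_rng(0)
    priorities = (np.abs(td_errors) + 1e-6) ** alpha
    probs = priorities / priorities.sum()
    idx = rng.choice(len(td_errors), size=batch_size, p=probs)
    weights = (len(td_errors) * probs[idx]) ** (-beta)
    weights /= weights.max()  # normalize for stable loss scaling
    return idx, weights
```

In a distributed D2D setting, each agent (D2D pair) would maintain its own replay buffer and apply these updates locally from its observed interference and throughput feedback.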