In this paper, we propose using Multi-agent Reinforcement Learning (MARL) for distributed resource allocation in 5G networks. We consider the case where resource allocation is performed by each User Equipment (UE). The goal is to learn a joint policy that the UEs can execute in a distributed manner. Such a policy aims to guarantee a minimum data rate for each user while maximizing the sum rate of the users in the network. We consider two MARL paradigms, namely Independent Learners (ILs) and Value Function Factorization (VFF). For the latter, we adopt QTRAN, a value function decomposition-based algorithm that falls under the Centralized Training with Distributed Execution (CTDE) regime. Results show that MARL algorithms can learn a joint policy that UEs can use for distributed resource allocation.
Jingjing Cui, Yuanwei Liu, Arumugam Nallanathan
Nawaf Qasem Hamood Othman, Jinglei Li, Qinghai Yang