In this paper, we propose a novel decentralized channel resource allocation algorithm for V2V communication based on deep multi-agent reinforcement learning. Each vehicle acts as an independent agent and uses its local observation to select the optimal resource blocks (RBs) from a pre-configured resource pool (the environment). A selected RB is considered optimal if transmitting in it causes minimum interference to other ongoing transmissions. We apply an actor-critic reinforcement learning algorithm in which each agent follows centralized training and decentralized execution. In centralized training, agents share their actions and local observations through the centralized critic network, so that each agent can estimate the policies of the other agents when making a decision (choosing the optimal resource blocks). In decentralized execution, each agent uses its local observation to optimize its local policy independently. Each action taken by an agent's actor network is judged by that agent's private critic network.
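The training and execution split described above can be illustrated with a minimal sketch. It is not the paper's implementation: linear models stand in for the deep actor and critic networks, the reward is a hypothetical collision count used as a proxy for interference, and all names and sizes (`N_AGENTS`, `N_RBS`, `Actor`, `Critic`, `train_step`) are assumptions made for illustration.

```python
import math
import random

N_AGENTS = 3      # hypothetical number of vehicles
N_RBS = 4         # hypothetical size of the resource pool
OBS_DIM = N_RBS   # assumption: an observation is the sensed interference per RB


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class Actor:
    """Per-agent policy: maps a local observation to a distribution over RBs."""

    def __init__(self, rng):
        # Linear policy as a stand-in for a deep network.
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(OBS_DIM)]
                  for _ in range(N_RBS)]

    def policy(self, obs):
        logits = [sum(w * o for w, o in zip(row, obs)) for row in self.w]
        return softmax(logits)

    def act(self, obs, rng):
        # Decentralized execution: only the agent's own observation is used.
        return rng.choices(range(N_RBS), weights=self.policy(obs), k=1)[0]


class Critic:
    """Per-agent critic that sees all observations and actions during training."""

    def __init__(self, rng):
        in_dim = N_AGENTS * (OBS_DIM + N_RBS)
        self.w = [rng.uniform(-0.1, 0.1) for _ in range(in_dim)]

    def features(self, all_obs, all_actions):
        feats = []
        for obs, a in zip(all_obs, all_actions):
            feats.extend(obs)
            feats.extend(1.0 if k == a else 0.0 for k in range(N_RBS))
        return feats

    def value(self, all_obs, all_actions):
        feats = self.features(all_obs, all_actions)
        return sum(w * f for w, f in zip(self.w, feats))


def collisions(i, actions):
    """Number of other agents that picked the same RB as agent i."""
    return sum(1 for j, a in enumerate(actions) if j != i and a == actions[i])


def train_step(actors, critics, all_obs, rng, lr=0.05):
    """One centralized training step over a single-step episode."""
    actions = [actor.act(obs, rng) for actor, obs in zip(actors, all_obs)]
    for i, (actor, critic) in enumerate(zip(actors, critics)):
        reward = -float(collisions(i, actions))  # hypothetical reward
        feats = critic.features(all_obs, actions)
        q = critic.value(all_obs, actions)
        # Critic update: regress the joint-action value toward the reward.
        err = reward - q
        critic.w = [w + lr * err * f for w, f in zip(critic.w, feats)]
        # Actor update: policy gradient weighted by the critic's estimate.
        probs = actor.policy(all_obs[i])
        for k in range(N_RBS):
            g = (1.0 if k == actions[i] else 0.0) - probs[k]
            actor.w[k] = [w + lr * q * g * o
                          for w, o in zip(actor.w[k], all_obs[i])]
    return actions
```

In this sketch the critic is consulted only inside `train_step`, where joint observations and actions are available; at execution time an agent would call `Actor.act` with its own observation alone.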
Alperen Gündoğan, H. Murat Gürsu, Volker Pauli, Wolfgang Kellerer
Odilbek Urmonov, Hayotjon Aliev, HyungWon Kim