Collaborative Intelligent Reflecting Surface Networks With Multi-Agent Reinforcement Learning

Jie Zhang; Jun Li; Yijin Zhang; Qingqing Wu; Xiongwei Wu; Feng Shu; Shi Jin; Wen Chen

doi:10.1109/jstsp.2022.3162109

JOURNAL ARTICLE

Year: 2022

Journal: IEEE Journal of Selected Topics in Signal Processing

Vol: 16 (3)

Pages: 532-545

Publisher: Institute of Electrical and Electronics Engineers

Abstract

Intelligent reflecting surfaces (IRSs) are envisioned to be widely applied in future wireless networks. In this paper, we investigate a multi-user communication system assisted by cooperative IRS devices with the capability of energy harvesting. Aiming to maximize the long-term average achievable system rate, we formulate an optimization problem that jointly designs the transmit beamforming at the base station (BS) and the discrete phase shift beamforming at the IRSs, subject to constraints on transmit power, user data rate requirements and IRS energy buffer size. Considering time-varying channels and stochastic arrivals of energy harvested by the IRSs, we first formulate the problem as a Markov decision process (MDP) and then develop a novel multi-agent Q-mix (MAQ) framework with two layers to decouple the optimization parameters. The higher layer optimizes the phase shift resolutions, and the lower layer optimizes the phase shift beamforming and power allocation. Since the phase shift optimization is an integer programming problem with a large-scale action space, we improve MAQ by incorporating the Wolpertinger method, yielding the MAQ-WP algorithm, which reaches a sub-optimal solution over an action space of reduced dimension. In addition, as MAQ-WP still requires high complexity to achieve good performance, we propose a policy gradient-based MAQ algorithm, namely MAQ-PG, which maps the discrete phase shift actions into a continuous space at the cost of a slight performance loss. Simulation results demonstrate that the proposed MAQ-WP and MAQ-PG algorithms converge faster and achieve data rate improvements of 10.7% and 8.8%, respectively, over the conventional multi-agent DDPG.
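The abstract gives no implementation details for MAQ-WP, so the following is only a minimal sketch of the general Wolpertinger idea it refers to: a continuous "proto-action" is mapped to its k nearest discrete actions, and a value function picks the best of those k. Every name here (`phase_levels`, `embed`, `wolpertinger_select`, `toy_q`, the random `cascade` channel) is hypothetical and introduced for illustration, and `toy_q` is a stand-in reflected-signal gain rather than the learned Q-mix critic used in the paper.

```python
import itertools
import numpy as np


def phase_levels(bits):
    """Discrete phase shifts for a b-bit resolution: 2**bits evenly spaced values in [0, 2*pi)."""
    return 2 * np.pi * np.arange(2 ** bits) / 2 ** bits


def embed(phases):
    """Map phases to unit-circle coordinates so distances respect the 2*pi wrap-around."""
    phases = np.asarray(phases)
    return np.concatenate([np.cos(phases), np.sin(phases)], axis=-1)


def wolpertinger_select(proto_action, candidates, q_value, k):
    """Return the highest-valued action among the k discrete actions nearest to proto_action."""
    dists = np.linalg.norm(embed(candidates) - embed(proto_action), axis=1)
    nearest = np.argsort(dists)[:k]
    scores = np.array([q_value(candidates[i]) for i in nearest])
    return candidates[nearest[np.argmax(scores)]]


# Assumed toy setup: one IRS with 3 elements and 2-bit phase resolution (4**3 = 64 joint actions).
rng = np.random.default_rng(0)
n_elements, bits = 3, 2
levels = phase_levels(bits)
candidates = np.array(list(itertools.product(levels, repeat=n_elements)))

# Hypothetical cascaded BS-IRS-user channel coefficients, one per reflecting element.
cascade = rng.standard_normal(n_elements) + 1j * rng.standard_normal(n_elements)


def toy_q(phases):
    """Stand-in value function: power of the coherently combined reflected signal."""
    return np.abs(np.sum(cascade * np.exp(1j * np.asarray(phases)))) ** 2


# In a full agent the proto-action would come from an actor network; here it is random.
proto = rng.uniform(0, 2 * np.pi, n_elements)
chosen = wolpertinger_select(proto, candidates, toy_q, k=8)
```

Only k candidates are scored instead of all 64, which is the source of the complexity reduction and also of the sub-optimality: setting k to the full number of candidates recovers an exhaustive search. The sketch enumerates the whole joint action set for clarity, which is feasible only for very small surfaces.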

Keywords:

Beamforming; Computer science; Markov decision process; Mathematical optimization; Reinforcement learning; Optimization problem; Base station; Transmitter power output; Wireless; Markov process; Algorithm; Channel (broadcasting); Mathematics; Telecommunications; Artificial intelligence; Transmitter

Metrics

FWCI (Field Weighted Citation Impact): 3.12

Citation Normalized Percentile: 0.90

Topics

Advanced Wireless Communication Technologies

Physical Sciences → Engineering → Electrical and Electronic Engineering

Optical Wireless Communication Technologies

Physical Sciences → Engineering → Electrical and Electronic Engineering

UAV Applications and Optimization

Physical Sciences → Engineering → Aerospace Engineering

Related Documents

Intelligent reflecting surface-assisted federated learning in multi-platoon collaborative networks

Scaling Collaborative Space Networks with Deep Multi-Agent Reinforcement Learning

When Multi-access Edge Computing Meets Multi-area Intelligent Reflecting Surface: A Multi-agent Reinforcement Learning Approach

Multi-Agent Deep Reinforcement Learning for Intelligent Industrial IoT Networks

Energy efficiency optimization of aerial intelligent reflecting surface-assisted communications based on multi-agent deep reinforcement learning