We investigate multi-agent reinforcement learning in which each agent has access only to its local reward and can communicate only with its nearby neighbors. We develop a distributed algorithm based on the actor-critic method that enables all agents to cooperatively learn a control policy maximizing the global objective function. Simulations are provided to validate the proposed algorithm.
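The neighbor-only communication described above is commonly realized with a consensus (local averaging) step. The following is a minimal sketch of that idea, not the algorithm proposed in this work: the ring topology, the uniform weights, the sample rewards, and the names `ring_weights`, `consensus_step`, and `run_consensus` are all assumptions made for illustration. It shows how agents that each hold only a local reward can agree on the global average by repeatedly averaging with their neighbors.

```python
# Illustrative sketch only: topology, weights, and function names are
# assumptions, not taken from the paper.

def ring_weights(n):
    """Doubly stochastic weights for a ring of n >= 3 agents.

    Each agent weights itself and its two ring neighbors equally.
    """
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i][j % n] += 1.0 / 3.0
    return W


def consensus_step(values, W):
    """One round: every agent takes a weighted average over its neighbors."""
    n = len(values)
    return [sum(W[i][j] * values[j] for j in range(n)) for i in range(n)]


def run_consensus(values, W, steps):
    """Repeat the local averaging round a fixed number of times."""
    for _ in range(steps):
        values = consensus_step(values, W)
    return values


# Each agent starts with only its own local reward (made-up numbers).
local_rewards = [1.0, 4.0, 2.0, 7.0, 6.0]
W = ring_weights(len(local_rewards))
estimates = run_consensus(local_rewards, W, steps=200)
global_average = sum(local_rewards) / len(local_rewards)
```

Because the weight matrix is doubly stochastic, each round preserves the network-wide average, so every entry of `estimates` approaches `global_average` without any agent ever seeing another agent's reward directly. In a distributed actor-critic scheme the same kind of step would typically be applied to critic parameters rather than raw rewards; that pairing is an assumption here.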
Yixuan Lin, Shripad Gade, Romeil Sandhu, Ji Liu
Prashant Trivedi, N. Hemachandra
Miloš S. Stanković, Marko Beko, Nemanja Ilić, Srdjan Stanković
Wesley A. Suttle, Zhuoran Yang, Kaiqing Zhang, Zhaoran Wang, Tamer Başar, Ji Liu
Chungui Li, Meng Wang, Yuan Qing-neng