Benjamin Freed, Aditya Kapoor, Ian Abraham, Jeff Schneider, Howie Choset
One of the preeminent obstacles to scaling multi-agent reinforcement learning to large numbers of agents is assigning credit to individual agents' actions. In this paper, we address this credit assignment problem with an approach that we call \textit{partial reward decoupling} (PRD), which attempts to decompose large cooperative multi-agent RL problems into decoupled subproblems involving subsets of agents, thereby simplifying credit assignment. We empirically demonstrate that decomposing the RL problem using PRD in an actor-critic algorithm results in lower-variance policy gradient estimates, which improves data efficiency, learning stability, and asymptotic performance across a wide array of multi-agent RL tasks, compared to various other actor-critic approaches. Additionally, we relate our approach to counterfactual multi-agent policy gradient (COMA), a state-of-the-art MARL algorithm, and empirically show that our approach outperforms COMA by making better use of information in agents' reward streams, and by enabling recent advances in advantage estimation to be used.
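To make the decoupling idea concrete, below is a minimal, hypothetical sketch of PRD-style advantage estimation. It assumes per-agent reward streams, per-agent critic value estimates, and a precomputed boolean relevance mask identifying which agents' rewards are attributed to each agent's actions; the function name, tensor shapes, and the one-step TD form are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def prd_advantages(rewards, values, relevance, gamma=0.99):
    """Hypothetical PRD-style advantage estimation.

    rewards:   (T, N) array, per-agent reward stream over T timesteps
    values:    (T, N) array, per-agent critic value estimates
    relevance: (N, N) boolean mask; relevance[i, j] = True if agent j's
               reward is attributed to agent i's actions
    Returns:   (T, N) array of per-agent advantage estimates
    """
    T, N = rewards.shape
    advantages = np.zeros((T, N))
    for i in range(N):
        # Decoupled reward for agent i: sum only over its relevant subset,
        # rather than the full team-wide reward. This is the step intended
        # to reduce policy gradient variance by excluding rewards that
        # agent i's actions cannot influence.
        r_i = rewards[:, relevance[i]].sum(axis=1)
        # One-step TD advantage against agent i's own value estimate
        # (bootstrap value of 0 at the terminal step).
        next_v = np.append(values[1:, i], 0.0)
        advantages[:, i] = r_i + gamma * next_v - values[:, i]
    return advantages
```

In this sketch, setting `relevance` to all-True recovers an ordinary shared-reward advantage, while a sparser mask yields the decoupled subproblems the abstract describes; how the relevant subsets are identified is the substance of the paper and is not modeled here.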