Learning Explicit Credit Assignment for Cooperative Multi-Agent Reinforcement Learning via Polarization Policy Gradient

Wubing Chen; Wenbin Li; Xiao Liu; Shangdong Yang; Yang Gao

doi:10.1609/aaai.v37i10.26364

JOURNAL ARTICLE

Year: 2023
Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Vol: 37 (10)
Pages: 11542-11550
Publisher: Association for the Advancement of Artificial Intelligence

Abstract

Cooperative multi-agent policy gradient (MAPG) algorithms have recently attracted wide attention and are regarded as a general scheme for multi-agent systems. Credit assignment plays an important role in MAPG and can induce cooperation among multiple agents. However, most MAPG algorithms cannot achieve good credit assignment because of the game-theoretic pathology known as centralized-decentralized mismatch. To address this issue, this paper presents a novel method, Multi-Agent Polarization Policy Gradient (MAPPG). MAPPG uses a simple but efficient polarization function to transform the optimal consistency of joint and individual actions into easily realized constraints, which enables efficient credit assignment. Theoretically, we prove that the individual policies of MAPPG can converge to the global optimum. Empirically, we evaluate MAPPG on well-known matrix and differential games and verify that it can converge to the global optimum in both discrete and continuous action spaces. We also evaluate MAPPG on a set of StarCraft II micromanagement tasks and demonstrate that it outperforms state-of-the-art MAPG algorithms.

Keywords:

Reinforcement learning; Computer science; Mathematical optimization; Potential game; Polarization (electrochemistry); Distributed computing; Artificial intelligence; Mathematics; Nash equilibrium

Metrics

FWCI (Field Weighted Citation Impact): 0.29

Citation Normalized Percentile: 0.38

Topics

Reinforcement Learning in Robotics

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

Multi-Level Credit Assignment for Cooperative Multi-Agent Reinforcement Learning

Effective credit assignment deep policy gradient multi-agent reinforcement learning for vehicle dispatch

Credit assignment in heterogeneous multi-agent reinforcement learning for fully cooperative tasks

Asynchronous Credit Assignment for Multi-Agent Reinforcement Learning
