Multi-agent reinforcement learning (MARL) has shown promising results in many challenging sequential decision-making tasks. Recently, deep neural networks have dominated this field. However, the agents' policy networks may become trapped in local optima during training, which severely constrains exploration. To address this issue, we propose a novel MARL framework named PSAM, which combines a new temporal inconsistency-based intrinsic reward with a diversity control strategy. Specifically, we save the parameters of the deep models along the optimization path of the agent's policy network; we refer to these saved parameters as snapshots. By measuring the difference between snapshots, we obtain an intrinsic reward. Moreover, we propose a diversity control strategy to further improve performance. Finally, to verify the effectiveness of the proposed method, we conduct extensive experiments in several widely used MARL environments. The results show that in many environments, PSAM not only achieves state-of-the-art performance and prevents the policy network from getting stuck in local optima but also accelerates the agent's policy learning. It is worth noting that the proposed regularizer can be applied in a plug-and-play manner without introducing any additional hyper-parameters or training costs.
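The abstract does not specify how the snapshot difference is turned into a reward, so the following is only a minimal illustrative sketch under assumed details: a toy linear "policy network" stands in for the deep model, snapshots are copies of its weights saved during optimization, and the intrinsic reward is taken as the mean L1 distance between the action distribution of the current policy and those of the snapshots (one plausible instantiation of a temporal-inconsistency signal; the names `policy_probs` and `intrinsic_reward` are hypothetical).

```python
import math
import random

def policy_probs(weights, obs):
    """Softmax action distribution of a toy linear 'policy network'.
    `weights` is one weight vector per action (illustrative only)."""
    logits = [sum(w * x for w, x in zip(col, obs)) for col in weights]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def intrinsic_reward(weights, snapshots, obs):
    """Mean L1 distance between the current action distribution and the
    distributions induced by saved parameter snapshots: the reward grows
    when the policy has drifted further from its past selves."""
    cur = policy_probs(weights, obs)
    dists = [sum(abs(c - p) for c, p in zip(cur, policy_probs(s, obs)))
             for s in snapshots]
    return sum(dists) / len(dists)

random.seed(0)
obs = [random.gauss(0, 1) for _ in range(4)]
weights = [[random.gauss(0, 1) for _ in range(4)] for _ in range(3)]  # 3 actions
# Snapshots: perturbed copies standing in for parameters saved earlier
# along the optimization path.
snapshots = [[[w + random.gauss(0, 0.1) for w in row] for row in weights]
             for _ in range(3)]
print(intrinsic_reward(weights, snapshots, obs))
```

Because the reward is computed from already-stored parameter copies, a scheme like this adds no new hyper-parameters or extra training passes, which is consistent with the plug-and-play claim in the abstract.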
Seokhun Ju, Seungyub Han, Taehyun Cho, Jungwoo Lee, Taeyoung Lee, Minkyoung Kim, Jinung An