Multi-agent reinforcement learning (MARL) has shown promising results in many challenging sequential decision-making tasks. Recently, deep neural networks have dominated this field. However, the agents' policy networks may become trapped in local optima during training, which severely constrains exploration. To address this issue, we propose a novel MARL framework named PSAM, which combines a new temporal inconsistency-based intrinsic reward with a diversity control strategy. Specifically, we save the parameters of the deep models along the optimization path of the agent's policy network; we refer to these saved parameters as snapshots. By measuring the difference between snapshots, we obtain an intrinsic reward. Moreover, we propose a diversity control strategy to further improve performance. Finally, to verify the effectiveness of the proposed method, we conduct extensive experiments in several widely used MARL environments. The results show that in many environments, PSAM not only achieves state-of-the-art performance and prevents the policy network from getting stuck in local optima but also accelerates the agent's policy learning. It is worth noting that the proposed regularizer can be applied in a plug-and-play manner without introducing any additional hyper-parameters or training costs.
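The abstract does not specify how the snapshot difference is turned into a reward, so the following is only a minimal illustrative sketch under assumed details: a toy linear "policy network" stands in for the deep model, snapshots are copies of its weights saved during optimization, and the intrinsic reward is taken as the mean L1 distance between the action distribution of the current policy and those of the snapshots (one plausible instantiation of a temporal-inconsistency signal; the names `policy_probs` and `intrinsic_reward` are hypothetical).

```python
import math
import random

def policy_probs(weights, obs):
    """Softmax action distribution of a toy linear 'policy network'.
    `weights` is one weight vector per action (illustrative only)."""
    logits = [sum(w * x for w, x in zip(col, obs)) for col in weights]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def intrinsic_reward(weights, snapshots, obs):
    """Mean L1 distance between the current action distribution and the
    distributions induced by saved parameter snapshots: the reward grows
    when the policy has drifted further from its past selves."""
    cur = policy_probs(weights, obs)
    dists = [sum(abs(c - p) for c, p in zip(cur, policy_probs(s, obs)))
             for s in snapshots]
    return sum(dists) / len(dists)

random.seed(0)
obs = [random.gauss(0, 1) for _ in range(4)]
weights = [[random.gauss(0, 1) for _ in range(4)] for _ in range(3)]  # 3 actions
# Snapshots: perturbed copies standing in for parameters saved earlier
# along the optimization path.
snapshots = [[[w + random.gauss(0, 0.1) for w in row] for row in weights]
             for _ in range(3)]
print(intrinsic_reward(weights, snapshots, obs))
```

Because the reward is computed from already-stored parameter copies, a scheme like this adds no new hyper-parameters or extra training passes, which is consistent with the plug-and-play claim in the abstract.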
Seokhun Ju, Seungyub Han, Taehyun Cho, Jungwoo Lee, Taeyoung Lee, Minkyoung Kim, Jinung An