In deep reinforcement learning (DRL), policy gradient (PG) and actor-critic (AC) methods are among the most popular and effective approaches for training agents. One such method is the state-of-the-art deep deterministic policy gradient (DDPG). In this work, we apply the framework of mutual learning to DDPG to present a novel Mutual DDPG (MuDDPG) agent, aiming to improve the performance and robustness of conventional DDPG. We also propose a simple additional mechanism, adaptive reward-based exploration, to further improve the rate of learning. We demonstrate that with these schemes MuDDPG converges faster and performs better than vanilla DDPG on two simple simulated tasks while adding significant robustness to the learning process.
Teckchai Tiong, Ismail Saad, Kenneth Tze Kin Teo, Herwansyah bin Lago
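The abstract does not spell out how the mutual learning term or the adaptive exploration schedule are formulated, so the following is only a minimal PyTorch sketch of one plausible reading, not the authors' method: two DDPG actors each trained with the usual deterministic policy gradient plus an assumed MSE term pulling them toward each other, and a heuristic noise schedule that shrinks as recent returns approach the best seen so far. All names (`Actor`, `Critic`, `mutual_actor_loss`, `mutual_coef`, `adaptive_noise_scale`, `sigma_min`, `sigma_max`) are illustrative assumptions.

```python
import torch
import torch.nn as nn


class Actor(nn.Module):
    """Deterministic policy network, as in standard DDPG."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, act_dim), nn.Tanh(),
        )

    def forward(self, obs):
        return self.net(obs)


class Critic(nn.Module):
    """Q-network over (state, action) pairs, as in standard DDPG."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))


def mutual_actor_loss(actor, peer, critic, obs, mutual_coef=0.1):
    """Standard DDPG actor objective plus an ASSUMED mutual term that
    pulls this actor's actions toward its peer's. The peer is treated
    as a fixed target here; it would receive a symmetric update."""
    act = actor(obs)
    with torch.no_grad():
        peer_act = peer(obs)
    pg_loss = -critic(obs, act).mean()            # maximize Q(s, pi(s))
    mutual_loss = ((act - peer_act) ** 2).mean()  # imitate the peer
    return pg_loss + mutual_coef * mutual_loss


def adaptive_noise_scale(recent_return, best_return,
                         sigma_min=0.05, sigma_max=0.3):
    """One plausible reward-based exploration schedule (an assumption,
    not the paper's): explore widely while returns lag the best seen,
    narrow the noise as they catch up."""
    gap = max(0.0, best_return - recent_return)
    frac = min(1.0, gap / (abs(best_return) + 1e-8))
    return sigma_min + frac * (sigma_max - sigma_min)


# Toy usage: one actor update step against its peer.
a1, a2 = Actor(3, 1), Actor(3, 1)
q = Critic(3, 1)
obs = torch.randn(32, 3)
mutual_actor_loss(a1, a2, q, obs).backward()
# An actor optimizer would step only a1's parameters; a2 gets no
# gradient here because its actions are computed under no_grad.
```

Under this reading, the mutual term plays the role of a peer-regularizer: each agent's policy is nudged toward its partner's, which is one way such schemes can smooth learning and add robustness; the actual coupling used in the paper may differ.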