Jiaxin Gao, Yao Lyu, Wenxuan Wang, Yuming Yin, Fei Ma, Shengbo Eben Li
Abstract
Distributed stochastic gradient descent has gained significant attention in recent years as a prevalent approach to reinforcement learning. Current distributed learning predominantly employs synchronous or asynchronous training strategies. While the asynchronous scheme avoids the idle computing resources inherent in synchronous methods, it suffers from the stale gradient problem. This paper introduces a novel gradient correction algorithm aimed at alleviating that problem. By leveraging second-order information within the worker node and incorporating the current parameters of both the worker and server nodes, the gradient correction algorithm yields a refined gradient closer to the desired value. We first outline the challenges associated with asynchronous update schemes and derive a gradient correction algorithm based on a local second-order approximation. We then propose an asynchronous training scheme that incorporates gradient correction within the generalized policy iteration framework. Finally, on trajectory tracking tasks, we compare asynchronous updates with and without gradient correction. Simulation results underscore the superiority of the proposed training scheme, demonstrating notably faster convergence and higher policy performance than existing asynchronous update methods.
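The core idea of correcting a stale gradient with local second-order information can be sketched as a first-order Taylor expansion of the gradient around the worker's outdated parameters. The sketch below is illustrative only, assuming a quadratic toy objective; the names (`correct_stale_gradient`, `theta_server`, `theta_worker`) are hypothetical and not taken from the paper's implementation.

```python
import numpy as np

def correct_stale_gradient(stale_grad, hessian, theta_server, theta_worker):
    """Approximate the gradient at the server's current parameters from
    the worker's stale gradient, via a first-order Taylor expansion:
        g(theta_server) ~ g(theta_worker) + H (theta_server - theta_worker)
    where H is the Hessian evaluated at the worker's parameters.
    """
    return stale_grad + hessian @ (theta_server - theta_worker)

# Toy quadratic objective f(theta) = 0.5 * theta^T A theta, whose
# gradient is A @ theta and whose Hessian is the constant matrix A,
# so the correction is exact in this illustrative case.
A = np.array([[2.0, 0.0],
              [0.0, 4.0]])
theta_worker = np.array([1.0, 1.0])   # stale parameters held by the worker
theta_server = np.array([0.5, 2.0])   # current parameters on the server

stale_grad = A @ theta_worker          # gradient computed on stale parameters
corrected = correct_stale_gradient(stale_grad, A, theta_server, theta_worker)
true_grad = A @ theta_server           # gradient at the server's parameters

print(np.allclose(corrected, true_grad))
```

For non-quadratic objectives the correction is only approximate, which is why the paper restricts it to a local second-order approximation around the worker's parameters.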