This paper considers the model-free optimal control problem for discrete-time systems using the deep deterministic policy gradient adaptive dynamic programming (DDPGADP) algorithm. System data are collected through off-policy learning, and the control law is updated by policy gradient. The convergence of the DDPGADP algorithm is established by showing that the Q-function sequence is monotonically non-increasing and converges to the optimum. To implement the method, an actor-critic neural network structure is constructed, adopting the target-network technique from deep Q-learning during training. Finally, simulation examples are presented to verify the effectiveness of the proposed method.
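The ingredients named in the abstract — off-policy data collection, a policy-gradient actor update, a critic trained on a TD target, and slowly tracking target networks — can be illustrated with a minimal sketch. This is not the paper's algorithm: it uses a toy scalar linear system, linear actor and quadratic critic parameterizations, and hand-picked coefficients (`a`, `b`, `gamma`, `tau`, learning rates), all of which are illustrative assumptions rather than details from the source.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy scalar discrete-time system x_{k+1} = a*x_k + b*u_k (illustrative only)
a, b, gamma, tau = 0.9, 0.5, 0.95, 0.05

# Linear actor u = -K*x and quadratic critic Q(x, u) = w . [x^2, x*u, u^2]
K = 0.0                        # actor parameter
w = np.zeros(3)                # critic weights
K_targ, w_targ = K, w.copy()   # target networks: slowly tracking copies

def features(x, u):
    return np.array([x * x, x * u, u * u])

for step in range(4000):
    # Off-policy data: behavior policy = actor plus exploration noise
    x = rng.uniform(-1.0, 1.0)
    u = -K * x + 0.1 * rng.standard_normal()
    cost = x * x + u * u                      # stage cost to be minimized
    x_next = a * x + b * u

    # Critic: TD target evaluated with the *target* actor and *target* critic
    u_next = -K_targ * x_next
    td_target = cost + gamma * w_targ @ features(x_next, u_next)
    phi = features(x, u)
    w += 0.05 * (td_target - w @ phi) * phi   # semi-gradient step on TD error

    # Actor: policy-gradient step decreasing Q along dQ/du * du/dK
    u_pi = -K * x
    dq_du = w[1] * x + 2.0 * w[2] * u_pi      # dQ/du at the actor's action
    K -= 0.01 * dq_du * (-x)                  # du/dK = -x

    # Soft (Polyak) updates of the target networks, as in DDPG
    K_targ = (1 - tau) * K_targ + tau * K
    w_targ = (1 - tau) * w_targ + tau * w

print(f"learned gain K = {K:.3f}, closed-loop pole = {a - b * K:.3f}")
```

Under these assumptions the actor gain settles near the fixed point of the critic's greedy policy, and the closed-loop pole `a - b*K` lies inside the unit circle; the target copies change only by a fraction `tau` per step, which is what stabilizes the TD target during training.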
Jiahui Xu, Jingcheng Wang, Jun Rao, Yanjiu Zhong, Shangwei Zhao