Hao Zhang, Yan Li, Zhuping Wang, Yi Ding, Huaicheng Yan
Dear Editor, this letter investigates the multi-objective optimal control problem for nonlinear discrete-time systems. A data-driven policy gradient algorithm is proposed in which the action-state value function is used to evaluate the policy. In the policy improvement step, a policy-gradient-based method is employed, which improves system performance and ultimately yields a policy that is optimal in the Pareto sense. An actor-critic structure is established to implement the algorithm. To use data more efficiently and improve learning, experience replay is applied during training, drawing on both offline and online data. Finally, a simulation is given to illustrate the effectiveness of the method.
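The experience replay step mentioned in the abstract, which mixes offline and online data, can be illustrated with a minimal sketch. The letter does not give implementation details, so everything here is an assumption made for illustration: the class name `MixedReplayBuffer`, the transition layout `(state, action, costs, next_state)` with one cost per objective, the bounded FIFO for online data, and uniform sampling.

```python
import random
from collections import deque


class MixedReplayBuffer:
    """Hypothetical replay buffer: fixed offline transitions plus a bounded FIFO of online ones."""

    def __init__(self, offline_transitions, online_capacity, seed=0):
        # Offline data is collected before training and never evicted.
        self.offline = list(offline_transitions)
        # Online data is gathered during training; the oldest entries are dropped first.
        self.online = deque(maxlen=online_capacity)
        self.rng = random.Random(seed)

    def add_online(self, transition):
        """Store one transition (state, action, costs, next_state) observed online."""
        self.online.append(transition)

    def sample(self, batch_size):
        """Draw a uniform minibatch without replacement from offline and online data."""
        pool = self.offline + list(self.online)
        k = min(batch_size, len(pool))
        return self.rng.sample(pool, k)

    def __len__(self):
        return len(self.offline) + len(self.online)
```

In an actor-critic loop, minibatches drawn from such a buffer would be used to update the critic (the action-state value estimate) and then the actor via the policy gradient; those update rules are specified in the letter itself and are not reproduced here.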
Yongwei Zhang, Bo Zhao, Derong Liu
Biao Luo, Derong Liu, Huai-Ning Wu, Ding Wang, Frank L. Lewis
Lianghao Ji, Kai Jian, Cuijuan Zhang, Shasha Yang, Xing Guo, Huaqing Li
Mingduo Lin, Bo Zhao, Derong Liu, Yongwei Zhang
Ruyue Yang, Ding Wang, Junfei Qiao