Parallel Cross Entropy Policy Gradient Adaptive Dynamic Programming for Optimal Tracking Control of Discrete-Time Nonlinear Systems

Jiahui Xu; Jingcheng Wang; Jun Rao; Yanjiu Zhong; Shunyu Wu; Qifang Sun

doi:10.1109/tsmc.2024.3373456

Year: 2024
Journal: IEEE Transactions on Systems, Man, and Cybernetics: Systems
Vol. 54, No. 6, pp. 3809-3821
Publisher: Institute of Electrical and Electronics Engineers

Abstract

Policy gradient adaptive dynamic programming (PGADP) has recently attracted attention as a technique for the optimal control design of nonlinear systems. However, it requires a large amount of interaction data from the controlled system, which can be costly or dangerous to collect in some applications. This article presents a PGADP algorithm based on a parallel cross entropy optimization method (PCEOM-PGADP) for designing an optimal tracking controller for discrete-time nonlinear systems. The tracking problem is transformed into a regulation problem by constructing a tracking error system. The algorithm is implemented with an actor–critic structure, in which the actor network represents the control policy and the critic network evaluates its performance. The optimal policy is obtained through this iterative interaction. The approach also uses the parallel cross entropy optimization method (PCEOM) to obtain a reasonable initial control policy for PGADP, which speeds up learning. Convergence of the algorithm is established by showing that the generated $Q$ functions form a monotonically nonincreasing sequence. Finally, the effectiveness of the proposed PCEOM-PGADP algorithm is verified through simulations on a complex automated driving tracking system.
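Two of the preliminary steps in the abstract are recasting tracking as regulation of an error system and running a cross-entropy search to pick an initial policy before gradient-based learning. A minimal Python sketch can illustrate both. Everything concrete here is an assumption for illustration only. The scalar plant, the constant reference with feedforward `A*sin(r)`, the linear feedback `v = -gain*e`, the discount and weights, and the helper names (`error_step`, `policy_cost`, `cross_entropy_init`) are hypothetical. They do not reproduce the authors' vehicle model, neural networks, or parallelization scheme.

```python
import math
import random

# Hypothetical scalar plant (NOT the paper's automated driving model):
#   x_{k+1} = x_k + T * (-A * sin(x_k) + u_k)
T, A = 0.1, 1.0
GAMMA, Q_WEIGHT, R_WEIGHT = 0.95, 1.0, 0.1  # assumed discount and cost weights
HORIZON = 50
INITIAL_ERRORS = (-1.0, 0.5, 1.0)


def error_step(e, r, v):
    """Tracking-error dynamics for an assumed constant reference r.

    With u = u_d + v and feedforward u_d = A*sin(r), x = r is an equilibrium,
    so tracking x -> r becomes regulation of the error e = x - r to zero.
    """
    x = e + r
    u = A * math.sin(r) + v
    x_next = x + T * (-A * math.sin(x) + u)
    return x_next - r  # reference is constant, so r_{k+1} = r


def policy_cost(gain, r=0.5):
    """Discounted quadratic cost of the hypothetical feedback v = -gain * e."""
    total = 0.0
    for e0 in INITIAL_ERRORS:
        e = e0
        for k in range(HORIZON):
            v = -gain * e
            total += GAMMA**k * (Q_WEIGHT * e * e + R_WEIGHT * v * v)
            e = error_step(e, r, v)
    return total if math.isfinite(total) else math.inf


def cross_entropy_init(iterations=10, population=30, n_elite=6, seed=0):
    """Cross-entropy search over the feedback gain (serial sketch).

    Candidates are scored independently, so the scoring loop is the part a
    parallel variant could distribute across workers; here it runs serially.
    """
    rng = random.Random(seed)
    mean, std = 0.0, 2.0
    best_gain, best_cost = mean, policy_cost(mean)
    for _ in range(iterations):
        candidates = [rng.gauss(mean, std) for _ in range(population)]
        scored = sorted((policy_cost(g), g) for g in candidates)
        if scored[0][0] < best_cost:
            best_cost, best_gain = scored[0]
        # Refit the sampling distribution to the lowest-cost (elite) candidates.
        elites = [g for _, g in scored[:n_elite]]
        mean = sum(elites) / n_elite
        var = sum((g - mean) ** 2 for g in elites) / n_elite
        std = max(math.sqrt(var), 1e-3)  # floor keeps the search from collapsing
    return best_gain, best_cost


if __name__ == "__main__":
    gain, cost = cross_entropy_init()
    print(f"initial gain {gain:.3f}, cost {cost:.3f} (zero gain: {policy_cost(0.0):.3f})")
```

In a PGADP-style pipeline, a gain found this way would only seed the actor before the policy gradient stage. The actor–critic updates and the $Q$ function convergence argument are not reproduced in this sketch.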
