JOURNAL ARTICLE

Generative Adversarial Inverse Reinforcement Learning With Deep Deterministic Policy Gradient

Ming Zhan, Jingjing Fan, Jianying Guo

Year: 2023  Journal: IEEE Access  Vol: 11  Pages: 87732-87746  Publisher: Institute of Electrical and Electronics Engineers

Abstract

Although introducing a generative adversarial network (GAN) successfully resolves the sparsity of expert samples at the early stage of inverse reinforcement learning (IRL) training, the inherent drawbacks of GANs make many of the generated samples ineffective. We therefore propose a generative adversarial IRL algorithm based on deep deterministic policy gradient (DDPG). To improve the quality of GAN-generated samples during adversarial training, we replace the random-noise input of the original GAN with a deterministic policy and rebuild the GAN's generator on an Actor-Critic mechanism. We then mix the GAN-generated virtual samples with the original expert samples to form the expert sample set for IRL. This not only solves the sparsity of expert samples at the early stage of training but, more importantly, makes the GAN-driven IRL decision-making process more efficient. In the subsequent IRL stage, we analyze the differences between the mixed expert samples and the non-expert trajectory samples generated by the initial policy to determine the best reward function. The learned reward function then drives the RL process for policy updating and optimization, from which further non-expert trajectory samples are generated. By repeatedly comparing the new non-expert samples against the mixed expert sample set, we iteratively converge on the reward function and the optimal policy. Performance tests in the MuJoCo physics simulation environment and trajectory-prediction experiments in Grid World show that our model improves the quality of GAN-generated samples and reduces the computational cost of network training by approximately 20% per environment, making it applicable to decision planning for autonomous driving.
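The core idea sketched in the abstract — a deterministic actor standing in for the GAN's random-noise input, a discriminator separating expert from generated (state, action) pairs, and the discriminator-approved generated samples being mixed back into the expert set — can be illustrated with a toy numpy sketch. All names here (`W_e`, `W_actor`, the linear policies, the logistic-regression discriminator) are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
S_DIM, A_DIM = 4, 2

# Hypothetical "expert" linear policy (weights W_e, offset b_e) used to
# synthesize expert (state, action) pairs for this toy illustration.
W_e = rng.normal(size=(A_DIM, S_DIM))
b_e = np.array([1.0, -1.0])
states = rng.normal(size=(256, S_DIM))

def feats(s, a):
    # concatenate (state, action) plus an intercept feature for the discriminator
    return np.hstack([s, a, np.ones((len(s), 1))])

expert = feats(states, states @ W_e.T + b_e)

# Deterministic actor (stand-in for the DDPG actor): it replaces the GAN
# generator's random-noise input, so every generated sample follows one
# consistent policy instead of arbitrary noise.
W_actor = rng.normal(scale=0.5, size=(A_DIM, S_DIM))
generated = feats(states, states @ W_actor.T)

# Logistic-regression discriminator D(s, a) = P(pair came from the expert).
w = np.zeros(expert.shape[1])
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
for _ in range(500):
    # gradient ascent on  E[log D(expert)] + E[log(1 - D(generated))]
    grad = expert.T @ (1.0 - sigmoid(expert @ w)) \
         - generated.T @ sigmoid(generated @ w)
    w += 0.05 * grad / len(states)

d_expert = sigmoid(expert @ w).mean()   # mean score on expert pairs
d_gen = sigmoid(generated @ w).mean()   # mean score on generated pairs

# Mixing step: generated pairs the trained discriminator rates expert-like
# (D > 0.5) are appended to the original expert set, enlarging the sparse
# expert sample pool that the later IRL stage learns a reward from.
keep = sigmoid(generated @ w) > 0.5
mixed_expert_set = np.vstack([expert, generated[keep]])
print(mixed_expert_set.shape)
```

In the paper's full pipeline, the actor would itself be updated adversarially (via the Actor-Critic mechanism) so its samples grow more expert-like over iterations; this sketch freezes the actor and only shows the discriminator training and the sample-mixing step.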

Keywords:
Adversarial system, Reinforcement learning, Computer science, Generative grammar, Artificial intelligence, Inverse problem, Mathematics

Metrics

Cited by: 3
FWCI (Field-Weighted Citation Impact): 0.77
References: 48
Citation Normalized Percentile: 0.72


Topics

Adversarial Robustness in Machine Learning
Physical Sciences → Computer Science → Artificial Intelligence
Anomaly Detection Techniques and Applications
Physical Sciences → Computer Science → Artificial Intelligence
Autonomous Vehicle Technology and Safety
Physical Sciences → Engineering → Automotive Engineering