Augmented Proximal Policy Optimization for Safe Reinforcement Learning

Juntao Dai; Jiaming Ji; Yang Long; Qian Zheng; Gang Pan

doi:10.1609/aaai.v37i6.25888

JOURNAL ARTICLE

Year: 2023
Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Vol: 37 (6)
Pages: 7288-7295
Publisher: Association for the Advancement of Artificial Intelligence

Abstract

Safe reinforcement learning considers practical scenarios that maximize the return while satisfying safety constraints. Current algorithms, which suffer from training oscillations or approximation errors, still struggle to update the policy efficiently with precise constraint satisfaction. In this article, we propose Augmented Proximal Policy Optimization (APPO), which augments the Lagrangian function of the primal constrained problem by attaching a quadratic deviation term. The constructed multiplier-penalty function dampens cost oscillation for stable convergence while remaining equivalent to the primal constrained problem, so safety costs can be controlled precisely. APPO alternately updates the policy and the Lagrangian multiplier by solving the constructed augmented primal-dual problem, which can be easily implemented with any first-order optimizer. We apply APPO to diverse safety-constrained tasks, setting a new state of the art compared with a comprehensive list of safe RL baselines. Extensive experiments verify the merits of our method in easy implementation, stable convergence, and precise cost control.
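
The alternating primal-dual update described in the abstract can be illustrated on a toy problem. The sketch below is not the APPO implementation and does not use the paper's exact objective. It assumes the textbook augmented Lagrangian for an inequality constraint g(x) <= 0, namely L(x, lam) = f(x) + (max(0, lam + sigma * g(x))^2 - lam^2) / (2 * sigma), applied to a one-dimensional stand-in for a policy. The function name, the toy objective f(x) = (x - 2)^2, the constraint g(x) = x - 1, and all hyperparameter values are hypothetical choices made for illustration.

```python
def augmented_lagrangian_solve(sigma=10.0, outer_steps=20, inner_steps=200, lr=0.05):
    """Toy method of multipliers: minimize (x - 2)^2 subject to x - 1 <= 0.

    Assumption: x is a scalar stand-in for policy parameters and lam for the
    Lagrangian multiplier; this is an illustration, not the paper's algorithm.
    """
    x, lam = 0.0, 0.0
    for _ in range(outer_steps):
        # Primal step: plain gradient descent (a first-order optimizer) on the
        # augmented Lagrangian with the multiplier held fixed.
        for _ in range(inner_steps):
            violation = x - 1.0                          # g(x): cost minus limit
            shifted = max(0.0, lam + sigma * violation)  # multiplier plus quadratic penalty slope
            grad = 2.0 * (x - 2.0) + shifted             # f'(x) + shifted * g'(x), with g'(x) = 1
            x -= lr * grad
        # Dual step: move the multiplier by the scaled violation, kept non-negative.
        lam = max(0.0, lam + sigma * (x - 1.0))
    return x, lam


if __name__ == "__main__":
    x, lam = augmented_lagrangian_solve()
    print(f"x = {x:.6f}, lam = {lam:.6f}")
```

In this toy setting the iterates should approach the constraint boundary x = 1 with multiplier lam = 2, the value that satisfies stationarity 2(x - 2) + lam = 0 at x = 1. The quadratic term is what keeps the primal step from overshooting the constraint by a large margin while the multiplier is still far from its final value.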

Keywords:

Augmented Lagrangian method; Reinforcement learning; Mathematical optimization; Computer science; Convergence (economics); Multiplier (economics); Lagrange multiplier; Penalty method; Quadratic equation; Sequential quadratic programming; Lagrangian relaxation; Constraint (computer-aided design); Dual (grammatical number); Quadratic programming; Mathematics; Artificial intelligence

Metrics

FWCI (Field Weighted Citation Impact): 1.73

Citation Normalized Percentile: 0.83

Topics

Reinforcement Learning in Robotics

Physical Sciences → Computer Science → Artificial Intelligence

Advanced Multi-Objective Optimization Algorithms

Physical Sciences → Computer Science → Computational Theory and Mathematics

Related Documents

Coupled Penalties-Augmented Proximal Policy Optimization for Safe Reinforcement Learning

Penalized Proximal Policy Optimization for Safe Reinforcement Learning

Policy Optimization in Reinforcement Learning: Proximal Policy Optimization

Convergent Policy Optimization for Safe Reinforcement Learning