JOURNAL ARTICLE

Penalized Proximal Policy Optimization for Safe Reinforcement Learning

Linrui Zhang, Li Shen, Long Yang, Shixiang Chen, Xueqian Wang, Bo Yuan, Dacheng Tao

Year: 2022
Journal: Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence
Pages: 3744-3750

Abstract

Safe reinforcement learning aims to learn the optimal policy while satisfying safety constraints, which is essential in real-world applications. However, current algorithms still struggle to perform efficient policy updates while satisfying hard constraints. In this paper, we propose Penalized Proximal Policy Optimization (P3O), which replaces the cumbersome constrained policy iteration with a single minimization of an equivalent unconstrained problem. Specifically, P3O uses a simple yet effective penalty approach to eliminate the cost constraints and removes the trust-region constraint by means of the clipped surrogate objective. We theoretically prove the exactness of the penalized method with a finite penalty factor and provide a worst-case analysis of the approximation error when the objective is evaluated on sample trajectories. Moreover, we extend P3O to the more challenging multi-constraint and multi-agent scenarios, which are less studied in previous work. Extensive experiments show that P3O outperforms state-of-the-art algorithms with respect to both reward improvement and constraint satisfaction on a set of constrained locomotion tasks.
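
As a rough illustration of the idea described in the abstract, the sketch below combines a PPO-style clipped surrogate for the reward with a penalty on a clipped cost surrogate that is active only when the cost term is positive. The abstract does not state the loss itself, so the function name `penalized_clipped_loss`, the penalty factor `kappa`, the clip range `eps`, and the `cost_violation` offset (an estimate of how far the current policy's expected cost exceeds its limit) are assumptions made for illustration, not the authors' implementation.

```python
from typing import Sequence


def clip(x: float, lo: float, hi: float) -> float:
    """Clamp x to the interval [lo, hi]."""
    return max(lo, min(hi, x))


def penalized_clipped_loss(
    ratios: Sequence[float],      # pi_new(a|s) / pi_old(a|s) per sample
    reward_adv: Sequence[float],  # reward advantage estimates
    cost_adv: Sequence[float],    # cost advantage estimates
    cost_violation: float,        # assumed: estimated cost minus cost limit
    kappa: float = 20.0,          # assumed penalty factor
    eps: float = 0.2,             # assumed clip range
) -> float:
    """Unconstrained loss: clipped reward surrogate plus a one-sided cost penalty."""
    n = len(ratios)
    reward_term = 0.0
    cost_term = 0.0
    for r, a_r, a_c in zip(ratios, reward_adv, cost_adv):
        r_clipped = clip(r, 1.0 - eps, 1.0 + eps)
        # Pessimistic (clipped) reward surrogate, negated because we minimize.
        reward_term += -min(r * a_r, r_clipped * a_r)
        # Pessimistic (clipped) cost surrogate: take the larger cost estimate.
        cost_term += max(r * a_c, r_clipped * a_c)
    reward_term /= n
    cost_term = cost_term / n + cost_violation
    # The penalty is zero whenever the estimated constraint is satisfied.
    return reward_term + kappa * max(0.0, cost_term)


if __name__ == "__main__":
    loss = penalized_clipped_loss(
        ratios=[1.5], reward_adv=[1.0], cost_adv=[1.0],
        cost_violation=0.0, kappa=2.0,
    )
    print(loss)
```

Under these assumptions, a single unconstrained minimization handles both concerns: the clipping limits how far the policy ratio can move, and the one-sided penalty contributes nothing to the loss while the estimated cost stays within its limit.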

Keywords:
Reinforcement learning; Constraint (computer-aided design); Constraint satisfaction; Mathematical optimization; Computer science; Set (abstract data type); Constraint satisfaction problem; Penalty method; Trust region; Minification; Constrained optimization; Optimization problem; Constrained optimization problem; Artificial intelligence; Mathematics

Metrics

Cited By: 56
FWCI (Field Weighted Citation Impact): 6.47
Refs: 22
Citation Normalized Percentile: 0.97
Is in top 1%
Is in top 10%

Topics

Reinforcement Learning in Robotics
Physical Sciences →  Computer Science →  Artificial Intelligence
Autonomous Vehicle Technology and Safety
Physical Sciences →  Engineering →  Automotive Engineering
Robotic Path Planning Algorithms
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Augmented Proximal Policy Optimization for Safe Reinforcement Learning

Juntao Dai, Jiaming Ji, Yang Long, Qian Zheng, Gang Pan

Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Year: 2023
Vol: 37 (6)
Pages: 7288-7295

JOURNAL ARTICLE

Coupled Penalties-Augmented Proximal Policy Optimization for Safe Reinforcement Learning

Ning Pang, Longyang Huang, Weidong Zhang

Journal: Journal of Physics Conference Series
Year: 2025
Vol: 3077 (1)
Pages: 012002-012002

JOURNAL ARTICLE

Policy Optimization in Reinforcement Learning: Proximal Policy Optimization

Saurugger, Bernd

Journal: Zenodo (CERN European Organization for Nuclear Research)
Year: 2023

JOURNAL ARTICLE

Convergent Policy Optimization for Safe Reinforcement Learning

Ming Yu, Zhuoran Yang, Mladen Kolar, Zhaoran Wang

Journal: arXiv (Cornell University)
Year: 2019
Vol: 32
Pages: 3121-3133