JOURNAL ARTICLE

Convergent Policy Optimization for Safe Reinforcement Learning

Ming Yu, Zhuoran Yang, Mladen Kolar, Zhaoran Wang

Year: 2019 Journal: arXiv (Cornell University) Vol: 32 Pages: 3121-3133 Publisher: Cornell University

Abstract

We study the safe reinforcement learning problem with nonlinear function approximation, where policy optimization is formulated as a constrained optimization problem with both the objective and the constraint being nonconvex functions. For such a problem, we construct a sequence of surrogate convex constrained optimization problems by replacing the nonconvex functions locally with convex quadratic functions obtained from policy gradient estimators. We prove that the solutions to these surrogate problems converge to a stationary point of the original nonconvex problem. Furthermore, to extend our theoretical results, we apply our algorithm to examples of optimal control and multi-agent reinforcement learning with safety constraints.
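The surrogate construction described in the abstract can be illustrated on a toy problem. The sketch below is not the authors' algorithm: it assumes exact gradients in place of policy gradient estimators, takes a full step to each surrogate solution, and uses hypothetical stand-in functions (`objective`, `constraint`) and a hypothetical curvature parameter `TAU`. At each iterate, both nonconvex functions are replaced by convex quadratics built from their gradients, and the resulting convex problem is solved by bisection on a Lagrange multiplier.

```python
import math

TAU = 6.0            # assumed surrogate curvature; at least half the gradient Lipschitz constants below
CENTER = (0.2, 0.1)  # hypothetical target point used by the toy objective

def objective(x):
    """Toy nonconvex objective (stand-in for a negated expected return)."""
    return (x[0] - CENTER[0]) ** 2 + (x[1] - CENTER[1]) ** 2 + math.sin(3.0 * x[0])

def grad_objective(x):
    return (2.0 * (x[0] - CENTER[0]) + 3.0 * math.cos(3.0 * x[0]),
            2.0 * (x[1] - CENTER[1]))

def constraint(x):
    """Toy nonconvex constraint, satisfied when <= 0: stay outside the unit disk."""
    return 1.0 - (x[0] ** 2 + x[1] ** 2)

def grad_constraint(x):
    return (-2.0 * x[0], -2.0 * x[1])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

def solve_surrogate(gf, c, gc, tau):
    """Minimize gf.d + tau*|d|^2 subject to c + gc.d + tau*|d|^2 <= 0 over the step d."""
    def step(lam):
        # Stationary point of the Lagrangian for multiplier lam >= 0.
        scale = 2.0 * tau * (1.0 + lam)
        return (-(gf[0] + lam * gc[0]) / scale, -(gf[1] + lam * gc[1]) / scale)

    def surrogate_constraint(d):
        return c + dot(gc, d) + tau * dot(d, d)

    d = step(0.0)
    if surrogate_constraint(d) <= 0.0:
        return d  # unconstrained surrogate minimizer is already feasible

    lo, hi = 0.0, 1.0
    while surrogate_constraint(step(hi)) > 0.0:
        lo, hi = hi, 2.0 * hi
        if hi > 1e12:
            raise ValueError("surrogate problem appears infeasible")
    for _ in range(100):  # bisection on the multiplier
        mid = 0.5 * (lo + hi)
        if surrogate_constraint(step(mid)) > 0.0:
            lo = mid
        else:
            hi = mid
    return step(hi)  # return the feasible side of the bracket

def successive_convex_approximation(x0, tau=TAU, iters=200):
    """Repeatedly solve the convex surrogate built at the current iterate."""
    x = (float(x0[0]), float(x0[1]))
    for _ in range(iters):
        d = solve_surrogate(grad_objective(x), constraint(x), grad_constraint(x), tau)
        x = (x[0] + d[0], x[1] + d[1])
    return x

x_start = (-1.5, 1.0)  # feasible starting point (outside the unit disk)
x_final = successive_convex_approximation(x_start)
print(x_final, objective(x_final), constraint(x_final))
```

Because `TAU` is chosen so that each quadratic upper-bounds the function it replaces in this toy setting, every iterate stays feasible and the objective does not increase. With noisy gradient estimates, as in the reinforcement learning setting the abstract describes, these guarantees would not hold in this simple form.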

Keywords:
Reinforcement learning, Mathematical optimization, Optimization problem, Trust region, Nonlinear programming, Convex optimization, Computer science, Constrained optimization, Stationary point, Sequence (biology), Constraint (computer-aided design), Mathematics, Quadratic programming, Regular polygon, Nonlinear system, Artificial intelligence

Metrics

Cited By: 31
FWCI (Field Weighted Citation Impact): 0.00
Refs: 0

Topics

Reinforcement Learning in Robotics
Physical Sciences → Computer Science → Artificial Intelligence
Adaptive Dynamic Programming Control
Physical Sciences → Computer Science → Computational Theory and Mathematics
Distributed Control Multi-Agent Systems
Physical Sciences → Computer Science → Computer Networks and Communications

Related Documents

JOURNAL ARTICLE

Augmented Proximal Policy Optimization for Safe Reinforcement Learning

Juntao Dai, Jiaming Ji, Yang Long, Qian Zheng, Gang Pan

Journal: Proceedings of the AAAI Conference on Artificial Intelligence Year: 2023 Vol: 37 (6) Pages: 7288-7295

JOURNAL ARTICLE

Penalized Proximal Policy Optimization for Safe Reinforcement Learning

Linrui Zhang, Li Shen, Long Yang, Shixiang Chen, Xueqian Wang, Bo Yuan, Dacheng Tao

Journal: Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence Year: 2022 Pages: 3744-3750

BOOK-CHAPTER

Safe Policy Optimization for Reinforcement Learning in Robotics

Hao Wang, Zhen Kan

Elsevier eBooks Year: 2024 Pages: 609-640

JOURNAL ARTICLE

CVaR-Constrained Policy Optimization for Safe Reinforcement Learning

Qiyuan Zhang, Shu Leng, Xiaoteng Ma, Qihan Liu, Xueqian Wang, Bin Liang, Yu Liu, Jun Yang

Journal: IEEE Transactions on Neural Networks and Learning Systems Year: 2024 Vol: 36 (1) Pages: 830-841

JOURNAL ARTICLE

Game-Theoretic Constrained Policy Optimization for Safe Reinforcement Learning

Changxin Zhang, Xinglong Zhang, Yixing Lan, Hao Gao, Xin Xu

Journal: IEEE Transactions on Neural Networks and Learning Systems Year: 2025 Vol: 36 (10) Pages: 17990-18004