CVaR-Constrained Policy Optimization for Safe Reinforcement Learning

Qiyuan Zhang; Shu Leng; Xiaoteng Ma; Qihan Liu; Xueqian Wang; Bin Liang; Yu Liu; Jun Yang

doi:10.1109/tnnls.2023.3331304


JOURNAL ARTICLE


Year: 2024
Journal: IEEE Transactions on Neural Networks and Learning Systems
Vol: 36 (1)
Pages: 830-841
Publisher: Institute of Electrical and Electronics Engineers



Abstract

Current constrained reinforcement learning (RL) methods guarantee constraint satisfaction only in expectation, which is inadequate for safety-critical decision problems. Because a constraint satisfied in expectation can still leave a high probability of exceeding the cost threshold, solving constrained RL problems with a high probability of satisfaction is critical for RL safety. In this work, we treat the safety criterion as a constraint on the conditional value-at-risk (CVaR) of cumulative costs and propose the CVaR-constrained policy optimization algorithm (CVaR-CPO), which maximizes the expected return while ensuring that agents pay attention to the upper tail of constraint costs. Based on a bound on the CVaR-related performance between two policies, we first reformulate the CVaR-constrained problem in an augmented state space using the state extension procedure and the trust-region method. CVaR-CPO then derives the optimal update policy by applying the Lagrangian method to the constrained optimization problem. In addition, CVaR-CPO uses the distribution of constraint costs to provide an efficient quantile-based estimate of the CVaR-related value function. We conduct experiments on constrained control tasks to show that the proposed method produces behaviors that satisfy safety constraints and achieves performance comparable to most safe RL (SRL) methods.
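To make the gap between an expectation constraint and a CVaR constraint concrete, the following is a minimal sketch, not the CVaR-CPO estimator from the article. It assumes a batch of sampled per-episode cumulative costs and evaluates the standard Rockafellar-Uryasev form CVaR_alpha(C) = min_nu { nu + E[(C - nu)^+] / (1 - alpha) } on the empirical distribution. The function name `empirical_cvar`, the sample values, the threshold `d`, and the level `alpha` are all hypothetical and chosen only for illustration.

```python
import numpy as np


def empirical_cvar(costs, alpha):
    """CVaR at level alpha of sampled cumulative costs (illustrative sketch).

    Uses the Rockafellar-Uryasev objective nu + E[(C - nu)^+] / (1 - alpha).
    On an empirical distribution this objective is convex and piecewise
    linear in nu with kinks at the samples, so it is enough to evaluate it
    at every sample value and take the smallest result.
    """
    costs = np.asarray(costs, dtype=float)
    # excess[i, j] = max(costs[j] - costs[i], 0), with costs[i] as candidate nu
    excess = np.maximum(costs[None, :] - costs[:, None], 0.0)
    objective = costs + excess.mean(axis=1) / (1.0 - alpha)
    return float(objective.min())


# Hypothetical batch: nine safe episodes and one costly episode.
costs = [0.0] * 9 + [100.0]
d = 25.0      # assumed cost threshold
alpha = 0.9   # assumed risk level: look at the worst 10% of episodes

mean_cost = float(np.mean(costs))
cvar_cost = empirical_cvar(costs, alpha)

print("expectation constraint satisfied:", mean_cost <= d)
print("CVaR constraint satisfied:", cvar_cost <= d)
```

In this toy batch the average cost is well under the threshold, yet the worst 10% of episodes far exceed it, so only the CVaR check flags the policy as unsafe. The pairwise evaluation costs O(n^2) in the number of samples, which is acceptable for a small illustration but not intended for large batches.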

Keywords:

CVaR; Mathematical optimization; Constraint (computer-aided design); Computer science; Quantile; Expected shortfall; Reinforcement learning; Mathematics; Risk management; Economics; Econometrics; Artificial intelligence

Metrics

FWCI (Field Weighted Citation Impact): 14.05

Citation Normalized Percentile: 0.98

Topics

Reinforcement Learning in Robotics

Physical Sciences → Computer Science → Artificial Intelligence

Formal Methods in Verification

Physical Sciences → Computer Science → Computational Theory and Mathematics


Related Documents

Game-Theoretic Constrained Policy Optimization for Safe Reinforcement Learning

Scalable Constrained Policy Optimization for Safe Multi-agent Reinforcement Learning

Safe Reinforcement Learning for Autonomous Vehicles through Parallel Constrained Policy Optimization

Convergent Policy Optimization for Safe Reinforcement Learning

Augmented Proximal Policy Optimization for Safe Reinforcement Learning