JOURNAL ARTICLE

Dual Variable Actor-Critic for Adaptive Safe Reinforcement Learning

Abstract

Satisfying safety constraints in reinforcement learning (RL) is an important issue, especially in real-world applications. Many studies have approached safe RL with the Lagrangian method, which introduces dual variables. However, applying a trained policy with the optimal dual variable to a new environment can be hazardous, since the optimal value of the dual variable, which represents a level of safety, depends on the environmental setting. To address this, we propose a new framework, dual variable actor-critic (DVAC), that solves the safe RL problem by simultaneously training a single policy over different safety levels. We introduce a universal policy and a universal Q-function, both of which take a dual variable as an argument. We then extend the soft actor-critic so that the universal policy is guaranteed to converge to the set of Pareto optimal policies. We evaluate the proposed method in simulation and real-world environments. The universal policy learned with the proposed method spans behaviors from extremely safe to high-performing, depending on the dual variable, and is nearly Pareto optimal compared to policies learned with the baseline methods. In addition, by identifying a suitable dual variable, the agent can adapt to environments with unseen state distributions without additional training.
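The core idea, a policy and Q-function conditioned on the dual variable of a Lagrangian safe-RL objective, can be illustrated with a minimal sketch. This is not the authors' SAC-based implementation; it assumes a discrete action set with hypothetical tabular reward and cost Q-values, and shows only how the dual variable lam trades reward against safety cost:

```python
import numpy as np

def lagrangian_value(q_reward, q_cost, lam):
    """Scalarized Lagrangian objective for a fixed dual variable lam:
    reward value minus lam-weighted expected safety cost."""
    return q_reward - lam * q_cost

def universal_policy(q_reward, q_cost, lam):
    """Hypothetical universal policy: for the given lam, greedily pick
    the action maximizing the lam-scalarized Q over a discrete action set."""
    return int(np.argmax(lagrangian_value(q_reward, q_cost, lam)))

# Two actions: action 0 has high reward but high cost, action 1 is safer.
q_r = np.array([1.0, 0.5])
q_c = np.array([0.8, 0.1])

universal_policy(q_r, q_c, lam=0.0)  # lam = 0: pure reward maximization -> action 0
universal_policy(q_r, q_c, lam=2.0)  # large lam: safety dominates -> action 1
```

A single such policy, queried with different values of lam, realizes the whole safety spectrum without retraining; adapting to a new environment then reduces to searching over lam.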

Keywords:
Reinforcement learning, Dual variable, Computer science, Pareto optimality, Mathematical optimization, State variable, Augmented Lagrangian method, Bellman equation, Artificial intelligence, Algorithm

Metrics

Cited by: 1
FWCI (Field-Weighted Citation Impact): 0.26
References: 20
Citation Normalized Percentile: 0.59

Topics

Reinforcement Learning in Robotics
Physical Sciences →  Computer Science →  Artificial Intelligence