JOURNAL ARTICLE

Dual Variable Actor-Critic for Adaptive Safe Reinforcement Learning

Abstract

Satisfying safety constraints in reinforcement learning (RL) is an important issue, especially in real-world applications. Many studies have approached safe RL with the Lagrangian method, which introduces dual variables. However, applying a trained policy with the optimal dual variable to a new environment can be hazardous, since the optimal value of the dual variable, which represents a level of safety, depends on the environmental setting. To address this, we propose a new framework, dual variable actor-critic (DVAC), that solves the safe RL problem by simultaneously training a single policy over different safety levels. We introduce a universal policy and a universal Q-function, both of which take a dual variable as an argument. We then extend the soft actor-critic so that the universal policy is guaranteed to converge to the set of Pareto optimal policies. We evaluate the proposed method in simulation and real-world environments. The universal policy learned with the proposed method spans behaviors from extremely safe to high-performing, depending on the dual variable, and is nearly Pareto optimal compared to policies learned with the baseline methods. In addition, by identifying a suitable dual variable, the agent can adapt to environments with unseen state distributions without additional training.
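The core idea, a policy and Q-function conditioned on the dual variable of a Lagrangian safe-RL objective, can be illustrated with a minimal sketch. This is not the authors' SAC-based implementation; it assumes a discrete action set with hypothetical tabular reward and cost Q-values, and shows only how the dual variable lam trades reward against safety cost:

```python
import numpy as np

def lagrangian_value(q_reward, q_cost, lam):
    """Scalarized Lagrangian objective for a fixed dual variable lam:
    reward value minus lam-weighted expected safety cost."""
    return q_reward - lam * q_cost

def universal_policy(q_reward, q_cost, lam):
    """Hypothetical universal policy: for the given lam, greedily pick
    the action maximizing the lam-scalarized Q over a discrete action set."""
    return int(np.argmax(lagrangian_value(q_reward, q_cost, lam)))

# Two actions: action 0 has high reward but high cost, action 1 is safer.
q_r = np.array([1.0, 0.5])
q_c = np.array([0.8, 0.1])

universal_policy(q_r, q_c, lam=0.0)  # lam = 0: pure reward maximization -> action 0
universal_policy(q_r, q_c, lam=2.0)  # large lam: safety dominates -> action 1
```

A single such policy, queried with different values of lam, realizes the whole safety spectrum without retraining; adapting to a new environment then reduces to searching over lam.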

Keywords:
Reinforcement learning, Dual variable, Computer science, Pareto optimality, Mathematical optimization, State variable, Augmented Lagrangian method, Bellman equation, Artificial intelligence, Algorithm

Metrics

Cited by: 1
FWCI (Field-Weighted Citation Impact): 0.26
References: 20
Citation Normalized Percentile: 0.59

Topics

Reinforcement Learning in Robotics
Physical Sciences →  Computer Science →  Artificial Intelligence