Sindhu Padakandla, K. J. Prabuchandran, S. Ganguly, Shalabh Bhatnagar
Applying reinforcement learning (RL) methods to real-world applications poses multiple challenges, the foremost being the safety of the system controlled by the learning agent and the efficiency of learning. An RL agent learns to control a system by exploring the available actions in various operating states. In some states, an exploratory action may drive the system into unsafe operation, posing hazards both to the system and to the humans supervising it. RL algorithms must therefore learn to control the system while respecting safety. In this work, we formulate the safe RL problem in a constrained off-policy setting that facilitates safe exploration by the RL agent. We then develop a sample-efficient algorithm utilizing the cross-entropy method. The proposed algorithm's safety performance is evaluated numerically on benchmark RL problems.