JOURNAL ARTICLE

Safe Reinforcement Learning for Robotics: From Exploration to Policy Learning

Liu, Puze

Year: 2025   Repository: TUbiblio (Technical University of Darmstadt)   Publisher: Technical University of Darmstadt

Abstract

The development of technology and the accompanying concerns about safety always go hand in hand. With the advancement of robotics and reinforcement learning, recent research has demonstrated a growing number of successes: we have witnessed robots accomplishing tasks once thought impossible. Meanwhile, worries about the safety of such technologies grow daily, particularly for physical robots, which interact with the real world. How to build learning robots that pose no risk to humans, the environment, or themselves is therefore a critical question. This thesis focuses on the safety problem in reinforcement learning and robotics. Safety concerns have been extensively studied across various fields, including control theory, machine learning, and robotics. Traditional control-based approaches leverage substantial domain knowledge to ensure system safety. While these methods offer strong safety guarantees, they are often limited to specific tasks and platforms. Reinforcement learning approaches, on the other hand, are more general and make fewer assumptions about the environment, but they are data-intensive and typically lack safety guarantees. This leads to the central question of this thesis: What is the relationship between the amount of domain knowledge required and the level of safety it provides in the context of Safe Reinforcement Learning in robotics? To explore this question, we focus primarily on two types of safety problems in reinforcement learning: Safe Exploration, which ensures that the agent avoids risky actions during the learning process, and Safe Policy Learning, which ensures that the final trained agent operates safely. We begin by investigating the Safe Exploration problem, examining which types of domain knowledge are necessary for an agent to learn safely.
Gradually, we reduce reliance on certain components of this domain knowledge, substituting them with data-driven methods to understand the impact on safety levels. Ultimately, we delve into the model-free Safe Policy Learning problem and propose a novel method that enables the agent to learn a safe policy by the end of training. With the central problem in mind, we propose a series of methods leveraging different levels of domain knowledge to build a safe learning robot. (1) We build a model-based safe exploration method, Acting on the TAngent Space of the COnstraint Manifold (ATACOM), which exploits knowledge of the robot dynamics and constraints to construct a constraint manifold. By building the tangent space of the constraint manifold, we construct a safe action space that allows the agent to explore safely. We show theoretically that ATACOM builds a safe controller and demonstrate practically that our method ensures Safe Exploration and can be deployed for training on a real robot. (2) We study the impact of replacing hand-crafted constraints with a learned or learnable safety function. We propose a novel method, Regularized Deep Signed Distance Fields (ReDSDF), which learns a distance function to objects with complex shapes or articulations, making it particularly useful for collision avoidance in robotics. We demonstrate that ReDSDF, in combination with ATACOM, enables safe manipulation tasks in dynamic Human-Robot Interaction (HRI) scenarios. Moving further, we eliminate the need for pretraining by learning the constraint function during the Reinforcement Learning (RL) process. We use a distributional safety critic to account for Long-Term Safety and Uncertainty. Combined with ATACOM, we propose a novel method, Distributional ATACOM (DATACOM), which integrates ATACOM with a learnable safety constraint. We demonstrate that the agent learns a safer policy with fewer violations.
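The core idea behind ATACOM as described above can be illustrated with a minimal sketch: project an unconstrained action onto the tangent space of the constraint manifold (the null space of the constraint Jacobian), so that any action the agent picks keeps the constraint satisfied locally. The toy 2-D constraint, the function names, and the SVD-based null-space computation below are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

def constraint(q):
    # Hypothetical constraint: keep a toy 2-D configuration q on the
    # unit circle, c(q) = q^T q - 1 = 0.
    return np.array([q @ q - 1.0])

def constraint_jacobian(q):
    # Jacobian of the toy constraint above: dc/dq = 2 q^T.
    return 2.0 * q.reshape(1, -1)

def tangent_space_basis(q):
    # Basis of the null space of the constraint Jacobian, via SVD.
    # Its columns span directions that locally preserve the constraint.
    J = constraint_jacobian(q)
    _, s, vt = np.linalg.svd(J)
    rank = int(np.sum(s > 1e-8))
    return vt[rank:].T  # shape: (dim_q, dim_q - rank)

def safe_velocity(q, action):
    # Map an unconstrained RL action into the tangent space, yielding a
    # velocity that does not violate the constraint to first order.
    B = tangent_space_basis(q)
    return B @ action

q = np.array([1.0, 0.0])   # a point on the manifold c(q) = 0
a = np.array([0.5])        # arbitrary 1-D action from the agent
qd = safe_velocity(q, a)
print(constraint_jacobian(q) @ qd)  # numerically zero: qd is tangent
```

The agent thus explores in a lower-dimensional action space whose every element is safe by construction, rather than being penalized after a violation.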
(3) We then remove the assumption that the robot dynamics are known a priori and develop a model-free approach based on Distributional Reinforcement Learning. We propose a novel SafeRL safety critic, the Safe Probability Function (SPF), which estimates the probability of the agent remaining safe in the future. To incorporate the uncertainty of the safety estimate into the exploration and policy learning process, we introduce the Distributional Safe Probability Function (DSPF), which treats the safe probability as a random variable. We show that DSPF is an effective method for Safe Policy Learning in a model-free setting. Overall, this thesis provides a comprehensive study of the trade-off between domain knowledge and safety in the context of SafeRL in robotics. We demonstrate that, depending on the required safety tolerance, different levels of domain knowledge can be leveraged to build a safe learning robot.
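The quantity the SPF estimates, the probability of remaining safe in the future, can be made concrete in a toy tabular setting. The sketch below computes the finite-horizon probability of avoiding an unsafe state by backward recursion over a small Markov chain; the transition matrix, the unsafe set, and the horizon are invented for illustration and are not taken from the thesis (which learns this quantity with a neural safety critic).

```python
import numpy as np

# Toy 3-state Markov chain; state 2 is absorbing and unsafe.
P = np.array([
    [0.8, 0.2, 0.0],   # state 0
    [0.1, 0.7, 0.2],   # state 1
    [0.0, 0.0, 1.0],   # state 2 (unsafe, absorbing)
])
unsafe = np.array([False, False, True])

def safe_probability(P, unsafe, horizon):
    # p[s] = probability of staying out of the unsafe set for `horizon`
    # steps when starting in s, via the backward recursion
    # p_k(s) = 1{s safe} * sum_s' P(s' | s) * p_{k-1}(s').
    p = (~unsafe).astype(float)
    for _ in range(horizon):
        p = (~unsafe) * (P @ p)
    return p

print(safe_probability(P, unsafe, horizon=5))
```

State 1, which can transition directly into the unsafe state, ends up with a lower safe probability than state 0; the unsafe state itself has probability zero. A distributional critic such as the DSPF would additionally represent the uncertainty over this estimate instead of a single point value.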

Keywords:
Reinforcement learning, Safe exploration, Robotics, Robot, Domain knowledge

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 0
Citation Normalized Percentile: 0.22

Topics

Reinforcement Learning in Robotics
Physical Sciences →  Computer Science →  Artificial Intelligence
Adversarial Robustness in Machine Learning
Physical Sciences →  Computer Science →  Artificial Intelligence
Robot Manipulation and Learning
Physical Sciences →  Engineering →  Control and Systems Engineering

Related Documents

BOOK-CHAPTER

Safe Policy Optimization for Reinforcement Learning in Robotics

Hao Wang, Zhen Kan

Elsevier eBooks   Year: 2024   Pages: 609-640
JOURNAL ARTICLE

Safe Learning in Robotics: From Learning-Based Control to Safe Reinforcement Learning

Lukas Brunke, Melissa Greeff, Adam W. Hall, Zhaocong Yuan, Siqi Zhou, Jacopo Panerati, Angela P. Schoellig

Journal: Annual Review of Control, Robotics, and Autonomous Systems   Year: 2022   Vol: 5 (1)   Pages: 411-444
DISSERTATION

Safe Exploration in Reinforcement Learning: Theory and Applications in Robotics

Felix Berkenkamp

University: Repository for Publications and Research Data (ETH Zurich)   Year: 2019