Shielded Planning Guided Data-Efficient and Safe Reinforcement Learning

Hao Wang; Jiahu Qin; Zhen Kan

doi:10.1109/tnnls.2024.3359031

JOURNAL ARTICLE

Year: 2024; Journal: IEEE Transactions on Neural Networks and Learning Systems; Vol: 36 (2); Pages: 3808-3819; Publisher: Institute of Electrical and Electronics Engineers

Abstract

Safe reinforcement learning (RL) has shown great potential for building safe general-purpose robotic systems. While many existing works focus on the safety of the policy after training, ensuring safety during training while also improving exploration efficiency remains an open problem. To address these challenges, this work develops shielded planning guided policy optimization (SPPO), a new model-based safe RL method that augments policy optimization algorithms with path planning and a shielding mechanism. In particular, SPPO uses shielded planning via model predictive path integral (MPPI) for guided exploration and efficient data collection, together with an advantage-based shielding rule that keeps these processes safe. Based on the collected safe data, a task-oriented parameter optimization (TOPO) method is used for policy improvement and for observation-independent latent dynamics enhancement. In addition, SPPO provides explicit theoretical bounds on training safety, deployment safety, and the performance of the learned policy. Experiments demonstrate that SPPO outperforms baselines in policy performance, learning efficiency, and safety during training.
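To make the idea of shielded MPPI planning concrete, the following is a minimal sketch and not the SPPO implementation. The abstract does not give the paper's dynamics model, cost functions, or the exact form of the advantage-based shielding rule, so everything here is an assumption for illustration: a toy 1-D model (`step`), a quadratic task cost, a binary safety critic (`safety_q`), a stay-in-place backup action (`BACKUP`), and a shield that accepts the planner's action only if its safety cost is no worse than the backup's.

```python
import math
import random

GOAL, LIMIT, BACKUP = 1.0, 1.2, 0.0  # hypothetical task goal, safety limit, backup action


def step(x, a, dt=0.1):
    """Toy 1-D model (assumption): the position moves by a * dt."""
    return x + a * dt


def task_cost(x):
    """Quadratic distance to the goal (assumption)."""
    return (x - GOAL) ** 2


def safety_q(x, a):
    """Hypothetical safety critic: 1.0 if the next state breaks the limit, else 0.0."""
    return 1.0 if step(x, a) > LIMIT else 0.0


def mppi_plan(x0, nominal, cost_fn, samples=64, sigma=0.5, lam=1.0, rng=random):
    """One MPPI update: perturb the nominal plan, then average the sampled
    plans with weights proportional to exp(-cost / lam)."""
    plans, costs = [], []
    for _ in range(samples):
        # Gaussian perturbation of each action, clipped to [-1, 1].
        plan = [max(-1.0, min(1.0, u + rng.gauss(0.0, sigma))) for u in nominal]
        x, total = x0, 0.0
        for a in plan:
            x = step(x, a)
            total += cost_fn(x)
        plans.append(plan)
        costs.append(total)
    best = min(costs)  # subtract the minimum cost for numerical stability
    weights = [math.exp(-(c - best) / lam) for c in costs]
    norm = sum(weights)
    return [sum(w * p[t] for w, p in zip(weights, plans)) / norm
            for t in range(len(nominal))]


def shield(x, proposed, backup, q_fn, threshold=0.0):
    """Advantage-style shield (assumption): keep the proposed action only if its
    safety cost exceeds the backup action's by at most `threshold`."""
    advantage = q_fn(x, proposed) - q_fn(x, backup)
    return proposed if advantage <= threshold else backup


def run_episode(steps=50, horizon=10, seed=0):
    """Plan with MPPI, filter each action through the shield, record visited states."""
    rng = random.Random(seed)
    x, plan, visited = 0.0, [0.0] * horizon, [0.0]
    for _ in range(steps):
        plan = mppi_plan(x, plan, task_cost, rng=rng)
        x = step(x, shield(x, plan[0], BACKUP, safety_q))
        visited.append(x)
        plan = plan[1:] + [0.0]  # warm-start the next plan by shifting one step
    return visited
```

In this toy setting the backup action never moves the state, so every executed action leaves the state at or below `LIMIT`; the states returned by `run_episode()` are the kind of safe data a policy-improvement step could then consume. A learned model, a learned safety critic, and the paper's actual shielding rule would replace the hand-written pieces.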

Keywords:

Reinforcement learning; Shielded cable; Software deployment; Computer science; Electromagnetic shielding; Task (project management); Mathematical optimization; Path (computing); Operations research; Artificial intelligence; Simulation; Industrial engineering; Engineering; Systems engineering; Mathematics; Telecommunications; Software engineering; Electrical engineering

Metrics

FWCI (Field Weighted Citation Impact): 3.19

Citation Normalized Percentile: 0.87

Topics

Reinforcement Learning in Robotics

Physical Sciences → Computer Science → Artificial Intelligence

Software Reliability and Analysis Research

Physical Sciences → Computer Science → Software

Autonomous Vehicle Technology and Safety

Physical Sciences → Engineering → Automotive Engineering

Related Documents

Data Efficient Safe Reinforcement Learning

Planning for potential: efficient safe reinforcement learning

Safe Reinforcement Learning with Policy-Guided Planning for Autonomous Driving

Reinforcement Learning by Guided Safe Exploration

Shielded Reinforcement Learning for Safe and Optimal Cyber Physical Systems