JOURNAL ARTICLE

Safe Exploration in Reinforcement Learning for Learning from Human Experts

Abstract

This work introduces a new reinforcement learning (RL) method that performs safe exploration by estimating uncertainty over state-action pairs with Monte Carlo (MC) dropout. The proposed method outperforms biased exploration in terms of reward obtained during training. The study also investigates the algorithm's sensitivity to the uncertainty-threshold hyperparameter, finding that a lower value yields a safer policy while a higher value can yield faster convergence. The algorithm is evaluated on guiding a two-degree-of-freedom planar robot in its task-space, showing that it can converge to an optimal policy while satisfying safety constraints.
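The threshold-based gating described in the abstract can be sketched roughly as follows: a Q-network is evaluated several times with dropout kept active at inference, the standard deviation across passes serves as the uncertainty estimate, and actions whose uncertainty exceeds a threshold are excluded from selection. This is a minimal NumPy sketch under stated assumptions; the network sizes, the `sigma_max` threshold, and the least-uncertain fallback rule are illustrative choices, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny illustrative Q-network (4-dim state, 2 actions) with one hidden
# layer; the dropout mask is resampled on EVERY forward pass, which is
# the core of MC dropout.
W1 = rng.normal(scale=0.5, size=(4, 32)); b1 = np.zeros(32)
W2 = rng.normal(scale=0.5, size=(32, 2)); b2 = np.zeros(2)

def q_values(state, drop_p=0.2):
    h = np.maximum(state @ W1 + b1, 0.0)   # ReLU hidden layer
    mask = rng.random(h.shape) > drop_p    # Bernoulli dropout mask
    h = h * mask / (1.0 - drop_p)          # inverted-dropout scaling
    return h @ W2 + b2

def mc_dropout_uncertainty(state, n_samples=50):
    # Multiple stochastic forward passes; the spread across passes
    # approximates predictive uncertainty per action.
    samples = np.stack([q_values(state) for _ in range(n_samples)])
    return samples.mean(axis=0), samples.std(axis=0)

def safe_action(state, sigma_max=5.0):
    mean_q, std_q = mc_dropout_uncertainty(state)
    safe = std_q <= sigma_max              # gate out uncertain actions
    if not safe.any():
        # Fallback (an assumption here): pick the least-uncertain action.
        return int(np.argmin(std_q))
    return int(np.argmax(np.where(safe, mean_q, -np.inf)))

print(safe_action(rng.normal(size=4)))
```

Lowering `sigma_max` restricts the agent to well-understood actions (safer, slower learning); raising it admits more exploratory actions, matching the threshold trade-off the abstract reports.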

Keywords:
Reinforcement learning, Safe exploration, Hyperparameter sensitivity, Convergence, Artificial intelligence, Machine learning, Robotics, Dropout (neural networks), Monte Carlo method, Task-space, Mathematical optimization

Metrics

Cited by: 3
References: 24
FWCI (Field Weighted Citation Impact): 0.77
Citation Normalized Percentile: 0.73

Topics

Reinforcement Learning in Robotics
Physical Sciences →  Computer Science →  Artificial Intelligence
Neural dynamics and brain function
Life Sciences →  Neuroscience →  Cognitive Neuroscience
Simulation Techniques and Applications
Social Sciences →  Decision Sciences →  Management Science and Operations Research
