JOURNAL ARTICLE

Sample and Feedback Efficient Hierarchical Reinforcement Learning from Human Preferences

Abstract

While reinforcement learning has led to promising results in robotics, defining an informative reward function can be challenging. Prior work has considered including the human in the loop to jointly learn the reward function and the optimal policy. Generating samples from a physical robot and requesting human feedback are both costly, so efficiency in each is critical. In contrast to prior work, in this paper we propose to learn reward functions from both the robot and the human perspectives in order to improve on both efficiency metrics. On the one hand, learning a reward function from the human perspective increases feedback efficiency by assuming that humans rank trajectories according to an outcome space of reduced dimensionality. On the other hand, learning a reward function from the robot perspective circumvents the need for learning a dynamics model while retaining the sample efficiency of model-based approaches. We provide an algorithm that incorporates bi-perspective reward learning into a general hierarchical reinforcement learning framework and demonstrate the merits of our approach on a toy task and a simulated robot grasping task.
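
As an illustration of the human-perspective idea only, the following minimal sketch fits a reward over a low-dimensional outcome space from pairwise trajectory preferences. It is not the paper's algorithm: the Bradley-Terry preference model, the linear reward, the `outcome` map (final state of the trajectory), and the synthetic "human" with hidden weights `TRUE_W` are all assumptions introduced here for the example.

```python
import math
import random


def outcome(trajectory):
    """Hypothetical outcome map: keep only the final (x, y) state."""
    return list(trajectory[-1])


def fit_outcome_reward(prefs, dim, lr=0.5, epochs=500):
    """Fit linear reward weights from (preferred, rejected) outcome pairs.

    Assumes a Bradley-Terry model: P(a preferred to b) = sigmoid(w . (a - b)).
    Runs plain gradient ascent on the mean log-likelihood.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for better, worse in prefs:
            diff = [b - c for b, c in zip(better, worse)]
            score = sum(wi * di for wi, di in zip(w, diff))
            p = 1.0 / (1.0 + math.exp(-score))
            for i in range(dim):
                grad[i] += (1.0 - p) * diff[i]
        w = [wi + lr * gi / len(prefs) for wi, gi in zip(w, grad)]
    return w


def reward(w, trajectory):
    """Reward of a trajectory, computed on its outcome only."""
    return sum(wi * oi for wi, oi in zip(w, outcome(trajectory)))


# Synthetic demo: a simulated annotator ranks trajectories by a hidden
# linear reward on the outcome (an assumption, not data from the paper).
random.seed(0)
TRUE_W = [1.0, -0.5]


def random_trajectory(steps=5):
    return [(random.random(), random.random()) for _ in range(steps)]


pairs = []
for _ in range(200):
    a, b = random_trajectory(), random_trajectory()
    if reward(TRUE_W, a) < reward(TRUE_W, b):
        a, b = b, a  # keep the preferred trajectory first
    pairs.append((a, b))

weights = fit_outcome_reward([(outcome(a), outcome(b)) for a, b in pairs], dim=2)
accuracy = sum(reward(weights, a) > reward(weights, b) for a, b in pairs) / len(pairs)
print("learned weights:", weights, "agreement with preferences:", accuracy)
```

Because the reward is fitted on a two-dimensional outcome rather than on the full trajectory, each preference query constrains only two parameters, which is the intuition behind the feedback-efficiency claim.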

Keywords:
Reinforcement learning; Computer science; Artificial intelligence; Perspective (graphical); Sample (material); Task (project management); Robot; Machine learning; Function (biology); Rank (graph theory); Robot learning; Robotics; Outcome (game theory); Human-in-the-loop; Space (punctuation); Mobile robot; Mathematics; Engineering

Metrics

Cited By: 19
FWCI (Field Weighted Citation Impact): 2.48
Refs: 42
Citation Normalized Percentile: 0.89

Topics

Robot Manipulation and Learning
Physical Sciences →  Engineering →  Control and Systems Engineering
Reinforcement Learning in Robotics
Physical Sciences →  Computer Science →  Artificial Intelligence
Motor Control and Adaptation
Life Sciences →  Neuroscience →  Cognitive Neuroscience

Related Documents

JOURNAL ARTICLE

Sample-Efficient Reinforcement Learning From Human Feedback via Information-Directed Sampling

Qi Han, Haochen Yang, Qiaosheng Zhang, Zhuoran Yang

Journal: IEEE Transactions on Information Theory Year: 2025 Vol: 71 (10) Pages: 7942-7958
JOURNAL ARTICLE

Efficient and Robust Reinforcement Learning from Human Feedback

Huazheng Wang

Journal: Proceedings of the AAAI Conference on Artificial Intelligence Year: 2025 Vol: 39 (27) Pages: 28730-28730
DISSERTATION

Learning Hierarchical Policies from Human Feedback

Christian Daniel

University:   Technischen Universität Darmstadt Year: 2016