Sample and Feedback Efficient Hierarchical Reinforcement Learning from Human Preferences

Robert Pinsler; Riad Akrour; Takayuki Osa; Jan Peters; Gerhard Neumann

doi:10.1109/icra.2018.8460907

JOURNAL ARTICLE

Year: 2018 Pages: 596-601

Abstract

While reinforcement learning has led to promising results in robotics, defining an informative reward function can prove challenging. Prior work considered including the human in the loop to jointly learn the reward function and the optimal policy. Generating samples from a physical robot and requesting human feedback are both taxing efforts for which efficiency is critical. In contrast to prior work, in this paper we propose to learn reward functions from both the robot and the human perspectives in order to improve on both efficiency metrics. On one side, learning a reward function from the human perspective increases feedback efficiency by assuming that humans rank trajectories according to an outcome space of reduced dimensionality. On the other side, learning a reward function from the robot perspective circumvents the need for learning a dynamics model while retaining the sample efficiency of model-based approaches. We provide an algorithm that incorporates bi-perspective reward learning into a general hierarchical reinforcement learning framework and demonstrate the merits of our approach on a toy task and a simulated robot grasping task.
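The abstract's idea that humans rank trajectories in a low-dimensional outcome space can be illustrated with a small sketch. This is not the authors' algorithm: it assumes a linear reward over outcome vectors and a Bradley-Terry (logistic) preference model fitted by gradient ascent, and the names `fit_preference_reward`, `reward`, and the toy data are hypothetical.

```python
import math
import random


def reward(w, outcome):
    """Linear reward: dot product of weights and an outcome vector."""
    return sum(wk * ok for wk, ok in zip(w, outcome))


def fit_preference_reward(outcomes, preferences, lr=0.5, epochs=500):
    """Fit weights w so that reward(w, o) explains pairwise preferences.

    outcomes:    list of low-dimensional outcome vectors, one per trajectory.
    preferences: list of (i, j) pairs, meaning trajectory i was preferred to j.
    Assumes a Bradley-Terry model: P(i preferred to j) = sigmoid(r_i - r_j).
    """
    dim = len(outcomes[0])
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for i, j in preferences:
            diff = [a - b for a, b in zip(outcomes[i], outcomes[j])]
            margin = sum(wk * dk for wk, dk in zip(w, diff))
            p = 1.0 / (1.0 + math.exp(-margin))
            # Gradient of the log-likelihood log(sigmoid(margin)) w.r.t. w
            for k in range(dim):
                grad[k] += (1.0 - p) * diff[k]
        # Gradient ascent step on the average log-likelihood
        w = [wk + lr * gk / len(preferences) for wk, gk in zip(w, grad)]
    return w


# Hypothetical toy data: 20 trajectories summarized by 2-D outcomes.
# The simulated "human" prefers a larger first outcome coordinate.
rng = random.Random(0)
outcomes = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(20)]
true_w = [1.0, 0.0]
preferences = []
for _ in range(60):
    i, j = rng.sample(range(20), 2)
    if reward(true_w, outcomes[i]) < reward(true_w, outcomes[j]):
        i, j = j, i  # order the pair so the preferred trajectory comes first
    preferences.append((i, j))

w = fit_preference_reward(outcomes, preferences)
```

Because each query compares outcomes rather than full trajectories, the number of parameters to fit stays at the outcome dimension (two here), which is the intuition behind the feedback-efficiency claim.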

Keywords:

Reinforcement learning; Computer science; Artificial intelligence; Perspective (graphical); Sample (material); Task (project management); Robot; Machine learning; Function (biology); Rank (graph theory); Robot learning; Robotics; Outcome (game theory); Human-in-the-loop; Space (punctuation); Mobile robot; Mathematics; Engineering

Metrics

FWCI (Field Weighted Citation Impact): 2.48

Citation Normalized Percentile: 0.89

Topics

Robot Manipulation and Learning

Physical Sciences → Engineering → Control and Systems Engineering

Reinforcement Learning in Robotics

Physical Sciences → Computer Science → Artificial Intelligence

Motor Control and Adaptation

Life Sciences → Neuroscience → Cognitive Neuroscience

Related Documents

Sample-Efficient Reinforcement Learning From Human Feedback via Information-Directed Sampling

Efficient and Robust Reinforcement Learning from Human Feedback

Sample Efficient Hierarchical Reinforcement Learning for the Game of Othello

Learning Hierarchical Policies from Human Feedback

VickreyFeedback: Cost-Efficient Data Construction for Reinforcement Learning from Human Feedback