DOCTORAL THESIS

Algorithmic Foundations for Safe and Efficient Reinforcement Learning from Human Feedback

Lindner, David

Year: 2023 Published in: Repository for Publications and Research Data (ETH Zurich) Publisher: ETH Zurich

Abstract

Reinforcement learning (RL) has shown remarkable success in applications with well-defined reward functions, such as maximizing the score in a video game or optimizing an algorithm’s run-time. However, in many real-world applications, there is no well-defined reward function. Instead, Reinforcement Learning from Human Feedback (RLHF) allows RL agents to learn from human-provided data, such as evaluations or rankings of trajectories. In many applications, human feedback is expensive to collect; therefore, learning robust policies from limited data is crucial. In this dissertation, we propose novel algorithms to enhance the sample efficiency and robustness of RLHF.

First, we propose active learning algorithms to improve the sample efficiency of RLHF by selecting the most informative data points for the user to label and by exploring the environment guided by uncertainty about the user’s preferences. Our approach provides conceptual clarity about active learning for RLHF and theoretical sample complexity results, drawing inspiration from multi-armed bandits and Bayesian optimization. Moreover, we provide extensive empirical evaluations in simulations that demonstrate the benefit of active learning for RLHF.

Second, we extend RLHF to learning constraints from human preferences instead of or in addition to rewards. We argue that constraints are a particularly natural representation of human preferences, especially in safety-critical applications. We develop algorithms to learn constraints effectively from demonstrations with unknown rewards and to actively learn constraints from human feedback. Our results suggest that representing human preferences as constraints can lead to safer policies and extend the potential applications of RLHF.

The proposed algorithms for reward and constraint learning serve as a foundation for future research to enhance the efficiency, safety, and applicability of RLHF.

Keywords:
Matching (statistics), Reinforcement learning, Context (archaeology), Probabilistic logic, Bayesian probability

Metrics

Cited By: 0
FWCI (Field-Weighted Citation Impact): 0.00
Refs: 0
Citation Normalized Percentile: 0.40

Topics

Advanced Bandit Algorithms Research (Social Sciences → Decision Sciences → Management Science and Operations Research)
Reinforcement Learning in Robotics (Physical Sciences → Computer Science → Artificial Intelligence)
Artificial Intelligence in Games (Physical Sciences → Computer Science → Artificial Intelligence)

Related Documents

BOOK-CHAPTER

Algorithmic Foundations of Reinforcement Learning

Stephan Pareigis

Lecture Notes in Networks and Systems Year: 2024 Pages: 1-27

JOURNAL ARTICLE

Efficient and Robust Reinforcement Learning from Human Feedback

Huazheng Wang

Journal: Proceedings of the AAAI Conference on Artificial Intelligence Year: 2025 Vol: 39 (27) Pages: 28730-28730