DOCTORAL THESIS

Algorithmic Foundations for Safe and Efficient Reinforcement Learning from Human Feedback

Lindner, David

Year: 2023 Published in: Repository for Publications and Research Data (ETH Zurich) Publisher: ETH Zurich

Abstract

Reinforcement learning (RL) has shown remarkable success in applications with well-defined reward functions, such as maximizing the score in a video game or optimizing an algorithm’s run-time. However, in many real-world applications, there is no well-defined reward function. Instead, Reinforcement Learning from Human Feedback (RLHF) allows RL agents to learn from human-provided data, such as evaluations or rankings of trajectories. In many applications, human feedback is expensive to collect; therefore, learning robust policies from limited data is crucial. In this dissertation, we propose novel algorithms to enhance the sample efficiency and robustness of RLHF.

First, we propose active learning algorithms to improve the sample efficiency of RLHF by selecting the most informative data points for the user to label and by exploring the environment guided by uncertainty about the user’s preferences. Our approach provides conceptual clarity about active learning for RLHF and theoretical sample complexity results, drawing inspiration from multi-armed bandits and Bayesian optimization. Moreover, we provide extensive empirical evaluations in simulations that demonstrate the benefit of active learning for RLHF.

Second, we extend RLHF to learning constraints from human preferences instead of or in addition to rewards. We argue that constraints are a particularly natural representation of human preferences, especially in safety-critical applications. We develop algorithms to learn constraints effectively from demonstrations with unknown rewards and to actively learn constraints from human feedback. Our results suggest that representing human preferences as constraints can lead to safer policies and extend the potential applications of RLHF.

The proposed algorithms for reward and constraint learning serve as a foundation for future research to enhance the efficiency, safety, and applicability of RLHF.

Keywords:
Matching (statistics), Reinforcement learning, Context (archaeology), Probabilistic logic, Bayesian probability

Metrics

Cited By: 0
FWCI (Field-Weighted Citation Impact): 0.00
Refs: 0
Citation Normalized Percentile: 0.40

Topics

Advanced Bandit Algorithms Research (Social Sciences → Decision Sciences → Management Science and Operations Research)
Reinforcement Learning in Robotics (Physical Sciences → Computer Science → Artificial Intelligence)
Artificial Intelligence in Games (Physical Sciences → Computer Science → Artificial Intelligence)

Related Documents

BOOK-CHAPTER

Algorithmic Foundations of Reinforcement Learning

Stephan Pareigis

Lecture Notes in Networks and Systems Year: 2024 Pages: 1-27

JOURNAL ARTICLE

Efficient and Robust Reinforcement Learning from Human Feedback

Huazheng Wang

Journal: Proceedings of the AAAI Conference on Artificial Intelligence Year: 2025 Vol: 39 (27) Pages: 28730-28730