Efficient and Robust Reinforcement Learning from Human Feedback

Huazheng Wang

doi:10.1609/aaai.v39i27.35123

Year: 2025
Journal: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 39, No. 27, p. 28730
Publisher: Association for the Advancement of Artificial Intelligence

Abstract

Reinforcement Learning (RL) has emerged as a powerful paradigm for sequential decision-making with numerous real-world applications. However, in practical environments such as recommender systems, search engines, and large language models (LLMs), RL algorithms must learn efficiently from biased human feedback that may also be corrupted. In this talk, I will present our recent efforts to develop RL algorithms that provably handle such challenging scenarios robustly and efficiently. First, I will introduce our work on reinforcement learning from biased click feedback in ranking. Whereas previous approaches typically relied on strong assumptions about human click behavior (formalized as click models) and required specialized debiasing methods for each model, we propose a novel unified framework that formulates the ranking process under general click models as a Markov Decision Process (MDP), enabling the development of a click-model-agnostic RL algorithm. Second, I will discuss the fundamental vulnerability of bandits and reinforcement learning to corrupted feedback. Our theoretical analysis gives a complete characterization, in the form of necessary and sufficient conditions, of when linear bandits and linear RL can be attacked, revealing both their intrinsic robustness and their limitations. Lastly, I will discuss our recent work on improving RL fine-tuning for LLMs, including sample-efficient off-policy RLHF and addressing the gradient entanglement issue in margin-based alignment methods.
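The abstract does not spell out the framework, so the following is only a minimal illustrative sketch, under assumptions of its own, of what "formulating ranking as an MDP" can look like. It is not the algorithm from the talk. In this sketch, a state is the sequence of documents placed so far. An action is choosing the next document, and the click observed at each position serves as that step's reward. The cascade click model below is a hypothetical stand-in environment chosen for concreteness. All names (`available_actions`, `transition`, `cascade_clicks`, `rollout`) are illustrative.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# MDP state (assumption): the tuple of documents already placed, in rank order.
State = Tuple[str, ...]
Policy = Callable[[State, List[str]], str]


def available_actions(state: State, candidates: Sequence[str]) -> List[str]:
    # Actions at each step are the candidate documents not yet ranked.
    return [d for d in candidates if d not in state]


def transition(state: State, doc: str) -> State:
    # Deterministic transition: place the chosen document at the next position.
    return state + (doc,)


def cascade_clicks(ranking: Sequence[str], attractiveness: Dict[str, float],
                   rng: random.Random) -> List[int]:
    # Hypothetical environment: a cascade click model. The user scans top-down,
    # clicks the first document found attractive, then stops examining.
    clicks = [0] * len(ranking)
    for pos, doc in enumerate(ranking):
        if rng.random() < attractiveness[doc]:
            clicks[pos] = 1
            break
    return clicks


def rollout(policy: Policy, candidates: Sequence[str], k: int,
            attractiveness: Dict[str, float], rng: random.Random):
    # One episode: fill positions 1..k sequentially, then observe clicks.
    # The click at position i is treated as the reward for the i-th action.
    state: State = ()
    for _ in range(k):
        state = transition(state, policy(state, available_actions(state, candidates)))
    return list(state), cascade_clicks(state, attractiveness, rng)


# Usage: a greedy policy that knows the (toy) attractiveness values.
attr = {"a": 0.0, "b": 1.0, "c": 0.0}
greedy = lambda state, actions: max(actions, key=lambda d: attr[d])
ranking, rewards = rollout(greedy, ["a", "b", "c"], 2, attr, random.Random(0))
print(ranking, rewards)  # prints ['b', 'a'] [1, 0]
```

In this sketch, the learner-side pieces (`State`, `available_actions`, `transition`, the policy) never reference `cascade_clicks`. Replacing the cascade model with a different click model changes only the environment, not the MDP interface. That is the sense in which such a formulation can be click-model-agnostic.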
