JOURNAL ARTICLE

Diverse AI Feedback For Large Language Model Alignment

Tianshu Yu, Ting-En Lin, Yuchuan Wu, Min Yang, Fei Huang, Yongbin Li

Year: 2025 | Journal: Transactions of the Association for Computational Linguistics | Vol: 13 | Pages: 392-407 | Publisher: Association for Computational Linguistics

Abstract

Recent advances in large language models (LLMs) focus on aligning models with human values to minimize harmful content. However, existing methods often rely on a single type of feedback, such as preferences, annotated labels, or critiques, which can lead to overfitting and suboptimal performance. In this paper, we propose Diverse AI Feedback (DAIF), a novel approach that integrates three types of feedback—critique, refinement, and preference—tailored to tasks of varying uncertainty levels. Through an analysis of information gain, we show that critique feedback is most effective for low-uncertainty tasks, refinement feedback for medium-uncertainty tasks, and preference feedback for high-uncertainty tasks. Training with this diversified feedback reduces overfitting and improves alignment. Experimental results across three tasks—question answering, dialog generation, and text summarization—demonstrate that DAIF outperforms traditional methods relying on a single feedback type.
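The abstract's core idea, routing each task to one of three feedback types by its uncertainty level, can be sketched as follows. This is an illustrative reconstruction only: the entropy measure, the threshold values, and the function names are assumptions, not details taken from the paper.

```python
import math

def response_entropy(probs):
    """Shannon entropy (natural log) of a model's answer distribution,
    used here as a stand-in measure of task uncertainty."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_feedback(probs, low=0.5, high=1.2):
    """Route a task to a feedback type by its uncertainty.

    The `low`/`high` thresholds are hypothetical values chosen for
    illustration; the paper derives its regimes via information gain.
    """
    h = response_entropy(probs)
    if h < low:
        return "critique"    # low uncertainty -> critique feedback
    elif h < high:
        return "refinement"  # medium uncertainty -> refinement feedback
    return "preference"      # high uncertainty -> preference feedback

# A near-deterministic answer distribution routes to critique,
# a uniform one routes to preference.
print(select_feedback([0.97, 0.01, 0.01, 0.01]))
print(select_feedback([0.25, 0.25, 0.25, 0.25]))
```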

Keywords:
Computer science, Language model, Natural language processing, Artificial intelligence, Speech recognition

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 84
Citation Normalized Percentile: 0.05

Topics

Machine Learning and Algorithms
Physical Sciences →  Computer Science →  Artificial Intelligence
Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence
Natural Language Processing Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence