JOURNAL ARTICLE

Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning

Tianduo Wang, Shichen Li, Wei Lu

Year: 2024 Pages: 11917-11928

DOI: 10.18653/v1/2024.acl-long.643

Keywords: Preference; Computer science; Chain (unit); Training (meteorology); Artificial intelligence; Cognitive psychology; Psychology; Mathematics; Statistics

Metrics

Cited By: 2

FWCI (Field Weighted Citation Impact): 1.28

Refs: 0

Citation Normalized Percentile: 0.78

Topics

Cognitive Science and Mapping

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs

Chao Du, Wei Gao, Min Lin, Qian Liu, Tianyu Pang, Xuan Zhang

Year: 2024 Pages: 333-356

JOURNAL ARTICLE

Enhancing Multimodal Chain-of-Thought Reasoning with Tree-Searched Self-Training

Yiwen Luo, Wei Tao, Yong Luo, Zengmao Wang

Year: 2025 Pages: 1-6

JOURNAL ARTICLE

Uncovering the Impact of Chain-of-Thought Reasoning for Direct Preference Optimization: Lessons from Text-to-SQL

Hanbing Liu, Haoyang Li, Xiaokang Zhang, Ruotong Chen, Haiyong Xu, Tian Tian, Qi Qi, Jing Zhang

Year: 2025 Pages: 21223-21261

JOURNAL ARTICLE

Direct Value Optimization: Improving Chain-of-Thought Reasoning in LLMs with Refined Values

Hongbo Zhang, Han Cui, Guangsheng Bao, Linyi Yang, Jun Wang, Yue Zhang

Year: 2025 Pages: 13214-13227

JOURNAL ARTICLE

SILLM4Rec: Self-Improving with Chain of Thought Enhanced Preference Optimization for Multimodal Recommendation

Yuhao Wu, Quan Fang, M. X. Liu, Xiaowen Huang, Jitao Sang

Year: 2025 Pages: 1-8