JOURNAL ARTICLE

Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning

Keywords:
Preference, Computer science, Chain (unit), Training (meteorology), Artificial intelligence, Cognitive psychology, Psychology, Mathematics, Statistics