JOURNAL ARTICLE

Mitigating Political Bias in Language Models through Reinforced Calibration

Ruibo Liu, Chenyan Jia, Jason Wei, Guangxuan Xu, Lili Wang, Soroush Vosoughi

Year: 2021. Journal: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35 (17), Pages: 14857-14866. Publisher: Association for the Advancement of Artificial Intelligence.

Abstract

Current large-scale language models can be politically biased as a result of the data they are trained on, potentially causing serious problems when they are deployed in real-world settings. In this paper, we describe metrics for measuring political bias in GPT-2 generation and propose a reinforcement learning (RL) framework for mitigating political biases in generated text. By using rewards from word embeddings or a classifier, our RL framework guides debiased generation without having access to the training data or requiring the model to be retrained. In empirical experiments on three attributes sensitive to political bias (gender, location, and topic), our methods reduced bias according to both our metrics and human evaluation, while maintaining readability and semantic coherence.
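The abstract's central idea, using a reward signal (e.g., from a bias classifier) to steer generation away from biased outputs via reinforcement learning, without touching the training data, can be illustrated with a toy sketch. This is not the paper's implementation: the vocabulary, the word-list "classifier", and the REINFORCE-style update over a bag-of-words generator are all invented for illustration.

```python
import math
import random

# Hypothetical bias-indicative words standing in for a trained classifier.
BIASED = {"radical", "corrupt"}
VOCAB = ["policy", "radical", "budget", "corrupt", "reform"]

def bias_reward(tokens):
    """Reward is higher when fewer bias-indicative words appear."""
    return 1.0 - sum(t in BIASED for t in tokens) / len(tokens)

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def sample_sentence(logits, length=5, rng=random):
    """Toy 'generator': sample words independently from the policy."""
    probs = softmax(logits)
    return [rng.choices(VOCAB, weights=probs)[0] for _ in range(length)]

def reinforce_step(logits, tokens, reward, baseline, lr=0.5):
    """One REINFORCE update: raise the log-probability of the sampled
    tokens in proportion to (reward - baseline)."""
    probs = softmax(logits)
    adv = reward - baseline
    for i, w in enumerate(VOCAB):
        # d/d logit_i of sum_t log p(token_t) = count_i - T * p_i
        grad = tokens.count(w) - len(tokens) * probs[i]
        logits[i] += lr * adv * grad
    return logits

rng = random.Random(0)
logits = [0.0] * len(VOCAB)
baseline = 0.5
for _ in range(300):
    toks = sample_sentence(logits, rng=rng)
    r = bias_reward(toks)
    logits = reinforce_step(logits, toks, r, baseline)
    baseline = 0.9 * baseline + 0.1 * r  # running reward baseline

probs = softmax(logits)
```

After a few hundred updates the policy assigns noticeably less probability mass to the "biased" words, mirroring the paper's claim that reward feedback alone can guide debiased generation without retraining on new data.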

Keywords:
Readability, Computer science, Classifier, Language model, Artificial intelligence, Natural language processing, Reinforcement learning, Coherence, Machine learning, Politics, Statistics, Political science, Law

Metrics

Cited By: 60
FWCI (Field-Weighted Citation Impact): 6.13
References: 80
Citation Normalized Percentile: 0.97 (top 1%)

Topics

Topic Modeling (Physical Sciences → Computer Science → Artificial Intelligence)
Natural Language Processing Techniques (Physical Sciences → Computer Science → Artificial Intelligence)
Text Readability and Simplification (Physical Sciences → Computer Science → Artificial Intelligence)