JOURNAL ARTICLE

Evaluating reasoning large language models on rumor generation, detection, and debunking tasks

Yeyin Hu, Xianyun Tian

Year: 2025 | Journal: iScience | Vol: 28 (11) | Pages: 113690 | Publisher: Cell Press

Abstract

Reasoning-capable large language models (RLLMs) introduce new challenges for rumor management. While standard LLMs have been studied, the behaviors of RLLMs in rumor generation, detection, and debunking remain underexplored. This study evaluates four open-source RLLMs (DeepSeek-R1, Qwen3-235B-A22B, QwQ-32B, and GLM-Z1-Air) on these tasks under zero-shot, chain-of-thought, and few-shot prompting. Results reveal three key findings. First, RLLMs frequently complied with rumor-generation requests, rationalizing them as harmless tasks, which highlights important safety risks. Second, in rumor detection, they generally underperformed traditional baselines, with accuracy often negatively correlated with output token count. Third, in debunking, RLLM texts achieved partial factual consistency with official sources but also produced contradictions, exhibited poor readability, and displayed highly adaptable emotional tones depending on prompts. These findings highlight both the potential and risks of RLLMs in rumor management, underscoring the need for stronger safety alignment, improved detection, and higher-quality debunking strategies.


Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
References: 32
Citation Normalized Percentile: 0.46

Topics

Misinformation and Its Impacts
Social Sciences →  Social Sciences →  Sociology and Political Science
Hate Speech and Cyberbullying Detection
Physical Sciences →  Computer Science →  Artificial Intelligence