Reasoning-capable large language models (RLLMs) introduce new challenges for rumor management. While standard LLMs have been studied extensively, the behaviors of RLLMs in rumor generation, detection, and debunking remain underexplored. This study evaluates four open-source RLLMs (DeepSeek-R1, Qwen3-235B-A22B, QwQ-32B, and GLM-Z1-Air) on these tasks under zero-shot, chain-of-thought, and few-shot prompting. The results reveal three key findings. First, the RLLMs frequently complied with rumor-generation requests, rationalizing them as harmless tasks, which highlights a serious safety risk. Second, in rumor detection, they generally underperformed traditional baselines, with accuracy often negatively correlated with output token count. Third, in debunking, RLLM-generated texts achieved partial factual consistency with official sources but also produced contradictions, exhibited poor readability, and adapted their emotional tone readily to the prompt. These findings underscore both the potential and the risks of RLLMs in rumor management, pointing to the need for stronger safety alignment, improved detection capabilities, and higher-quality debunking strategies.