JOURNAL ARTICLE

ConVis: Contrastive Decoding with Hallucination Visualization for Mitigating Hallucinations in Multimodal Large Language Models

Yeji Park, Deokyeong Lee, Junsuk Choe, Buru Chang

Year: 2025  Journal: Proceedings of the AAAI Conference on Artificial Intelligence  Vol: 39 (6)  Pages: 6434-6442  Publisher: Association for the Advancement of Artificial Intelligence

Abstract

Hallucinations in Multimodal Large Language Models (MLLMs), where generated responses fail to accurately reflect the given image, pose a significant challenge to their reliability. To address this, we introduce ConVis, a novel training-free contrastive decoding method. ConVis leverages a text-to-image (T2I) generation model to semantically reconstruct the given image from hallucinated captions. By comparing the contrasting probability distributions produced by the original and reconstructed images, ConVis enables MLLMs to capture visual contrastive signals that penalize hallucination generation. Notably, this method operates purely within the decoding process, eliminating the need for additional data or model updates. Our extensive experiments on five popular benchmarks demonstrate that ConVis effectively reduces hallucinations across various MLLMs, highlighting its potential to enhance model reliability.
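The abstract's core mechanism, contrasting the next-token distributions conditioned on the original versus the reconstructed image, can be sketched as follows. This is a minimal illustration of generic image-contrastive decoding, not the paper's exact formulation; the function name and the contrast strength `alpha` are hypothetical.

```python
import numpy as np

def contrastive_decode_step(logits_original, logits_reconstructed, alpha=1.0):
    """One hedged sketch of a contrastive decoding step.

    logits_original: next-token logits from the MLLM conditioned on the
        real input image.
    logits_reconstructed: logits from the same MLLM conditioned on the
        T2I reconstruction of a (possibly hallucinated) caption.
    alpha: assumed contrast strength (illustrative, not from the paper).

    Tokens whose likelihood is inflated under the reconstructed image are
    penalized relative to the original image.
    """
    contrasted = (1.0 + alpha) * logits_original - alpha * logits_reconstructed
    # Numerically stable softmax over the contrasted logits.
    z = contrasted - contrasted.max()
    probs = np.exp(z) / np.exp(z).sum()
    return probs
```

In this sketch, a token that is equally likely under both images keeps its relative weight, while a token favored only under the hallucinated reconstruction is suppressed.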

Keywords:
Decoding methods, Visual Hallucination, Computer science, Psychology, Visualization, Linguistics, Cognitive psychology, Artificial intelligence, Philosophy, Algorithm

Metrics

Cited By: 1
FWCI (Field Weighted Citation Impact): 3.22
Refs: 0
Citation Normalized Percentile: 0.81


Topics

Machine Learning in Healthcare (Physical Sciences → Computer Science → Artificial Intelligence)
Mental Health Research Topics (Social Sciences → Psychology → Experimental and Cognitive Psychology)
Mental Health via Writing (Social Sciences → Psychology → Social Psychology)