JOURNAL ARTICLE

ConVis: Contrastive Decoding with Hallucination Visualization for Mitigating Hallucinations in Multimodal Large Language Models

Yeji Park, Deokyeong Lee, Junsuk Choe, Buru Chang

Year: 2025  Journal: Proceedings of the AAAI Conference on Artificial Intelligence  Vol: 39 (6)  Pages: 6434-6442  Publisher: Association for the Advancement of Artificial Intelligence

Abstract

Hallucinations in Multimodal Large Language Models (MLLMs), where generated responses fail to accurately reflect the given image, pose a significant challenge to their reliability. To address this, we introduce ConVis, a novel training-free contrastive decoding method. ConVis leverages a text-to-image (T2I) generation model to semantically reconstruct the given image from hallucinated captions. By comparing the contrasting probability distributions produced by the original and reconstructed images, ConVis enables MLLMs to capture visual contrastive signals that penalize hallucination generation. Notably, this method operates purely within the decoding process, eliminating the need for additional data or model updates. Our extensive experiments on five popular benchmarks demonstrate that ConVis effectively reduces hallucinations across various MLLMs, highlighting its potential to enhance model reliability.
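The abstract's core mechanism, contrasting the next-token distributions conditioned on the original versus the reconstructed image, can be sketched as follows. This is a minimal illustration of generic image-contrastive decoding, not the paper's exact formulation; the function name and the contrast strength `alpha` are hypothetical.

```python
import numpy as np

def contrastive_decode_step(logits_original, logits_reconstructed, alpha=1.0):
    """One hedged sketch of a contrastive decoding step.

    logits_original: next-token logits from the MLLM conditioned on the
        real input image.
    logits_reconstructed: logits from the same MLLM conditioned on the
        T2I reconstruction of a (possibly hallucinated) caption.
    alpha: assumed contrast strength (illustrative, not from the paper).

    Tokens whose likelihood is inflated under the reconstructed image are
    penalized relative to the original image.
    """
    contrasted = (1.0 + alpha) * logits_original - alpha * logits_reconstructed
    # Numerically stable softmax over the contrasted logits.
    z = contrasted - contrasted.max()
    probs = np.exp(z) / np.exp(z).sum()
    return probs
```

In this sketch, a token that is equally likely under both images keeps its relative weight, while a token favored only under the hallucinated reconstruction is suppressed.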

Keywords:
Decoding methods, Visual Hallucination, Computer science, Psychology, Visualization, Linguistics, Cognitive psychology, Artificial intelligence, Philosophy, Algorithm

Metrics

Cited By: 1
FWCI (Field Weighted Citation Impact): 3.22
Refs: 0
Citation Normalized Percentile: 0.81


Topics

Machine Learning in Healthcare (Physical Sciences → Computer Science → Artificial Intelligence)
Mental Health Research Topics (Social Sciences → Psychology → Experimental and Cognitive Psychology)
Mental Health via Writing (Social Sciences → Psychology → Social Psychology)