JOURNAL ARTICLE

Mitigating Large Vision-Language Model Hallucination at Post-hoc via Multi-agent System

Joanne Yu, Brian Jalaian, Nathaniel D. Bastian

Year: 2024 | Journal: Proceedings of the AAAI Symposium Series | Vol: 4 (1) | Pages: 110-113

Abstract

This paper addresses the critical issue of hallucination in Large Vision-Language Models (LVLMs) by proposing a novel multi-agent framework. We integrate three post-hoc correction techniques (self-correction, external feedback, and agent debate) to enhance LVLM trustworthiness. Our approach tackles key challenges in LVLM hallucination, including weak visual encoders, parametric knowledge bias, and loss of visual attention during inference. The framework employs a Plug-in LVLM as the base model whose hallucinations are to be reduced, a Large Language Model (LLM) for guided refinement, external toolbox models for factual grounding, and an agent debate system for consensus-building. While promising, we also discuss potential limitations and technical challenges in implementing such a complex system. This work contributes to the ongoing effort to create more reliable and trustworthy multimodal multi-agent systems.
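The abstract describes a three-stage post-hoc correction pipeline: the Plug-in LVLM self-corrects, external toolbox models ground the remaining claims, and an agent debate produces a consensus. A minimal sketch of how such a pipeline could be wired together is below; all function names, the claim-splitting heuristic, and the majority-vote debate are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the post-hoc correction pipeline from the abstract:
# self-correction -> external feedback -> agent debate.
# Verifier/judge callables stand in for the real LVLM, toolbox models,
# and debating agents; they take a textual claim and return True/False.

def split_claims(caption: str) -> list[str]:
    # Naive claim extraction: one claim per sentence (an assumption).
    return [c.strip() for c in caption.split(".") if c.strip()]

def self_correct(caption: str, lvlm_verify) -> str:
    # Stage 1: the base LVLM re-examines its own output and keeps
    # only claims it can still verify against the image.
    kept = [c for c in split_claims(caption) if lvlm_verify(c)]
    return ". ".join(kept) + ("." if kept else "")

def external_feedback(caption: str, toolbox_verify) -> str:
    # Stage 2: an external toolbox model (e.g. an object detector)
    # grounds each claim; unsupported claims are dropped.
    kept = [c for c in split_claims(caption) if toolbox_verify(c)]
    return ". ".join(kept) + ("." if kept else "")

def agent_debate(candidates: list[str], judges) -> str:
    # Stage 3: each agent votes on every candidate caption; the
    # candidate with the most votes becomes the consensus output.
    scores = {c: sum(judge(c) for judge in judges) for c in candidates}
    return max(scores, key=scores.get)

def correct_caption(caption: str, lvlm_verify, toolbox_verify, judges) -> str:
    # Full pipeline: the debated candidates are the outputs of the
    # two earlier stages plus the original caption.
    stage1 = self_correct(caption, lvlm_verify)
    stage2 = external_feedback(stage1, toolbox_verify)
    return agent_debate([caption, stage1, stage2], judges)
```

As a usage example, with a caption containing one hallucinated sentence and toy verifiers that reject any claim mentioning a "cat", the pipeline would strip the fabricated sentence and the debate would select the grounded caption.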

Keywords:
Computer science, Artificial intelligence, Computer vision, Natural language processing

Metrics

Cited By: 2
FWCI (Field Weighted Citation Impact): 1.04
Refs: 0
Citation Normalized Percentile: 0.70
