Fine-grained Adaptive Visual Prompt for Generative Medical Visual Question Answering

Ting Yu; Z. Tong; Jun Yu; Ke Zhang

doi:10.1609/aaai.v39i9.33047

JOURNAL ARTICLE

Year: 2025
Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Vol: 39 (9)
Pages: 9662-9670
Publisher: Association for the Advancement of Artificial Intelligence

Abstract

Medical Visual Question Answering (MedVQA) serves as an automated medical assistant that answers patient queries and supports physician diagnosis based on medical images and associated questions. Recent work has shown that incorporating Large Language Models (LLMs) into MedVQA significantly improves answer generation. However, for tasks that require precise, fine-grained localization at the organ level, models that rely solely on language prompts struggle to locate the relevant regions of a medical image because of substantial background noise. To address this challenge, we explore visual prompts in MedVQA for the first time and propose fine-grained adaptive visual prompts to enhance generative MedVQA. Specifically, we introduce an Adaptive Visual Prompt Creator that adaptively generates region-level visual prompts based on the image characteristics of different organs. These prompts give the LLM fine-grained references during answer retrieval and generation in the medical domain, thereby improving the model's precise cross-modal localization on the original images. Furthermore, we incorporate a Hierarchical Answer Generator with Parameter-Efficient Fine-Tuning (PEFT) techniques, which significantly enhances the model's understanding of spatial and contextual information with a minimal increase in parameters and promotes the alignment of the learned representations with the medical space. Extensive experiments on the VQA-RAD, SLAKE, and DME datasets validate the effectiveness of the proposed method and demonstrate its potential for generative MedVQA.
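
The abstract does not say which PEFT technique the Hierarchical Answer Generator uses. Purely as an illustration of why PEFT adds few trainable parameters, the sketch below uses LoRA (low-rank adaptation), one widely used PEFT method: the pretrained weight W stays frozen and only a low-rank update B·A is trained. The names LoRALinear and lora_param_count, the rank, and the layer sizes are hypothetical assumptions, not details taken from the paper.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def lora_param_count(d_out, d_in, rank):
    """Trainable parameters of a rank-`rank` update B (d_out x rank) @ A (rank x d_in)."""
    return rank * (d_in + d_out)


class LoRALinear:
    """Hypothetical LoRA-style layer: y = W x + (alpha / rank) * B (A x), with W frozen."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        self.weight = weight  # frozen pretrained weight, d_out x d_in (never updated here)
        self.d_out, self.d_in = len(weight), len(weight[0])
        self.rank = rank
        self.scale = alpha / rank
        # A gets a small random init; B starts at zero, so the adapted layer
        # initially reproduces the frozen layer exactly.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(self.d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(self.d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)              # frozen path
        delta = matvec(self.B, matvec(self.A, x))  # trainable low-rank path
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        # Only A and B would be updated during fine-tuning.
        return lora_param_count(self.d_out, self.d_in, self.rank)


if __name__ == "__main__":
    layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], rank=1)
    print(layer.forward([1.0, 1.0]))  # B is zero, so this equals W x: [3.0, 7.0]
    print(lora_param_count(4096, 4096, 8), 4096 * 4096)
```

For a hypothetical 4096 x 4096 projection with rank 8, lora_param_count(4096, 4096, 8) gives 65,536 trainable parameters versus 16,777,216 for fully fine-tuning that layer, about 0.39%. Zero-initializing B is a common LoRA choice so that training starts from the unchanged pretrained model.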

Keywords: Question answering, Generative grammar, Computer science, Artificial intelligence
