JOURNAL ARTICLE

Interpretable Visual Question Answering Referring to Outside Knowledge

Abstract

We present a novel multimodal interpretable VQA model that answers questions more accurately and generates diverse explanations. Although researchers have proposed several methods that generate human-readable, fine-grained natural language sentences to explain a model's decision, these methods have relied solely on information contained in the image. Ideally, a model should draw on information from both inside and outside the image to generate correct explanations, just as humans draw on background knowledge in daily life. The proposed method incorporates outside knowledge and multiple image captions to increase the diversity of information available to the model. The contribution of this paper is an interpretable visual question answering model that uses multimodal inputs to improve the rationality of its generated results. Experimental results show that our model outperforms state-of-the-art methods in both answer accuracy and explanation rationality.
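
To make the fusion idea concrete, below is a minimal PyTorch sketch of one way such a model could combine image region features, the question, retrieved outside knowledge, and multiple captions, then jointly predict an answer and decode an explanation. All class names, dimensions, and architectural choices here (region features of size 2048, attention fusion, a GRU explanation decoder) are hypothetical illustrations under common VQA conventions, not the authors' actual implementation.

```python
import torch
import torch.nn as nn

class MultimodalVQAExplainer(nn.Module):
    """Hypothetical sketch: fuse image features, question, outside
    knowledge, and captions; predict an answer and an explanation."""

    def __init__(self, d_model=512, vocab_size=10000, num_answers=3000):
        super().__init__()
        self.text_embed = nn.Embedding(vocab_size, d_model)   # shared token embeddings
        self.img_proj = nn.Linear(2048, d_model)              # project image region features
        self.fuse = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
        self.answer_head = nn.Linear(d_model, num_answers)    # answer classifier
        self.explainer = nn.GRU(d_model, d_model, batch_first=True)  # explanation decoder
        self.word_head = nn.Linear(d_model, vocab_size)       # per-step word logits

    def forward(self, img_feats, question_ids, knowledge_ids, caption_ids, expl_ids):
        # Embed every source into a common space and concatenate as one context,
        # so knowledge and captions sit alongside visual features.
        ctx = torch.cat(
            [self.img_proj(img_feats),
             self.text_embed(question_ids),
             self.text_embed(knowledge_ids),
             self.text_embed(caption_ids)], dim=1)
        q = self.text_embed(question_ids)
        fused, _ = self.fuse(q, ctx, ctx)      # question attends over all sources
        pooled = fused.mean(dim=1)             # pooled multimodal representation
        answer_logits = self.answer_head(pooled)
        # Condition the explanation decoder on the pooled representation
        # (teacher forcing with ground-truth explanation tokens at train time).
        dec_out, _ = self.explainer(self.text_embed(expl_ids), pooled.unsqueeze(0))
        return answer_logits, self.word_head(dec_out)

# Usage with dummy inputs (batch of 2, 36 region features per image):
model = MultimodalVQAExplainer()
img = torch.randn(2, 36, 2048)            # image region features
q = torch.randint(0, 10000, (2, 12))      # tokenized question
kn = torch.randint(0, 10000, (2, 40))     # retrieved outside-knowledge tokens
cap = torch.randint(0, 10000, (2, 30))    # multiple captions, concatenated
expl = torch.randint(0, 10000, (2, 20))   # teacher-forced explanation tokens
answer_logits, expl_logits = model(img, q, kn, cap, expl)
```

The design point this sketch reflects is that the question attends over a single context sequence containing all modalities, so outside knowledge and captions compete for attention on equal footing with visual features rather than being bolted on after the fact.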

Keywords:
Question answering, Computer science, Rationality, Artificial intelligence, Natural language, Natural language processing, Machine learning, Information retrieval, Epistemology

Metrics

Cited By: 2
FWCI (Field-Weighted Citation Impact): 0.36
References: 33
Citation Normalized Percentile: 0.54

Topics

Multimodal Machine Learning Applications (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
Human Pose and Action Recognition (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
Domain Adaptation and Few-Shot Learning (Physical Sciences → Computer Science → Artificial Intelligence)