Aakansha Mishra, Miriyala Srinivas Soumitri, Vikram N Rajendiran
Reasoning conditioned on joint visual and linguistic information has gained immense importance in recent times. The prior art in Visual Question Answering (VQA) has been predominantly connectionist in nature. To address the limitations of connectionist AI models, symbolic models were proposed that allow for explainable visual reasoning. In addition to semantic parsing, such models perform visual parsing to produce scene graphs, enabling accurate reasoning conditioned on these explainable scene graphs. However, real VQA scenarios cannot always be segregated exclusively into connectionist (neural-network) and conceptual modalities; rather, they depend on the relationships and interactions between the two. In this work, the authors propose a question-guided attention mechanism that combines explainable visual reasoning over scene graphs with a cross-modality multi-head attention mechanism. The contributions of the connectionist and conceptual modalities are learned through the semantic parsing of the question in each VQA task. The proposed method is evaluated on the VQA2.0 and GQA datasets, achieving 65.31% and 63.06% accuracy, respectively, surpassing the state-of-the-art in explainable AI.
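The cross-modality multi-head attention described above can be sketched in a minimal NumPy form: question token features act as queries attending over scene-graph node features as keys and values. This is an illustrative sketch only, not the authors' implementation — the projection matrices are randomly initialised stand-ins for learned weights, and all dimensions and function names are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(q_feats, g_feats, num_heads=4, seed=0):
    """Question-guided multi-head attention over scene-graph nodes.

    q_feats: (Lq, d) question token features (queries).
    g_feats: (Ln, d) scene-graph node features (keys/values).
    Returns: (Lq, d) question features attended over graph nodes.
    """
    Lq, d = q_feats.shape
    assert d % num_heads == 0, "feature dim must divide evenly across heads"
    dh = d // num_heads
    rng = np.random.default_rng(seed)
    # Random projections stand in for learned weight matrices.
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    # Project and split into heads: (num_heads, length, dh).
    Q = (q_feats @ Wq).reshape(Lq, num_heads, dh).transpose(1, 0, 2)
    K = (g_feats @ Wk).reshape(-1, num_heads, dh).transpose(1, 0, 2)
    V = (g_feats @ Wv).reshape(-1, num_heads, dh).transpose(1, 0, 2)
    # Scaled dot-product attention per head: (num_heads, Lq, Ln).
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(dh)
    attn = softmax(scores, axis=-1)
    out = attn @ V  # (num_heads, Lq, dh)
    # Merge heads back into a single feature dimension.
    return out.transpose(1, 0, 2).reshape(Lq, d)
```

In the paper's setting the attended question features would then be combined with the symbolic reasoning branch, with the question's semantic parse weighting the two modalities.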