JOURNAL ARTICLE

Learning Representations from Explainable and Connectionist Approaches for Visual Question Answering

Abstract

Reasoning conditioned on both visual and linguistic information has gained considerable importance in recent years. Prior work in Visual Question Answering (VQA) has been predominantly connectionist. To address the limitations of connectionist models, symbolic models were proposed that allow explainable visual reasoning. In addition to semantic parsing of the question, such models perform visual parsing to produce scene graphs, on which accurate and explainable reasoning can be conditioned. However, real VQA scenarios cannot always be separated cleanly into connectionist (neural network) and conceptual modalities; rather, they depend on the relationships and interactions between the two. In this work, the authors propose a question-guided attention mechanism that combines explainable visual reasoning over scene graphs with a cross-modality multi-head attention mechanism. The contributions of the connectionist and conceptual modalities are learned through semantic parsing of the question in each VQA task. The method is evaluated on the VQA2.0 and GQA datasets, achieving 65.31% and 63.06% accuracy, respectively, which the authors report as better than the state of the art in explainable AI.
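The abstract does not give the fusion equations, so the following is only a minimal sketch of the general idea of question-guided weighting, not the authors' architecture. It assumes a single scalar gate computed from question features, and it blends the answer distributions of a symbolic (scene-graph) branch and a connectionist (attention) branch. All names, weights, and feature meanings (`question_gate`, `fuse_answer_scores`, `gate_weights`, the two-element feature vector) are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def question_gate(question_features, gate_weights, gate_bias=0.0):
    """Hypothetical gate: sigmoid of a linear function of question features.

    Returns a value in (0, 1), read here as the weight given to the
    symbolic (scene-graph) branch. In a trained model the weights would
    be learned; here they are fixed by hand for illustration.
    """
    z = sum(w * x for w, x in zip(gate_weights, question_features)) + gate_bias
    return 1.0 / (1.0 + math.exp(-z))


def fuse_answer_scores(symbolic_logits, neural_logits, gate):
    """Convex combination of the two branches' answer distributions."""
    p_symbolic = softmax(symbolic_logits)
    p_neural = softmax(neural_logits)
    return [gate * s + (1.0 - gate) * n for s, n in zip(p_symbolic, p_neural)]


# Toy example with made-up numbers (not data from the paper).
answers = ["red", "blue", "green"]
symbolic_logits = [2.0, 0.5, 0.1]   # assumed scores from scene-graph reasoning
neural_logits = [0.2, 1.5, 0.3]     # assumed scores from the attention branch
question_features = [1.0, 0.0]      # assumed output of a question parser
gate_weights = [2.0, -2.0]          # assumed (would normally be learned)

gate = question_gate(question_features, gate_weights)
fused = fuse_answer_scores(symbolic_logits, neural_logits, gate)
prediction = answers[fused.index(max(fused))]
```

Because the gate lies strictly between 0 and 1 and each branch outputs a probability distribution, the fused scores also sum to 1. A scalar gate is the simplest possible choice; a per-head or per-answer gate would be an equally plausible reading of the abstract.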

Keywords:
Connectionism; Computer science; Parsing; Artificial intelligence; Visual reasoning; Modalities; Question answering; Natural language processing; Modality (human–computer interaction); Cognitive science; Artificial neural network; Machine learning; Psychology

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 25
Citation Normalized Percentile: 0.03

Topics

Multimodal Machine Learning Applications
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Advanced Image and Video Retrieval Techniques
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Human Pose and Action Recognition
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition