JOURNAL ARTICLE

Dual Attention and Question Categorization-Based Visual Question Answering

Aakansha Mishra, Ashish Anand, Prithwijit Guha

Year: 2022   Journal: IEEE Transactions on Artificial Intelligence   Vol: 4 (1)   Pages: 81-91   Publisher: Institute of Electrical and Electronics Engineers

Abstract

Visual question answering (VQA) aims to predict an answer to a natural language question about an image. This work focuses on two important issues in VQA, a complex multimodal AI task: first, predicting answers in a large output answer space, and second, obtaining an enriched representation through cross-modality interactions. To address these two issues, this work proposes a dual attention (DA) and question categorization (QC)-based visual question answering model (DAQC-VQA). DAQC-VQA has three main network modules: first, a novel dual attention mechanism that helps obtain an enriched cross-domain representation of the two modalities; second, a question classifier subsystem that identifies the category of the input (natural language) question, which helps reduce the answer search space; and third, a subsystem that predicts the answer depending on the question category. All component networks of DAQC-VQA are trained end to end with a joint loss function. The performance of DAQC-VQA is evaluated on two widely used VQA datasets, TDIUC and VQA2.0. Experimental results demonstrate competitive performance of DAQC-VQA against recent state-of-the-art VQA models. An ablation analysis indicates that the enriched representation obtained using the proposed dual-attention mechanism helps improve performance.
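
The abstract says that the question category narrows the answer search space and that all modules are trained with a joint loss. The following is a minimal sketch of that idea only, not the authors' implementation. The abstract does not give the exact form of the loss, how the category restricts the answers, or the attention design, so everything here is an assumption made for illustration: the function names, the hard mask over answers, the use of the ground-truth category during training, the sum of two cross-entropy terms, and the `weight` parameter. The dual attention module is not modeled; the logits are taken as given.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target):
    """Negative log-likelihood of the target index."""
    return -math.log(probs[target] + 1e-12)

def masked_answer_probs(answer_logits, allowed):
    """Softmax restricted to the answers allowed for one question category.

    Assumption: a hard mask; answers outside `allowed` get probability 0.
    """
    idx = sorted(allowed)
    sub = softmax([answer_logits[i] for i in idx])
    probs = [0.0] * len(answer_logits)
    for i, p in zip(idx, sub):
        probs[i] = p
    return probs

def joint_loss(category_logits, answer_logits, true_category, true_answer,
               answers_by_category, weight=1.0):
    """Hypothetical joint loss: answer loss plus weighted category loss."""
    cat_probs = softmax(category_logits)
    # Assumption: the ground-truth category selects the answer subset in training.
    allowed = answers_by_category[true_category]
    ans_probs = masked_answer_probs(answer_logits, allowed)
    return (cross_entropy(ans_probs, true_answer)
            + weight * cross_entropy(cat_probs, true_category))

# Toy setup (hypothetical): category 0 = color, category 1 = counting;
# answers are indexed as 0 = "red", 1 = "blue", 2 = "one", 3 = "two".
answers_by_category = {0: {0, 1}, 1: {2, 3}}
loss = joint_loss([1.0, 0.0], [2.0, 1.0, 5.0, 0.5], 0, 0, answers_by_category)
print(f"joint loss: {loss:.4f}")
```

In the toy call, the answer "one" has the highest raw logit, but it is excluded because it does not belong to the color category. This is the sense in which categorization shrinks the search space in this sketch.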

Keywords:
Question answering, Computer science, Categorization, Artificial intelligence, Dual (grammatical number), Classifier (UML), Natural language processing, Machine learning, Representation (politics), Natural language, Modalities, Function (biology), Task (project management), Linguistics

Metrics

Cited By: 20
FWCI (Field Weighted Citation Impact): 2.48
Refs: 39
Citation Normalized Percentile: 0.88

Topics

Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Domain Adaptation and Few-Shot Learning
Physical Sciences →  Computer Science →  Artificial Intelligence
Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
