Dual Attention and Question Categorization-Based Visual Question Answering

Aakansha Mishra; Ashish Anand; Prithwijit Guha

doi:10.1109/tai.2022.3160418

JOURNAL ARTICLE

Year: 2022; Journal: IEEE Transactions on Artificial Intelligence; Vol: 4 (1); Pages: 81-91; Publisher: Institute of Electrical and Electronics Engineers

Abstract

Visual question answering (VQA) aims to predict the answer to a natural language question about an image. This work addresses two important issues in VQA, a complex multimodal AI task: first, answer prediction in a large output answer space, and second, obtaining enriched representations through cross-modality interactions. To address both issues, this work proposes a dual attention (DA) and question categorization (QC)-based visual question answering model (DAQC-VQA). DAQC-VQA has three main network modules: (1) a novel dual attention mechanism that produces an enriched cross-domain representation of the two modalities; (2) a question classifier subsystem that identifies the category of the input natural language question, thereby reducing the answer search space; and (3) a subsystem that predicts the answer conditioned on the question category. All component networks of DAQC-VQA are trained end to end with a joint loss function. DAQC-VQA is evaluated on two widely used VQA datasets, TDIUC and VQA2.0. Experimental results show that DAQC-VQA performs competitively against recent state-of-the-art VQA models. An ablation analysis indicates that the enriched representation obtained with the proposed dual attention mechanism improves performance.
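The abstract states that the question categorizer reduces the answer search space and that all modules share a joint loss, but it does not give the exact formulation. The sketch below illustrates one plausible reading under stated assumptions: a fixed, hypothetical category-to-answer table (`CATEGORY_ANSWERS`), hard masking of answer logits outside the predicted category, and a joint loss written as category cross-entropy plus a weighted answer cross-entropy (the `weight` parameter is an assumption). All names here are illustrative rather than the authors' code, and the dual attention module is not modeled.

```python
import math

# Hypothetical mapping from question category to the indices of its candidate
# answers (assumption: the real mapping would be derived from the dataset).
CATEGORY_ANSWERS = {
    0: {0, 1},     # e.g. a yes/no-style category
    1: {2, 3, 4},  # e.g. a counting-style category
    2: {5, 6},     # e.g. a color-style category
}


def softmax(logits):
    """Numerically stable softmax; entries equal to -inf receive probability 0."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def masked_answer_probs(answer_logits, category):
    """Answer distribution restricted to the answers allowed for one category."""
    allowed = CATEGORY_ANSWERS[category]
    masked = [x if i in allowed else float("-inf") for i, x in enumerate(answer_logits)]
    return softmax(masked)


def predict(category_logits, answer_logits):
    """Pick the most likely category, then search answers only within it."""
    category = argmax(category_logits)
    return category, argmax(masked_answer_probs(answer_logits, category))


def joint_loss(category_logits, answer_logits, true_category, true_answer, weight=1.0):
    """Assumed joint loss: category NLL + weight * answer NLL under the true
    category's mask. The true answer must belong to that category's answer set."""
    cat_nll = -math.log(softmax(category_logits)[true_category])
    ans_nll = -math.log(masked_answer_probs(answer_logits, true_category)[true_answer])
    return cat_nll + weight * ans_nll


if __name__ == "__main__":
    category_logits = [0.1, 2.0, -1.0]
    answer_logits = [5.0, 4.0, 1.0, 3.0, 0.5, 2.0, 0.0]
    # Unrestricted argmax would choose answer 0; the mask for category 1 restricts it.
    print(predict(category_logits, answer_logits))
    print(joint_loss(category_logits, answer_logits, true_category=1, true_answer=3))
```

In this toy example, an unrestricted argmax over all answers would select index 0, whereas restricting the search to the predicted category's candidates selects index 3. Hard masking is only one design choice; a differentiable alternative would marginalize over categories, weighting each category's masked answer distribution by its predicted probability.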

Keywords:

Question answering, Computer science, Categorization, Artificial intelligence, Classifier, Natural language processing, Machine learning, Representation, Natural language, Modalities, Linguistics

Metrics

FWCI (Field Weighted Citation Impact): 2.48

Citation Normalized Percentile: 0.88

Topics

Multimodal Machine Learning Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Domain Adaptation and Few-Shot Learning

Physical Sciences → Computer Science → Artificial Intelligence

Advanced Image and Video Retrieval Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

Visual Question Answering Based on Question Attention Model

Co-attention Network for Visual Question Answering Based on Dual Attention

Question-Agnostic Attention for Visual Question Answering

Graph-enhanced visual representations and question-guided dual attention for visual question answering

Multi-stage Attention based Visual Question Answering