ZOU Pinrong, XIAO Feng, ZHANG Wenjuan, ZHANG Wanyu, WANG Chenyang
Visual Question Answering (VQA) is a typical multi-modal problem at the intersection of computer vision and natural language processing. Most existing VQA models ignore the dynamic semantic relationships between the two modalities and the rich spatial structure of the image. To address this, the paper proposes a novel Multi-Module Co-Attention Network, named MMCAN, which fully captures the dynamic interactions between objects and the contextual text representation in a visual scene. Relations between different types of objects are modeled with a graph attention mechanism, an adaptive question-conditioned relation representation is learned, and the visual object relations and question features are encoded through co-attention to strengthen the dependence between words and their corresponding image regions. Finally, an enhancement module improves the fitting ability of the model. Experimental results on the public benchmarks VQA 2.0 and VQA-CP v2 show that the proposed model is significantly more accurate than DA-NTN, ReGAT, and ODA-GCN on the "overall", "yes/no", "number", and "other" question categories, effectively improving the accuracy of visual question answering.
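As a rough illustration of the two core components named in the abstract, the following sketch combines a single-head graph attention layer over visual object features with a question-guided co-attention step. It is a minimal sketch under stated assumptions: all module names (GraphAttention, CoAttention), tensor shapes, and the element-wise fusion at the end are illustrative choices, not the authors' MMCAN implementation, and the enhancement module is omitted.

```python
# Illustrative sketch only: single-head graph attention over object features
# plus question-guided co-attention. Names, dimensions, and fusion are
# assumptions; this is not the paper's MMCAN architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttention(nn.Module):
    """GAT-style attention over K object features to model object relations."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.attn = nn.Linear(2 * dim, 1)

    def forward(self, v):                       # v: (B, K, D) object features
        h = self.proj(v)                        # (B, K, D)
        B, K, D = h.shape
        hi = h.unsqueeze(2).expand(B, K, K, D)  # pairwise feature concatenation
        hj = h.unsqueeze(1).expand(B, K, K, D)
        e = F.leaky_relu(self.attn(torch.cat([hi, hj], dim=-1))).squeeze(-1)
        a = F.softmax(e, dim=-1)                # (B, K, K) relation weights
        return torch.bmm(a, h)                  # relation-aware object features

class CoAttention(nn.Module):
    """Question-guided attention over relation-aware object features."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, v, q):                    # v: (B, K, D), q: (B, D)
        s = self.score(torch.tanh(v * q.unsqueeze(1)))  # (B, K, 1) scores
        a = F.softmax(s, dim=1)                 # attention over the K objects
        return (a * v).sum(dim=1)               # attended visual feature (B, D)

# Usage: attend over relation-aware objects, then fuse with the question.
B, K, D = 2, 36, 512
v, q = torch.randn(B, K, D), torch.randn(B, D)
fused = CoAttention(D)(GraphAttention(D)(v), q) * q   # (B, D) joint feature
```

The sketch uses a plain element-wise product to fuse the attended visual feature with the question embedding; the paper's actual fusion and answer-prediction head may differ.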