Visual Question Answering (VQA) aims to reason out correct answers from input questions and images. Significant progress has been made by learning rich embedding features from images and questions with bilinear models. Attention mechanisms are widely used to focus on specific visual and textual information in the VQA reasoning process. However, most state-of-the-art methods concentrate on fusing global multi-modal features while neglecting local features. Moreover, general visual attention reduces the dimension excessively (from K×2048 to 2048), which causes a substantial loss of visual information. In this paper, we propose a novel multi-channel co-attention network (MC-CAN), which integrates multi-modal features from the global level to the local level. We design separate multi-channel attention mechanisms for visual features (from K×2048 to M×2048) and textual features at different levels of integration. Additionally, we further improve our approach by combining it with complementary modules such as the MLB and Count modules. Experiments on benchmark datasets show that our approach achieves better VQA performance than other state-of-the-art methods.
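As a rough illustration of the K×2048 to M×2048 idea, the sketch below computes M question-guided attention maps over K region features and keeps M attended visual vectors instead of collapsing them into a single 2048-d vector. This is a minimal sketch, not the paper's exact architecture: the module name, hidden sizes, and the additive scoring function are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class MultiChannelVisualAttention(nn.Module):
    """Sketch of multi-channel visual attention: M attention maps over K
    region features yield M attended vectors (M x 2048) rather than one."""

    def __init__(self, v_dim=2048, q_dim=1024, hidden=512, num_channels=4):
        super().__init__()
        self.v_proj = nn.Linear(v_dim, hidden)
        self.q_proj = nn.Linear(q_dim, hidden)
        # One attention channel per output glimpse (assumed scoring scheme).
        self.att = nn.Linear(hidden, num_channels)

    def forward(self, v, q):
        # v: (batch, K, v_dim) region features; q: (batch, q_dim) question feature
        joint = torch.tanh(self.v_proj(v) + self.q_proj(q).unsqueeze(1))  # (batch, K, hidden)
        logits = self.att(joint)                 # (batch, K, M)
        weights = torch.softmax(logits, dim=1)   # attention over the K regions, per channel
        # Weighted sums: (batch, M, K) x (batch, K, v_dim) -> (batch, M, v_dim)
        attended = torch.bmm(weights.transpose(1, 2), v)
        return attended

# Example: K = 36 region features, M = 4 channels
v = torch.randn(8, 36, 2048)
q = torch.randn(8, 1024)
out = MultiChannelVisualAttention(num_channels=4)(v, q)
print(out.shape)  # torch.Size([8, 4, 2048])
```

The point of the multi-channel output is simply that downstream fusion can still see M distinct attended views of the image, rather than a single pooled vector.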