Segmentation Guided Attention Networks for Visual Question Answering

Vasu Sharma; Ankita Bishnu; Labhesh Patel

doi:10.18653/v1/p17-3008

JOURNAL ARTICLE

Year: 2017

Abstract

In this paper we propose to solve the problem of Visual Question Answering with a novel segmentation-guided, attention-based network, which we call SegAttend-Net. We use image segmentation maps generated by a Fully Convolutional Deep Neural Network to refine our attention maps, and use these refined attention maps to make the model focus on the parts of the image relevant to answering a question. The refined attention maps are used by the LSTM network to learn to produce the answer. We currently train our model on the Visual7W dataset and perform a category-wise evaluation over the 7 question categories. We achieve state-of-the-art results on this dataset, beating the previous benchmark by 1.5 percentage points (question answering accuracy improves from 54.1% to 55.6%), and demonstrate improvements in each of the question categories. We also visualize our generated attention maps and note their improvement over the attention maps generated by the previous best approach.
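
The abstract does not specify how the segmentation maps refine the attention maps. As a rough illustration only, the sketch below assumes one simple possibility: each image cell's attention weight is scaled by a relevance score for the segment the cell belongs to, and the result is renormalized. The function `refine_attention`, the `segment_relevance` scores, and the toy segment labels are all hypothetical and are not taken from the paper.

```python
import math

def softmax(scores):
    """Turn raw scores into a probability distribution."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def refine_attention(attention, segment_ids, segment_relevance, floor=1e-6):
    """Reweight a flattened attention map by per-segment relevance.

    attention: attention weights over image cells (sums to 1).
    segment_ids: segment label of each cell, e.g. from a segmentation network.
    segment_relevance: hypothetical relevance score in [0, 1] per segment label.
    """
    if len(attention) != len(segment_ids):
        raise ValueError("attention and segment_ids must have the same length")
    # Scale each cell by its segment's relevance; `floor` keeps every cell nonzero.
    weighted = [a * max(segment_relevance.get(s, 0.0), floor)
                for a, s in zip(attention, segment_ids)]
    total = sum(weighted)
    # Renormalize so the refined map is again a distribution.
    return [w / total for w in weighted]

# Toy 2x2 image flattened to 4 cells: cells 0-1 are "sky", cells 2-3 are "dog".
raw_attention = softmax([1.0, 1.0, 1.0, 1.0])   # uniform: 0.25 per cell
segments = ["sky", "sky", "dog", "dog"]
relevance = {"sky": 0.1, "dog": 0.9}            # assumed, e.g. for "What animal is this?"
refined = refine_attention(raw_attention, segments, relevance)
print([round(w, 3) for w in refined])
```

In this toy case the uniform attention shifts toward the "dog" cells, which is the kind of focusing effect the abstract describes. A real model would learn the relevance from the question and image features rather than take it as a fixed dictionary.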

Keywords:

Question answering; Computer science; Segmentation; Visual attention; Artificial intelligence; Natural language processing; Information retrieval; Psychology; Neuroscience; Cognition

Metrics

FWCI (Field Weighted Citation Impact): 0.89

Citation Normalized Percentile: 0.78

Topics

Multimodal Machine Learning Applications

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Advanced Image and Video Retrieval Techniques

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Domain Adaptation and Few-Shot Learning

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

Multimodal Cross-guided Attention Networks for Visual Question Answering

Question Type Guided Attention in Visual Question Answering

Segmentation-Guided Attention for Visual Question Answering from Remote Sensing Images

Multimodal Bi-direction Guided Attention Networks for Visual Question Answering

Dual Self-Guided Attention with Sparse Question Networks for Visual Question Answering