JOURNAL ARTICLE

Segmentation Guided Attention Networks for Visual Question Answering

Abstract

In this paper, we propose to solve the problem of Visual Question Answering using a novel segmentation-guided, attention-based network, which we call SegAttend-Net. We use image segmentation maps generated by a fully convolutional deep neural network to refine our attention maps, and we use these refined attention maps to make the model focus on the parts of the image that are relevant to answering a question. The refined attention maps are used by the LSTM network to learn to produce the answer. We currently train our model on the Visual7W dataset and perform a category-wise evaluation over the 7 question categories. We achieve state-of-the-art results on this dataset, improving question answering accuracy over the previous benchmark by 1.5 percentage points, from 54.1% to 55.6%, and we demonstrate improvements in each of the question categories. We also visualize our generated attention maps and note their improvement over the attention maps generated by the previous best approach.
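
The abstract does not state how the segmentation maps refine the attention maps, so the following is only a minimal sketch of one plausible form of refinement, not the SegAttend-Net mechanism itself. It assumes attention is a normalized weight per image region and that the segmentation map has been reduced to a per-region relevance score in [0, 1]. The function name refine_attention, the seg_relevance input, and the floor parameter are all hypothetical and chosen for illustration.

```python
def refine_attention(attention, seg_relevance, floor=0.1):
    """Reweight attention by a segmentation-derived score, then renormalize.

    attention: non-negative weights over image regions (assumed to sum to 1).
    seg_relevance: hypothetical per-region scores in [0, 1], e.g. the fraction
        of each region covered by a segmented object.
    floor: smallest multiplier, so unsegmented regions are damped, not erased.
    """
    if len(attention) != len(seg_relevance):
        raise ValueError("attention and seg_relevance must have equal length")
    # Scale each weight by a multiplier between `floor` and 1.
    weighted = [a * (floor + (1.0 - floor) * s)
                for a, s in zip(attention, seg_relevance)]
    total = sum(weighted)
    if total == 0.0:
        # Nothing survived the reweighting; fall back to the input weights.
        return list(attention)
    # Renormalize so the refined weights form a distribution again.
    return [w / total for w in weighted]


# Uniform attention over four regions; segmentation marks regions 0 and 3.
attention = [0.25, 0.25, 0.25, 0.25]
seg_relevance = [1.0, 0.0, 0.0, 1.0]
refined = refine_attention(attention, seg_relevance)
```

In this toy case the refined weights shift toward the two segmented regions while still summing to 1. A downstream answer module (an LSTM in the abstract) would then consume image features pooled with these refined weights.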

Keywords:
Question answering, Computer science, Segmentation, Visual attention, Artificial intelligence, Natural language processing, Information retrieval, Psychology, Neuroscience, Cognition

Metrics

Cited By: 9
FWCI (Field Weighted Citation Impact): 0.89
Refs: 24
Citation Normalized Percentile: 0.78

Topics

Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Domain Adaptation and Few-Shot Learning
Physical Sciences →  Computer Science →  Artificial Intelligence
