JOURNAL ARTICLE

Hierarchical Question-Image Co-Attention for Visual Question Answering

Jiasen Lu

Year: 2024 Journal: TIB Data Manager Vol: 29 Pages: 289-297

Abstract

A number of recent works have proposed attention models for Visual Question Answering (VQA) that generate spatial maps highlighting the image regions relevant to answering the question. In this paper, we argue that in addition to modeling "where to look" (visual attention), it is equally important to model "what words to listen to" (question attention). We present a novel co-attention model for VQA that jointly reasons about image and question attention. In addition, our model reasons about the question (and consequently about the image, via the co-attention mechanism) in a hierarchical fashion using a novel 1-dimensional convolutional neural network (CNN). Our model improves the state of the art on the VQA dataset from 60.3% to 60.5%, and on the COCO-QA dataset from 61.6% to 63.3%. Using ResNet features, performance improves further to 62.1% on VQA and 65.4% on COCO-QA.
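The abstract names two mechanisms: phrase-level question features built with a 1-dimensional convolution, and co-attention that weights image regions and question words jointly. The following is a minimal pure-Python sketch of both ideas under stated assumptions; it is not the authors' implementation. The helper names (`conv1d_phrase_features`, `co_attend`), the window sizes 1 to 3 with max-pooling across window sizes, the single scalar weight per window offset, and the "best-match" scoring of the affinity matrix are all hypothetical simplifications chosen to keep the example small; a real model would use learned weight matrices and more levels of the hierarchy.

```python
import math

# Vectors are plain Python lists of floats (no third-party dependencies).

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_sum(weights, vectors):
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]

def conv1d_phrase_features(words, kernels):
    """Hypothetical phrase-level features: at each word position, run a 1-D
    convolution for every window size in `kernels`, apply tanh, and keep the
    element-wise max across window sizes. `kernels` maps window size s to s
    scalar weights (an assumption: one scalar per offset, shared across
    dimensions). Positions past the end of the question are zero-padded."""
    n, dim = len(words), len(words[0])
    zero = [0.0] * dim
    phrases = []
    for t in range(n):
        candidates = []
        for weights in kernels.values():
            acc = [0.0] * dim
            for k, w in enumerate(weights):
                vec = words[t + k] if t + k < n else zero
                acc = [a + w * x for a, x in zip(acc, vec)]
            candidates.append([math.tanh(a) for a in acc])
        phrases.append([max(c[i] for c in candidates) for i in range(dim)])
    return phrases

def co_attend(question, regions):
    """Simplified affinity-based co-attention (an assumption, not the paper's
    exact formulation): affinity C[t][n] = tanh(q_t . v_n). Each image region
    is scored by its best-matching word, each word by its best-matching region,
    and both score vectors are normalized with softmax."""
    affinity = [[math.tanh(dot(q, v)) for v in regions] for q in question]
    region_scores = [max(row[n] for row in affinity) for n in range(len(regions))]
    word_scores = [max(row) for row in affinity]
    a_v = softmax(region_scores)  # attention over image regions
    a_q = softmax(word_scores)    # attention over question words (or phrases)
    return weighted_sum(a_v, regions), weighted_sum(a_q, question), a_v, a_q

# Toy usage: 3 word embeddings and 2 image-region features, both of dimension 2.
words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
regions = [[1.0, 0.0], [0.0, 1.0]]
kernels = {1: [1.0], 2: [0.5, 0.5], 3: [0.3, 0.3, 0.3]}

phrases = conv1d_phrase_features(words, kernels)
for name, level in [("word", words), ("phrase", phrases)]:
    v_hat, q_hat, a_v, a_q = co_attend(level, regions)
    print(name, [round(x, 3) for x in a_v], [round(x, 3) for x in a_q])
```

Running co-attention separately at the word and phrase levels is what makes the sketch "hierarchical": each level yields its own attended image vector and attended question vector, which a full model would combine before predicting an answer.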

Keywords:
Question answering Computer science Image (mathematics) Convolution (computer science) Artificial intelligence Convolutional neural network Visual attention Pattern recognition (psychology) Machine learning Artificial neural network Perception Psychology

Metrics

Cited by: 438
FWCI (Field Weighted Citation Impact): 0.00
References: 27

Topics

Multimodal Machine Learning Applications
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Advanced Image and Video Retrieval Techniques
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Domain Adaptation and Few-Shot Learning
Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Question-oriented cross-modal co-attention networks for visual question answering

Wei Guan, Zhenyu Wu, Ping Wen

Journal: 2022 2nd International Conference on Consumer Electronics and Computer Engineering (ICCECE) Year: 2022 Pages: 401-407

JOURNAL ARTICLE

Hierarchical Attention Networks for Fact-based Visual Question Answering

Haibo Yao, Yongkang Luo, Zhi Zhang, Jianhang Yang, Chengtao Cai

Journal: Multimedia Tools and Applications Year: 2023 Vol: 83 (6) Pages: 17281-17298