JOURNAL ARTICLE

Caption based Co-attention Architecture for Open-Ended Visual Question Answering

Abstract

Approaches to Visual Question Answering (VQA) revolve around the fusion mechanism used to combine the semantic information extracted from the image and the question. The proposed architecture for open-ended VQA has four major components: (1) a vision encoder, (2) a language encoder, (3) a co-attention module, and (4) an answer generator. We explore different combinations of vision and language encoders to obtain representations of the input image and the question. We propose a nonlinear co-attention mechanism and a stacked co-attention mechanism to fuse the image and question representations into a combined representation. We also combine the representation of a caption of the input image with the image and question representations in a caption-based stacked nonlinear co-attention mechanism. Experimental studies on the VQAv2 dataset demonstrate that the open-ended VQA model that uses the caption-based stacked nonlinear co-attention module gives improved performance.
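The abstract does not give the equations of the co-attention module, so the following is only a minimal illustrative sketch of a generic, parameter-free co-attention step, not the authors' formulation. It assumes image-region and question-token features that already share one dimension, uses a tanh-squashed dot product as the nonlinearity, and fuses the attended vectors (plus an optional caption vector) by element-wise addition. All function names and these design choices are hypothetical.

```python
import math
import random


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights, vectors):
    """Weighted sum of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(dim)]


def co_attention(image_feats, question_feats):
    """Attend over image regions and question tokens jointly (assumed form)."""
    # affinity[i][j]: tanh-squashed similarity of question token i and region j.
    affinity = [[math.tanh(dot(q, v)) for v in image_feats] for q in question_feats]
    # Score each region by its best-matching token, and each token likewise.
    image_scores = [max(row[j] for row in affinity) for j in range(len(image_feats))]
    question_scores = [max(row) for row in affinity]
    image_weights = softmax(image_scores)
    question_weights = softmax(question_scores)
    attended_image = weighted_sum(image_weights, image_feats)
    attended_question = weighted_sum(question_weights, question_feats)
    return attended_image, attended_question, image_weights, question_weights


def fuse(attended_image, attended_question, caption_vec=None):
    """Element-wise additive fusion; the caption vector is optional (assumption)."""
    fused = [a + b for a, b in zip(attended_image, attended_question)]
    if caption_vec is not None:
        fused = [f + c for f, c in zip(fused, caption_vec)]
    return fused


# Toy inputs: 3 image regions and 2 question tokens, each a 4-dimensional vector.
rng = random.Random(0)
regions = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
tokens = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)]
caption = [rng.uniform(-1, 1) for _ in range(4)]

img_vec, q_vec, img_w, q_w = co_attention(regions, tokens)
joint = fuse(img_vec, q_vec, caption)
print(len(joint), round(sum(img_w), 6), round(sum(q_w), 6))
```

In a full model, the fused vector would be passed to the answer generator, and a stacked variant would repeat the attention step over several layers; both are omitted here because the abstract does not specify them.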

Keywords:
Question answering; Computer science; Architecture; Closed-ended question; Visual attention; Information retrieval; Natural language processing; Artificial intelligence; Linguistics; Psychology; History

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 29
Citation Normalized Percentile: 0.21

Topics

Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Video Analysis and Summarization
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
