JOURNAL ARTICLE

A Dual Self-Attention based Network for Image Captioning

Abstract

Image captioning has become an important means for intelligent robots to understand image content. Extracting image information effectively is the key to generating accurate and reliable captions. In this paper, we propose a dual self-attention based network (DSAN) for image captioning. Specifically, we design a Dual Self-Attention Module (DSAM), embedded in an encoder-decoder architecture, that captures contextual information in the image and adaptively integrates local features with global dependencies. By modeling rich contextual dependencies over local features, the DSAM significantly improves captioning results. Experimental results on the MS COCO dataset show that the proposed DSAN achieves better performance than existing methods.
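The abstract does not spell out the internals of the DSAM, so the following is only an illustrative sketch of how local features could be combined with global dependencies through two self-attention branches. It assumes, without confirmation from the paper, that one branch attends across spatial positions and the other across feature channels, and that both are fused with the input by a residual sum. The attention here is parameter-free (no learned query/key/value projections), and all function names are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def self_attention(x):
    """Parameter-free self-attention: softmax(x x^T) x, one token per row."""
    scores = matmul(x, transpose(x))
    weights = [softmax(r) for r in scores]
    return matmul(weights, x)


def dual_self_attention(features):
    """features: N spatial positions x C channels (e.g. flattened CNN grid).

    ASSUMPTION: the two branches are spatial and channel attention; the
    paper's actual DSAM may be defined differently.
    """
    # Spatial branch: every position aggregates information from all positions.
    spatial = self_attention(features)
    # Channel branch: every channel aggregates information from all channels.
    channel = transpose(self_attention(transpose(features)))
    # Residual fusion keeps the local feature and adds both global contexts.
    return [
        [f + s + c for f, s, c in zip(f_row, s_row, c_row)]
        for f_row, s_row, c_row in zip(features, spatial, channel)
    ]
```

In an encoder-decoder captioner, a module of this kind would typically sit between the image encoder and the caption decoder, so that the decoder receives context-enriched features with the same shape as the encoder output.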

Keywords:
Closed captioning; Computer science; Dual (grammatical number); Image (mathematics); Encoding (memory); Decoding methods; Key (lock); Artificial intelligence; Architecture; Computer vision; Algorithm; Computer security

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 48
Citation Normalized Percentile: 0.16

Topics

Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Domain Adaptation and Few-Shot Learning
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Dual-stream Self-attention Network for Image Captioning

Boyang Wan, Wenhui Jiang, Yuming Fang, Wenying Wen, Hantao Liu

Journal: 2022 IEEE International Conference on Visual Communications and Image Processing (VCIP) Year: 2022 Pages: 1-5

JOURNAL ARTICLE

Advancing image captioning with V16HP1365 encoder and dual self-attention network

Tarun Jaiswal, Manju Pandey, Priyanka Tripathi

Journal: Multimedia Tools and Applications Year: 2024 Vol: 83 (34) Pages: 80701-80725

BOOK-CHAPTER

Dual Attention for Vietnamese Image Captioning

Anh Cong Hoang, Hoang Long Nguyen, Thi Thuy Le, Minh Phong Phan, The Anh Pham, Dinh Cong Nguyen

Lecture notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering Year: 2025 Pages: 174-183