JOURNAL ARTICLE

Dual Attention on Pyramid Feature Maps for Image Captioning

Litao Yu, Jian Zhang, Qiang Wu

Year: 2021 Journal: IEEE Transactions on Multimedia Vol: 24 Pages: 1775-1786 Publisher: Institute of Electrical and Electronics Engineers

Abstract

Generating natural sentences from images is a fundamental learning task for visual-semantic understanding in multimedia. In this paper, we propose to apply dual attention on pyramid image feature maps to fully explore the visual-semantic correlations and improve the quality of generated sentences. Specifically, by taking full account of the contextual information provided by the hidden state of the RNN controller, the pyramid attention can better localize the visually indicative and semantically consistent regions in images. In addition, the contextual information can help re-calibrate the importance of feature components by learning the channel-wise dependencies, improving the discriminative power of visual features for better content description. We conducted comprehensive experiments on three well-known datasets, Flickr8K, Flickr30K and MS COCO, and achieved impressive results in generating descriptive and smooth natural sentences from images. Using either convolutional visual features or more informative bottom-up attention features, our composite captioning model achieves very promising performance in a single-model mode. The proposed pyramid attention and dual attention methods are highly modular and can be inserted into various image captioning modules to further improve the performance.
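
The abstract gives no equations, so the following is only an illustrative sketch of the general idea, not the authors' architecture. It assumes a dot-product spatial attention and a sigmoid channel gate, both conditioned on the decoder hidden state, applied to each pyramid level and then averaged. All function names, the weight matrices w_query and w_gate, the averaging step, and the toy dimensions are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def spatial_attention(regions, hidden, w_query):
    """Weight each region by its similarity to a query projected from the hidden state."""
    query = matvec(w_query, hidden)  # assumed projection: hidden size -> channel size
    scores = [sum(q * v for q, v in zip(query, region)) for region in regions]
    weights = softmax(scores)  # one weight per region, summing to 1
    channels = len(regions[0])
    context = [sum(a * region[c] for a, region in zip(weights, regions))
               for c in range(channels)]
    return context, weights


def channel_attention(context, hidden, w_gate):
    """Rescale each channel with a sigmoid gate conditioned on the hidden state."""
    logits = matvec(w_gate, context + hidden)  # gate sees pooled features and hidden state
    gates = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [g * c for g, c in zip(gates, context)], gates


def dual_attention_on_pyramid(pyramid, hidden, w_query, w_gate):
    """Apply spatial then channel attention per level; average the level outputs (assumed)."""
    outputs = []
    for regions in pyramid:
        context, _ = spatial_attention(regions, hidden, w_query)
        recalibrated, _ = channel_attention(context, hidden, w_gate)
        outputs.append(recalibrated)
    channels = len(outputs[0])
    return [sum(out[c] for out in outputs) / len(outputs) for c in range(channels)]


# Toy data: 4 channels, hidden size 3, pyramid levels with 16, 4 and 1 regions.
rng = random.Random(0)
C, D = 4, 3
pyramid = [[[rng.uniform(-1, 1) for _ in range(C)] for _ in range(n)] for n in (16, 4, 1)]
hidden = [rng.uniform(-1, 1) for _ in range(D)]
w_query = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(C)]
w_gate = [[rng.uniform(-1, 1) for _ in range(C + D)] for _ in range(C)]

caption_feature = dual_attention_on_pyramid(pyramid, hidden, w_query, w_gate)
```

In this sketch the hidden state enters twice: once to decide where to look (the spatial weights) and once to decide which feature channels to emphasize (the gates). That mirrors the two roles of contextual information described in the abstract, though the actual model may combine them differently.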

Metrics

Cited By: 51
FWCI (Field Weighted Citation Impact): 3.99
Refs: 58
Citation Normalized Percentile: 0.95

Topics

Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Domain Adaptation and Few-Shot Learning
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Auxiliary feature extractor and dual attention-based image captioning

Qian Zhao, Guichang Wu

Journal: Signal Image and Video Processing Year: 2024 Vol: 18 (4) Pages: 3615-3626
JOURNAL ARTICLE

Dual attention based feature pyramid network

Huijun Xing, Shuai Wang, Dezhi Zheng, Xiaotong Zhao

Journal: China Communications Year: 2020 Vol: 17 (8) Pages: 242-252
BOOK-CHAPTER

Dual Attention for Vietnamese Image Captioning

Anh Cong Hoang, Hoang Long Nguyen, Thi Thuy Le, Minh Phong Phan, The Anh Pham, Dinh Cong Nguyen

Lecture notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering Year: 2025 Pages: 174-183
JOURNAL ARTICLE

Improving image captioning with Pyramid Attention and SC-GAN

Tianyu Chen, Zhixin Li, Jingli Wu, Huifang Ma, Bianping Su

Journal: Image and Vision Computing Year: 2021 Vol: 117 Pages: 104340-104340