JOURNAL ARTICLE

Adaptively Aligned Image Captioning via Adaptive Attention Time

Lun Huang, Wenmin Wang, Yaxian Xia, Jie Chen

Year: 2019 Journal: arXiv (Cornell University) Vol: 32 Pages: 8940-8949 Publisher: Cornell University

Abstract

Recent neural models for image captioning usually employ an encoder-decoder framework with an attention mechanism. However, the attention mechanism in such a framework aligns a single (attended) image feature vector to each caption word, assuming a one-to-one mapping from source image regions to target caption words, which is never possible. In this paper, we propose a novel attention model, Adaptive Attention Time (AAT), that aligns the source and the target adaptively for image captioning. AAT allows the framework to learn how many attention steps to take to output a caption word at each decoding step. With AAT, an image region can be mapped to an arbitrary number of caption words, and a caption word can attend to an arbitrary number of image regions. AAT is deterministic and differentiable, and does not introduce any noise into the parameter gradients. We empirically show that AAT improves over state-of-the-art methods on the task of image captioning. Code is available at https://github.com/husthuaan/AAT.
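The idea of taking a variable number of attention steps per caption word can be illustrated with a small, dependency-free sketch. This is not the authors' implementation (the linked repository holds that); it is a toy loop under stated assumptions. The function names `attend` and `adaptive_attention`, the `tanh` state update, the sigmoid confidence, and the `epsilon` halting rule are all hypothetical choices made for illustration. The loop uses no sampling, so repeated calls give the same result, and the per-step weights always sum to 1.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, regions):
    """One soft-attention step: dot-product scores, softmax, weighted sum."""
    scores = [sum(q * r for q, r in zip(query, region)) for region in regions]
    weights = softmax(scores)
    dim = len(query)
    return [sum(w * region[i] for w, region in zip(weights, regions))
            for i in range(dim)]


def adaptive_attention(query, regions, max_steps=4, epsilon=0.05):
    """Run a variable number of attention steps for one decoding step.

    Assumption (not taken from the paper): after each step a confidence
    p in (0, 1) is computed from the state, each step claims a share of
    the remaining weight, and the loop halts once the unclaimed weight
    would drop below epsilon or max_steps is reached.
    """
    state = list(query)
    output = [0.0] * len(query)
    remaining = 1.0  # weight not yet assigned to any step
    steps = 0
    for n in range(max_steps):
        context = attend(state, regions)
        # Hypothetical state update: fold the attended context into the state.
        state = [math.tanh(s + c) for s, c in zip(state, context)]
        steps = n + 1
        # Toy confidence: sigmoid of the summed state.
        p = 1.0 / (1.0 + math.exp(-sum(state)))
        last = steps == max_steps or remaining * (1.0 - p) < epsilon
        # The final step takes all leftover weight so the weights sum to 1.
        beta = remaining if last else remaining * p
        output = [o + beta * s for o, s in zip(output, state)]
        remaining -= beta
        if last:
            break
    return output, steps
```

Calling `adaptive_attention([0.2, -0.1], [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])` returns the weighted output vector together with the number of attention steps actually taken, which may differ from one query to the next.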

Keywords:
Closed captioning, Computer science, Word (group theory), Image (mathematics), Decoding methods, Encoder, Feature (linguistics), Source code, Artificial intelligence, Code (set theory), Noise (video), Computer vision, Speech recognition, Algorithm, Programming language, Mathematics, Linguistics

Metrics

Cited By: 39
FWCI (Field Weighted Citation Impact): 0.00
Refs: 0

Topics

Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Neural Network Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Attention-Aligned Transformer for Image Captioning

Zhengcong Fei

Journal: Proceedings of the AAAI Conference on Artificial Intelligence Year: 2022 Vol: 36 (1) Pages: 607-615

JOURNAL ARTICLE

Task-Adaptive Attention for Image Captioning

Chenggang Yan, Yiming Hao, Liang Li, Jian Yin, An-An Liu, Zhendong Mao, Zhenyu Chen, Xingyu Gao

Journal: IEEE Transactions on Circuits and Systems for Video Technology Year: 2021 Vol: 32 (1) Pages: 43-51

JOURNAL ARTICLE

Adaptive Syncretic Attention for Constrained Image Captioning

Liang Yang, Haifeng Hu

Journal: Neural Processing Letters Year: 2019 Vol: 50 (1) Pages: 549-564

JOURNAL ARTICLE

Image captioning with adaptive incremental global context attention

Changzhi Wang, Xiaodong Gu

Journal: Applied Intelligence Year: 2021 Vol: 52 (6) Pages: 6575-6597