JOURNAL ARTICLE

Hierarchical Attention Network for Image Captioning

Weixuan Wang, Zhihong Chen, Haifeng Hu

Year: 2019
Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Vol: 33 (01)
Pages: 8957-8964
Publisher: Association for the Advancement of Artificial Intelligence

Abstract

Attention mechanisms have recently been applied successfully to image captioning, but existing attention methods are built only on low-level spatial features or only on high-level text features, which limits the richness of the generated captions. In this paper, we propose a Hierarchical Attention Network (HAN) that computes attention over a pyramidal hierarchy of features simultaneously. The hierarchy contains features at different semantic levels, so that different words can be predicted from different features. Because these features come from different modalities, we also propose a Multivariate Residual Module (MRM) that learns joint representations of them; the MRM models projections and extracts relevant relations among the different features. Furthermore, we introduce a context gate to balance the contributions of the different features. Compared with existing methods, our approach uses hierarchical features and exploits several multimodal integration strategies, which can significantly improve performance. HAN is evaluated on the MSCOCO benchmark dataset, and the experimental results show that our model outperforms state-of-the-art methods, achieving a BLEU-1 score of 80.9 and a CIDEr score of 121.7 on the Karpathy test split.
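The abstract describes attention computed over several feature levels at once and a context gate that weighs their contributions, but it gives no equations. The Python below is a minimal sketch under stated assumptions, not the authors' implementation. It assumes every level has already been projected to a common dimension d (the abstract assigns that job to the MRM, which is not modeled here), uses scaled dot-product attention as a stand-in for whatever scoring function HAN actually uses, and implements the gate as a softmax over per-level scores, which for two levels reduces to the sigmoid gate sigmoid(s1 - s2). The names `attend`, `hierarchical_context`, `gate_u`, and `gate_b` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features: Sequence[Vector], h: Vector) -> Vector:
    """Soft attention over the vectors of ONE feature level.

    Assumption: scaled dot-product scoring against the decoder state h.
    Returns the attention-weighted sum of the feature vectors.
    """
    d = len(h)
    weights = softmax([dot(v, h) / math.sqrt(d) for v in features])
    return [sum(w * v[j] for w, v in zip(weights, features)) for j in range(d)]


def hierarchical_context(levels: Sequence[Sequence[Vector]], h: Vector,
                         gate_u: Sequence[Vector],
                         gate_b: Sequence[float]) -> Vector:
    """Attend over every level, then blend the per-level contexts with a gate.

    levels[l] holds the vectors of level l (e.g. low-level spatial regions,
    high-level text features), all assumed to have dimension d = len(h).
    gate_u / gate_b are hypothetical learned parameters, one per level.
    """
    contexts = [attend(feats, h) for feats in levels]
    # Softmax gate over levels; for two levels this equals sigmoid(s1 - s2).
    gates = softmax([dot(u, h) + b for u, b in zip(gate_u, gate_b)])
    d = len(h)
    return [sum(g * c[j] for g, c in zip(gates, contexts)) for j in range(d)]


# Usage: two levels with a neutral gate (zero parameters -> weights 0.5/0.5).
low_level = [[1.0, 0.0], [0.0, 1.0]]   # two "regions"
high_level = [[0.5, 0.5]]              # one high-level feature
h = [1.0, 0.0]                          # toy decoder state
out = hierarchical_context([low_level, high_level], h,
                           gate_u=[[0.0, 0.0], [0.0, 0.0]], gate_b=[0.0, 0.0])
print([round(x, 4) for x in out])  # → [0.5849, 0.4151]
```

In the toy run, the low-level context leans toward the region aligned with `h`, and the neutral gate averages it with the high-level context. With trained gate parameters, the gate would instead shift weight toward whichever level is most useful for the word being predicted, which is the balancing role the abstract attributes to the context gate.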


Metrics

Cited by: 126
FWCI (Field Weighted Citation Impact): 9.17
References: 43
Citation Normalized Percentile: 0.98 (in the top 1% and the top 10%)


Topics

Multimodal Machine Learning Applications
Advanced Image and Video Retrieval Techniques
Video Analysis and Summarization

