JOURNAL ARTICLE

Exploring Multi-Level Attention and Semantic Relationship for Remote Sensing Image Captioning

Zhenghang Yuan, Xuelong Li, Qi Wang

Year: 2019
Journal: IEEE Access
Vol: 8
Pages: 2608-2620
Publisher: Institute of Electrical and Electronics Engineers

Abstract

Remote sensing image captioning, which aims to understand high-level semantic information and the interactions between different ground objects, has emerged as a new research topic in recent years. Although image captioning has developed rapidly with convolutional neural networks (CNNs) and recurrent neural networks (RNNs), captioning remote sensing images still suffers from two main limitations. First, the scales of objects in remote sensing images vary dramatically, which makes it difficult to obtain an effective image representation. Second, the visual relationships in remote sensing images remain underused, even though they have great potential to improve the final performance. To address these two limitations, this paper proposes an effective framework for remote sensing image captioning based on multi-level attention and multi-label attribute graph convolution. Specifically, the proposed multi-level attention module can adaptively focus not only on specific spatial features but also on features of specific scales. Moreover, the designed attribute graph convolution module employs an attribute graph to learn more effective attribute features for image captioning. Extensive experiments show that the proposed method achieves superior performance on the UCM-captions, Sydney-captions, and RSICD datasets.
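The abstract names the two modules without giving their equations, so the following is only a minimal pure-Python sketch under stated assumptions, not the authors' implementation. It assumes dot-product scoring against the decoder hidden state for both the spatial attention (over regions within one scale) and the scale attention (over the per-scale context vectors), and a standard row-normalised graph convolution, ReLU(Â X W), over an attribute co-occurrence graph. The function names `multi_level_attention` and `graph_conv`, the toy features, and the three example attributes are all hypothetical.

```python
import math

# Hypothetical sketch, not the authors' code: dot-product scoring and a
# row-normalised graph convolution are assumptions made for illustration.

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def weighted_sum(vectors, weights):
    # Combine vectors into one, scaling each by its attention weight.
    dim = len(vectors[0])
    return [sum(w * v[d] for v, w in zip(vectors, weights)) for d in range(dim)]

def multi_level_attention(features_per_scale, hidden):
    """features_per_scale: one list of region vectors per scale.
    hidden: decoder state, same dimension as each region vector.
    Returns (context, spatial_weights_per_scale, scale_weights)."""
    spatial_weights, scale_contexts = [], []
    for regions in features_per_scale:
        # Spatial attention within one scale (assumed dot-product score).
        alpha = softmax([dot(r, hidden) for r in regions])
        spatial_weights.append(alpha)
        scale_contexts.append(weighted_sum(regions, alpha))
    # Scale attention: weight each scale's attended context vector.
    beta = softmax([dot(c, hidden) for c in scale_contexts])
    return weighted_sum(scale_contexts, beta), spatial_weights, beta

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def graph_conv(adjacency, node_features, weight):
    """One layer ReLU(A_hat @ X @ W): A_hat is the attribute adjacency with
    self-loops added and each row normalised to sum to 1 (assumed form)."""
    n = len(adjacency)
    a_hat = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    a_hat = [[v / sum(row) for v in row] for row in a_hat]
    out = matmul(matmul(a_hat, node_features), weight)
    return [[max(0.0, v) for v in row] for row in out]

# Toy usage: a coarse scale with 2 regions and a fine scale with 3 regions.
feats = [[[1.0, 0.0], [0.0, 1.0]],
         [[1.0, 1.0], [0.5, 0.0], [0.0, 0.5]]]
context, alphas, beta = multi_level_attention(feats, hidden=[1.0, 0.0])

# Three hypothetical attributes (e.g. building, road, tree); 1 = co-occur.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
eye = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
attr_feats = graph_conv(adj, eye, eye)
```

In a full captioner, the attended context would be recomputed at every decoding step of the RNN, and the graph-convolved attribute features would be fused with the image features; the abstract does not say how the paper performs that fusion, so the sketch stops at the two modules.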

Keywords: Closed captioning, Computer science, Convolutional neural network, Artificial intelligence, Graph, Focus (optics), Image (mathematics), Convolution (computer science), Pattern recognition (psychology), Computer vision, Artificial neural network, Theoretical computer science

Metrics

Cited by: 52
FWCI (Field Weighted Citation Impact): 2.67
References: 64
Citation Normalized Percentile: 0.92

Topics

Multimodal Machine Learning Applications
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Advanced Image and Video Retrieval Techniques
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition
Domain Adaptation and Few-Shot Learning
Physical Sciences → Computer Science → Artificial Intelligence