JOURNAL ARTICLE

Prior Knowledge-Guided Transformer for Remote Sensing Image Captioning

Lingwu Meng, Jing Wang, Yang Yang, Liang Xiao

Year: 2023, Journal: IEEE Transactions on Geoscience and Remote Sensing, Vol: 61, Pages: 1-13, Publisher: Institute of Electrical and Electronics Engineers

Abstract

Remote sensing image captioning aims to generate meaningful and grammatically accurate sentences for remote sensing images. However, in comparison to natural image captioning, remote sensing image captioning encounters additional challenges due to the unique characteristics of remote sensing images. The first challenge arises from the abundance of objects present in these images. As the number of objects increases, it becomes increasingly difficult to determine the main focus of the description. Moreover, the objects in remote sensing images often share similar appearances, which further complicates the generation of accurate descriptions. To overcome these challenges, we propose a Prior Knowledge-guided Transformer for remote sensing image captioning. Firstly, scene-level and object-level features are extracted in a Multi-level Feature Extraction module. To further refine and enhance the extracted multi-level features, we introduce a Feature Enhancement module. This module utilizes a combination of graph neural networks and attention mechanisms to capture the correlation and difference between different objects or scene regions. Moreover, we propose a Prior Knowledge augmented Attention mechanism to select the objects that are more relevant to the scene regions by establishing the relationships between them. This attention mechanism is seamlessly integrated into the Transformer structure, providing valuable prior knowledge that promotes the caption generation process. Extensive experiments on three remote sensing image captioning datasets verify the superiority of the proposed method. Compared with the baseline methods, the proposed method achieves more impressive performance. The code will be publicly available at https://github.com/One-paper-luck/PKG-Transformer.
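
The abstract describes an attention mechanism in which prior knowledge about object-scene relationships steers which objects the scene regions attend to. The sketch below is only an illustration of that general idea, not the authors' implementation: it assumes the prior can be expressed as an additive bias on standard scaled dot-product attention logits. The function name `prior_biased_attention`, the shapes, and the example `prior` matrix are all hypothetical, and learned projections are omitted for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def prior_biased_attention(scene, objects, prior):
    """Scaled dot-product attention with an additive prior (assumed form).

    scene:   (R, d) scene-region features, used as queries
    objects: (N, d) object features, used as keys and values
    prior:   (R, N) hypothetical relevance scores added to the logits
    """
    d = scene.shape[-1]
    logits = scene @ objects.T / np.sqrt(d) + prior
    weights = softmax(logits, axis=-1)
    return weights @ objects, weights

rng = np.random.default_rng(0)
scene = rng.normal(size=(4, 8))    # 4 scene regions, feature size 8
objects = rng.normal(size=(6, 8))  # 6 detected objects
prior = np.zeros((4, 6))
prior[:, 2] = -1e9                 # assumed prior: object 2 is irrelevant everywhere

attended, weights = prior_biased_attention(scene, objects, prior)
```

With this toy prior, object 2 receives effectively zero attention weight from every scene region, while the remaining weights in each row still sum to one.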

Keywords:
Closed captioning; Computer science; Transformer; Feature extraction; Artificial intelligence; Computer vision; Feature (linguistics); Remote sensing; Image (mathematics)

Metrics

Cited By: 22
FWCI (Field Weighted Citation Impact): 4.00
Refs: 62
Citation Normalized Percentile: 0.93

Topics

Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Domain Adaptation and Few-Shot Learning
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Region-guided transformer for remote sensing image captioning

Kai Zhao, Wei Xiong

Journal: International Journal of Digital Earth, Year: 2024, Vol: 17 (1)

BOOK-CHAPTER

Transformer with Prior Language Knowledge for Image Captioning

Daisong Yan, Wenxin Yu, Zhiqiang Zhang, Jun Gong

Lecture Notes in Computer Science, Year: 2021, Pages: 40-51

JOURNAL ARTICLE

Cooperative Connection Transformer for Remote Sensing Image Captioning

Kai Zhao, Wei Xiong

Journal: IEEE Transactions on Geoscience and Remote Sensing, Year: 2024, Vol: 62, Pages: 1-14