JOURNAL ARTICLE

Region-guided transformer for remote sensing image captioning

Kai Zhao, Wei Xiong

Year: 2024 Journal:   International Journal of Digital Earth Vol: 17 (1)   Publisher: Taylor & Francis

Abstract

Remote sensing image acquisition is an essential way to obtain information. However, research on remote sensing images mainly focuses on object detection or image classification. The emergence of remote sensing image captioning (RSIC) has enabled understanding of and inference from remote sensing images, and has therefore attracted considerable attention. Challenges remain in RSIC: most methods rely on grid features, which make it difficult for the model to determine the main targets to describe, and a more effective cross-modal matching method is needed for better text generation. In response to these issues, we propose a region-guided transformer. We extract region features to enhance the model's ability to focus on the main targets. To address the information loss caused by region feature extraction, we propose environment features that supplement background information. To improve the matching between text and image features, we propose a region-guided decoder that enhances the model's perception of different features through a weighted cross-attention mechanism. We also introduce region-guided information to guide the text-generation process. Extensive experiments demonstrate the effectiveness and superiority of our model.
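The abstract does not specify how the weighted cross-attention is computed, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a single text query vector attends separately to region features and environment features using standard scaled dot-product attention, and that the two resulting contexts are blended with a fixed scalar weight. The names `attend`, `weighted_cross_attention`, and `alpha` are hypothetical; a real model would more likely learn the weighting.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, features):
    """Scaled dot-product attention of one query vector over feature vectors.

    Returns the context vector and the attention weights.
    """
    d = len(query)
    scores = [
        sum(q * f for q, f in zip(query, feat)) / math.sqrt(d)
        for feat in features
    ]
    weights = softmax(scores)
    context = [
        sum(w * feat[i] for w, feat in zip(weights, features))
        for i in range(d)
    ]
    return context, weights


def weighted_cross_attention(query, region_feats, env_feats, alpha):
    """Blend region and environment contexts.

    alpha is an assumed fixed scalar in [0, 1]; it is a stand-in for
    whatever weighting the paper actually uses.
    """
    region_ctx, _ = attend(query, region_feats)
    env_ctx, _ = attend(query, env_feats)
    return [alpha * r + (1.0 - alpha) * e for r, e in zip(region_ctx, env_ctx)]


# Toy usage: two region vectors, one environment vector, 2-D features.
query = [1.0, 0.0]
region_feats = [[1.0, 0.0], [0.0, 1.0]]
env_feats = [[0.5, 0.5]]
fused = weighted_cross_attention(query, region_feats, env_feats, alpha=0.7)
```

With `alpha` close to 1 the fused context is dominated by the region features (the main targets); with `alpha` close to 0 it falls back to the environment (background) context.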

Keywords:
Closed captioning; Remote sensing; Transformer; Computer science; Geography; Image (mathematics); Computer vision; Artificial intelligence; Cartography; Engineering; Electrical engineering

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 43
Citation Normalized Percentile: 0.16

Topics

Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Image Retrieval and Classification Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Prior Knowledge-Guided Transformer for Remote Sensing Image Captioning

Lingwu Meng, Jing Wang, Yang Yang, Liang Xiao

Journal: IEEE Transactions on Geoscience and Remote Sensing Year: 2023 Vol: 61 Pages: 1-13

JOURNAL ARTICLE

Region Driven Remote Sensing Image Captioning

S Chandeesh Kumar, M. Hemalatha, Shivangi Narayan, P Nandhini

Journal: Procedia Computer Science Year: 2019 Vol: 165 Pages: 32-40

JOURNAL ARTICLE

Cooperative Connection Transformer for Remote Sensing Image Captioning

Kai Zhao, Wei Xiong

Journal:   IEEE Transactions on Geoscience and Remote Sensing Year: 2024 Vol: 62 Pages: 1-14