JOURNAL ARTICLE

High-Resolution Remote Sensing Image Captioning Based on Structured Attention

Rui Zhao, Zhenwei Shi, Zhengxia Zou

Year: 2021
Journal: IEEE Transactions on Geoscience and Remote Sensing
Vol: 60
Pages: 1-14
Publisher: Institute of Electrical and Electronics Engineers

Abstract

Automatically generating language descriptions of remote sensing images has become an emerging research hot spot in the remote sensing field. Attention-based captioning, a representative group of recent deep learning-based captioning methods, has the advantage of generating words while highlighting the corresponding object locations in the image. Standard attention-based methods generate captions based on coarse-grained and unstructured attention units, which fail to exploit the structured spatial relations of semantic contents in remote sensing images. Although these structural characteristics make remote sensing images differ widely from natural images and pose a greater challenge for the remote sensing image captioning task, the key components of most remote sensing captioning methods are usually borrowed from the computer vision community without considering the underlying domain knowledge. To overcome this problem, a fine-grained, structured attention-based method is proposed to utilize the structural characteristics of semantic contents in high-resolution remote sensing images. Our method learns better descriptions and can generate pixelwise segmentation masks of semantic contents. The segmentation can be jointly trained with the captioning in a unified framework without requiring any pixelwise annotations. Evaluations are conducted on three remote sensing image captioning benchmark data sets with detailed ablation studies and parameter analysis. Compared with state-of-the-art methods, our method achieves higher captioning accuracy and, at the same time, generates high-resolution and meaningful segmentation masks of semantic contents.

Keywords:
Closed captioning; Computer science; Benchmark (surveying); Artificial intelligence; Segmentation; Semantics (computer science); Remote sensing; Exploit; Image segmentation; Domain (mathematical analysis); Field (mathematics); Computer vision; Image (mathematics); Pattern recognition (psychology)

Metrics

Cited By: 106
FWCI (Field Weighted Citation Impact): 8.28
Refs: 83
Citation Normalized Percentile: 0.98
Is in top 1%
Is in top 10%

Topics

Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Domain Adaptation and Few-Shot Learning
Physical Sciences →  Computer Science →  Artificial Intelligence