Rui Zhao, Zhenwei Shi, Zhengxia Zou
Automatically generating language descriptions of remote sensing images has become an emerging research hotspot in the remote sensing field. Attention-based captioning, a representative group of recent deep learning-based captioning methods, has the advantage of highlighting the corresponding object locations in the image while generating each word. Standard attention-based methods, however, generate captions from coarse-grained, unstructured attention units and thus fail to exploit the structured spatial relations among semantic contents in remote sensing images. Although this structural characteristic makes remote sensing images widely divergent from natural images and poses a greater challenge for remote sensing image captioning, the core of most remote sensing captioning methods is usually borrowed from the computer vision community without considering the domain knowledge behind it. To overcome this problem, a fine-grained, structured attention-based method is proposed that exploits the structural characteristics of semantic contents in high-resolution remote sensing images. The method learns better descriptions and can generate pixelwise segmentation masks of semantic contents. The segmentation is jointly trained with the captioning in a unified framework without requiring any pixelwise annotations. Evaluations are conducted on three remote sensing image captioning benchmark data sets with detailed ablation studies and parameter analysis. Compared with state-of-the-art methods, the proposed method achieves higher captioning accuracy and simultaneously generates high-resolution, meaningful segmentation masks of semantic contents.
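For context, the sketch below illustrates the standard soft-attention decoding step that the abstract contrasts with: at each word, additive attention scores the H x W cells of a CNN feature map, and the resulting weights double as a coarse spatial mask when reshaped and upsampled. This is a minimal illustration assuming PyTorch; all names (SoftAttention, feat_dim, etc.) and dimensions are hypothetical, and it is the unstructured baseline, not the paper's structured-attention method.

```python
# Minimal sketch of one soft-attention captioning step (assumed PyTorch API;
# names and shapes are illustrative, not from the paper).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftAttention(nn.Module):
    """Additive (Bahdanau-style) attention over H*W spatial feature cells."""
    def __init__(self, feat_dim: int, hidden_dim: int, attn_dim: int):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, attn_dim)
        self.hidden_proj = nn.Linear(hidden_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, feats, hidden):
        # feats:  (B, H*W, feat_dim) flattened CNN feature map
        # hidden: (B, hidden_dim)    current decoder LSTM state
        e = self.score(torch.tanh(
            self.feat_proj(feats) + self.hidden_proj(hidden).unsqueeze(1)
        )).squeeze(-1)                                      # (B, H*W) scores
        alpha = F.softmax(e, dim=1)                         # attention weights
        context = (alpha.unsqueeze(-1) * feats).sum(dim=1)  # (B, feat_dim)
        return context, alpha

# Reshaping the per-word weights onto the H x W grid and upsampling yields
# the coarse, unstructured attention map that structured attention refines.
B, H, W, feat_dim, hidden_dim = 2, 14, 14, 512, 256
attn = SoftAttention(feat_dim, hidden_dim, attn_dim=128)
feats = torch.randn(B, H * W, feat_dim)
hidden = torch.randn(B, hidden_dim)
context, alpha = attn(feats, hidden)
mask = F.interpolate(alpha.view(B, 1, H, W), scale_factor=16,
                     mode="bilinear", align_corners=False)  # (B, 1, 224, 224)
```

Because each weight covers an entire feature-map cell, masks recovered this way are coarse and ignore object structure; the paper's contribution is to replace these unstructured units with fine-grained, structured ones so that meaningful pixelwise masks emerge without pixelwise supervision.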