Interactive Change-Aware Transformer Network for Remote Sensing Image Change Captioning

Chen Cai; Yi Wang; Kim–Hui Yap

doi:10.3390/rs15235611

Year: 2023; Journal: Remote Sensing; Vol. 15, Issue 23; Article 5611; Publisher: Multidisciplinary Digital Publishing Institute

Abstract

Remote sensing image change captioning (RSICC) aims to automatically generate sentences that describe content differences between bitemporal remote sensing images. Recent works extract the changes between bitemporal features and employ a hierarchical approach to fuse multiple changes of interest into change captions. However, these methods directly aggregate all features, which can pass non-change-focused information from each encoder layer to the change caption decoder and degrade captioning performance. To address this problem, we propose an Interactive Change-Aware Transformer Network (ICT-Net), which extracts and incorporates the most critical changes of interest in each encoder layer to improve change description generation. ICT-Net first extracts bitemporal visual features with a CNN backbone and then employs an Interactive Change-Aware Encoder (ICE) to capture the crucial differences between these features. Specifically, the ICE interactively captures the most change-aware discriminative information between the paired bitemporal features through difference and content attention encoding. A Multi-Layer Adaptive Fusion (MAF) module is proposed to adaptively aggregate the relevant change-aware features from the ICE layers while minimizing the impact of irrelevant visual features. Moreover, we extend the ICE to extract multi-scale changes and introduce a novel Cross Gated-Attention (CGA) module into the change caption decoder to select essential discriminative multi-scale features, further improving change captioning performance. We evaluate our method on two RSICC datasets (i.e., LEVIR-CC and LEVIR-CCD), and the experimental results demonstrate that our method achieves state-of-the-art performance.
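
The abstract describes fusing change-aware features from several encoder layers with adaptive weights so that less relevant layers contribute little. The sketch below illustrates one generic way such adaptive aggregation is often realized: a softmax over per-layer scores that weights each layer's feature before summing. It is not the authors' MAF implementation. The function names, the subtraction-based difference feature, and the softmax weighting are assumptions chosen for illustration, and plain Python lists stand in for tensors so the example has no dependencies.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scalar scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def difference_features(before: Vector, after: Vector) -> Vector:
    """Hypothetical change cue: element-wise difference (t2 - t1)
    between two bitemporal feature vectors of equal length."""
    if len(before) != len(after):
        raise ValueError("bitemporal features must have the same length")
    return [a - b for a, b in zip(after, before)]


def adaptive_layer_fusion(layer_feats: List[Vector],
                          layer_scores: List[float]) -> Vector:
    """Weight each layer's change-aware feature by a softmax over
    per-layer scores and sum the results.

    In a trained model, layer_scores would be learned parameters
    (an assumption here); a low score shrinks that layer's contribution.
    """
    if not layer_feats or len(layer_feats) != len(layer_scores):
        raise ValueError("need one score per non-empty list of layer features")
    weights = softmax(layer_scores)
    dim = len(layer_feats[0])
    fused = [0.0] * dim
    for w, feat in zip(weights, layer_feats):
        if len(feat) != dim:
            raise ValueError("all layer features must share one dimension")
        for i, v in enumerate(feat):
            fused[i] += w * v
    return fused


if __name__ == "__main__":
    # Two toy "encoder layers", each producing a change feature.
    layer1 = difference_features([0.2, 0.5, 0.1], [0.2, 0.9, 0.4])
    layer2 = difference_features([0.3, 0.3, 0.3], [0.3, 0.3, 0.3])
    # Favour layer1 strongly; layer2 (no change) is mostly suppressed.
    print(adaptive_layer_fusion([layer1, layer2], [2.0, -2.0]))
```

The softmax keeps the layer weights positive and summing to one, so the fused feature stays on the same scale as the per-layer features regardless of how many layers are combined.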
