Yuxiang Zhang, Yang Zhao, Yanni Dong, Bo Du
Self-supervised learning has shown remarkable success in image representation learning. Among such methods, masked image modeling and contrastive learning are the most recent and dominant approaches; however, the two behave differently when transferred to downstream tasks. In this paper, we propose an RGB-elevation contrastive and masked image prediction pre-training framework, where the elevation is a normalized digital surface model. We then evaluate the learned representation by transferring the pre-trained model to the change detection task. To this end, we leverage the recently proposed vision transformer's capability of attending to objects and combine it with a pretext task consisting of masked image modeling and instance discrimination to fine-tune the spatial tokens. In addition, change detection requires information interaction between the two temporal remote sensing images. To address this, we propose a plug-in temporal fusion module based on masked cross attention and evaluate its effectiveness on three open change detection datasets by using it to initialize the supervised training weights. Our method achieves improvements over supervised learning methods and two mainstream self-supervised learning methods, MoCo and DINO, on the change detection task. Our experimental results also achieve state-of-the-art performance on four change detection datasets. The code will be available at URL.
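The abstract describes the temporal fusion module only as masked cross attention between the two temporal images. A minimal sketch of that mechanism is given below; the function name, shapes, and mask convention (True marks a suppressed key position) are our assumptions, not details from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def masked_cross_attention(q_feats, kv_feats, mask=None):
    """Tokens of one temporal image attend to tokens of the other.

    q_feats:  (N, d) spatial tokens from the image at time t1
    kv_feats: (M, d) spatial tokens from the image at time t2
    mask:     (N, M) boolean; True suppresses that key position
              (a hypothetical convention, not specified by the paper)
    """
    d = q_feats.shape[-1]
    scores = q_feats @ kv_feats.T / np.sqrt(d)   # (N, M) scaled similarities
    if mask is not None:
        scores = np.where(mask, -1e9, scores)    # masked positions get ~0 weight
    attn = softmax(scores, axis=-1)              # rows sum to 1
    return attn @ kv_feats                       # (N, d) fused representation

# Toy usage: fuse two 4-token, 8-dim temporal feature sets,
# masking out the first token of the second time step.
rng = np.random.default_rng(0)
t1 = rng.normal(size=(4, 8))
t2 = rng.normal(size=(4, 8))
mask = np.zeros((4, 4), dtype=bool)
mask[:, 0] = True
fused = masked_cross_attention(t1, t2, mask)
print(fused.shape)  # (4, 8)
```

In the paper's setting, this plug-in fusion would operate on transformer token maps of the bi-temporal images; the sketch omits multi-head projections and residual connections for brevity.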