JOURNAL ARTICLE

Exploring Self-Supervised Learning for Multi-Modal Remote Sensing Pre-Training via Asymmetric Attention Fusion

Guozheng Xu, Xue Jiang, Xiangtai Li, Ze Zhang, Xingzhao Liu

Year: 2023   Journal: Remote Sensing   Vol: 15 (24)   Pages: 5682-5682   Publisher: Multidisciplinary Digital Publishing Institute

Abstract

Self-supervised learning (SSL) has significantly bridged the gap between supervised and unsupervised learning in computer vision tasks and shown impressive success in the field of remote sensing (RS). However, these methods have primarily focused on single-modal RS data, which may be limited in capturing the diversity of information in complex scenes. In this paper, we propose the Asymmetric Attention Fusion (AAF) framework to explore the potential of multi-modal representation learning, and compare it with two simpler fusion methods: early fusion and late fusion. Given that data from active sensors (e.g., digital surface models and light detection and ranging) is often noisier and less informative than optical images, the AAF is designed with an asymmetric attention mechanism within a two-stream encoder, applied at each encoder stage. Additionally, we introduce a Transfer Gate module to select more informative features from the fused representations, enhancing performance in downstream tasks. Our comparative analyses on the ISPRS Potsdam datasets, focusing on scene classification and segmentation tasks, demonstrate significant performance enhancements with AAF compared to baseline methods. The proposed approach achieves an improvement of over 7% in all metrics compared to randomly initialized methods for both tasks. Furthermore, AAF consistently achieves larger improvements than the early fusion and late fusion methods. These results underscore the effectiveness of AAF in leveraging the strengths of multi-modal RS data for SSL, opening doors for more sophisticated and nuanced RS analysis.
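The fusion ideas named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the attention direction, the gate design, or any layer sizes. The sketch assumes that optical tokens act as queries over auxiliary (e.g., DSM) tokens while the auxiliary stream is left untouched, and that the Transfer Gate is a per-channel sigmoid gate. All function names, weight matrices, and shapes are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def asymmetric_attention_fusion(optical, aux, w_q, w_k, w_v):
    """Assumed one-directional cross-attention (hypothetical design).

    optical: (N, C) tokens from the optical stream, used as queries.
    aux:     (M, C) tokens from the auxiliary stream, used as keys/values.
    Returns the updated optical tokens (N, C) and the attention map (N, M).
    The auxiliary tokens are not modified, which is the asymmetry.
    """
    q = optical @ w_q
    k = aux @ w_k
    v = aux @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)
    fused = optical + attn @ v  # residual connection keeps optical dominant
    return fused, attn

def transfer_gate(fused, w_g, b_g):
    # Assumed gate: a sigmoid in (0, 1) per channel scales the fused features.
    gate = 1.0 / (1.0 + np.exp(-(fused @ w_g + b_g)))
    return gate * fused

def early_fusion(optical_img, aux_img):
    # Early fusion baseline: stack modalities along the channel axis at the input.
    return np.concatenate([optical_img, aux_img], axis=-1)

def late_fusion(optical_tokens, aux_tokens):
    # Late fusion baseline: pool each stream separately, then concatenate.
    return np.concatenate([optical_tokens.mean(axis=0), aux_tokens.mean(axis=0)])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_tokens, channels = 16, 8
    optical = rng.normal(size=(n_tokens, channels))
    dsm = rng.normal(size=(n_tokens, channels))
    w_q, w_k, w_v, w_g = (rng.normal(size=(channels, channels)) * 0.1 for _ in range(4))
    fused, attn = asymmetric_attention_fusion(optical, dsm, w_q, w_k, w_v)
    gated = transfer_gate(fused, w_g, np.zeros(channels))
    print(fused.shape, attn.shape, gated.shape)
```

In a two-stream encoder of the kind the abstract describes, a step like this would be repeated at each encoder stage. Early and late fusion, by contrast, combine the modalities only once, at the input or after encoding.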

Keywords:
Computer science, Fusion, Artificial intelligence, Modal, Encoder, Ranging, Segmentation, Representation (politics), Sensor fusion, Pattern recognition (psychology), Machine learning

Metrics

Cited By: 5
FWCI (Field Weighted Citation Impact): 1.28
Refs: 49
Citation Normalized Percentile: 0.80

Topics

Domain Adaptation and Few-Shot Learning
Physical Sciences →  Computer Science →  Artificial Intelligence
Remote-Sensing Image Classification
Physical Sciences →  Engineering →  Media Technology
Advanced Image and Video Retrieval Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition