Recently, due to the limitations of a single sensor, it has become difficult to further improve the performance of land cover classification. Traditional image segmentation methods cannot process optical remote sensing images effectively, especially when the optical sensor is affected by complex weather conditions. As an active radar, however, synthetic aperture radar (SAR) is not restricted by weather conditions, owing to the penetrability of its electromagnetic radiation. Multi-sensor data fusion therefore offers great potential for land cover classification. In this paper, a new fusion network called SegFusion is proposed to improve the performance of land cover classification. SegFusion has two main components: a hierarchical Transformer encoder and a Swin-Fusion (SW-Fusion) module. First, the hierarchical Transformer encoder extracts multilevel features from the optical and SAR images. By integrating features from different layers, we obtain a powerful representation that combines low-resolution fine-grained features with high-resolution coarse-grained features. Second, the SW-Fusion module fuses the features of the optical and SAR data. In SW-Fusion, we use a modified Swin Transformer [1] block with a multi-head cross-attention mechanism to exchange information between features from the different sources.
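The cross-modal exchange at the heart of SW-Fusion can be illustrated with a minimal sketch: queries from one modality attend to keys and values from the other, so each stream is updated with information from its counterpart. This is not the authors' implementation — the learned projection matrices, windowed attention, and residual connections of the actual Swin-style block are omitted, and all names and shapes below are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_cross_attention(query_feats, kv_feats, num_heads=4):
    """Queries from one modality attend to keys/values of the other.

    query_feats, kv_feats: (tokens, dim) arrays; dim divisible by num_heads.
    Projection weights are omitted for brevity (illustrative sketch only).
    """
    n, d = query_feats.shape
    dh = d // num_heads  # per-head channel width
    out = np.empty_like(query_feats)
    for h in range(num_heads):
        q = query_feats[:, h * dh:(h + 1) * dh]
        k = kv_feats[:, h * dh:(h + 1) * dh]
        v = kv_feats[:, h * dh:(h + 1) * dh]
        # Scaled dot-product attention: optical tokens weight SAR tokens (or vice versa).
        attn = softmax(q @ k.T / np.sqrt(dh), axis=-1)
        out[:, h * dh:(h + 1) * dh] = attn @ v
    return out

rng = np.random.default_rng(0)
opt = rng.standard_normal((16, 32))  # 16 tokens of 32-dim optical features (hypothetical sizes)
sar = rng.standard_normal((16, 32))  # SAR features, same token grid
fused_opt = multi_head_cross_attention(opt, sar)  # optical queries attend to SAR
fused_sar = multi_head_cross_attention(sar, opt)  # SAR queries attend to optical
```

Running the exchange in both directions, as sketched in the last two lines, is one common way to let each modality absorb complementary information from the other before the fused features are passed to a decoder.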
Jiaqi Zhao, Yong Zhou, Boyu Shi, Jingsong Yang, Di Zhang, Rui Yao