Yanheng Wang, Danfeng Hong, Jianjun Sha, Lianru Gao, Lian Liu, Yonggang Zhang, Xianhui Rong
Convolutional neural networks (CNNs), with their strong spatial feature extraction capabilities, have become popular in remote sensing (RS) image change detection (CD). However, CNNs tend to focus on extracting spatial information and ignore the spectral and temporal sequences that are important in hyperspectral images (HSIs). In this paper, we propose a joint spectral, spatial, and temporal transformer for hyperspectral image change detection (HSI-CD), named SST-Former. First, SST-Former applies positional encoding to each pixel of the cube to retain the order of the spectral and spatial sequences. Second, a spectral transformer encoder extracts spectral sequence information. Then, a class token that stores the class information of a single-temporal HSI is concatenated with the output of the spectral transformer encoder, and a spatial transformer encoder extracts spatial texture information. Finally, the features of the HSIs acquired at different dates are fed into a temporal transformer, which extracts useful CD features between the current HSI pair, and the binary CD result is obtained through a multilayer perceptron (MLP). Extensive experiments on three HSI-CD datasets show that SST-Former outperforms other methods both visually and quantitatively.
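As a rough illustration of the data flow described in the abstract, the NumPy sketch below chains the four stages: positional encoding, a spectral encoder, a class token followed by a spatial encoder, and a temporal encoder whose output feeds a small MLP. This is not the authors' implementation. Several elements are assumptions made for this sketch: the token layout (bands as tokens in the spectral stage, pixels as tokens in the spatial stage), the sinusoidal positional encoding, the single-head attention layers without feed-forward sublayers, and the ReLU MLP. All names (`SSTFormerSketch`, `EncoderLayer`, `sinusoidal_positions`) are hypothetical. The weights are random and untrained, so the predicted label only demonstrates tensor shapes.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed; all weights below are random, NOT trained

def sinusoidal_positions(n_tokens, dim):
    """Sine/cosine positional encoding, shape (n_tokens, dim). Assumed choice of encoding."""
    pos = np.arange(n_tokens)[:, None]
    i = np.arange(dim)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / dim)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for numerical stability
    return e / e.sum(axis=axis, keepdims=True)

class EncoderLayer:
    """Hypothetical simplified encoder layer: layer norm, single-head self-attention, residual."""
    def __init__(self, dim):
        scale = 1.0 / np.sqrt(dim)
        self.wq, self.wk, self.wv = (rng.normal(0, scale, (dim, dim)) for _ in range(3))

    def __call__(self, tokens):  # tokens: (n_tokens, dim)
        h = (tokens - tokens.mean(-1, keepdims=True)) / (tokens.std(-1, keepdims=True) + 1e-6)
        q, k, v = h @ self.wq, h @ self.wk, h @ self.wv
        attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (n_tokens, n_tokens)
        return tokens + attn @ v

class SSTFormerSketch:
    """Hypothetical spectral -> spatial -> temporal pipeline for one pair of HSI patches."""
    def __init__(self, height, width, bands):
        n_pix = height * width
        self.spectral = EncoderLayer(n_pix)          # tokens = bands, features = pixel values
        self.spatial = EncoderLayer(bands)           # tokens = class token + pixels, features = bands
        self.temporal = EncoderLayer(bands)          # tokens = one feature vector per date
        self.cls = rng.normal(0, 0.02, (1, bands))   # class token (would be learned in practice)
        self.pe_spec = sinusoidal_positions(bands, n_pix)
        self.pe_spat = sinusoidal_positions(n_pix + 1, bands)
        self.w1 = rng.normal(0, 1 / np.sqrt(2 * bands), (2 * bands, bands))
        self.w2 = rng.normal(0, 1 / np.sqrt(bands), (bands, 2))

    def encode_date(self, cube):  # cube: (H, W, B) patch from a single date
        h, w, b = cube.shape
        spec_tokens = cube.reshape(h * w, b).T + self.pe_spec   # (B, H*W)
        spec_out = self.spectral(spec_tokens)
        pix_tokens = np.concatenate([self.cls, spec_out.T], axis=0) + self.pe_spat  # (H*W+1, B)
        spat_out = self.spatial(pix_tokens)
        return spat_out[0]  # class-token output summarizes this date, shape (B,)

    def __call__(self, cube_t1, cube_t2):
        feats = np.stack([self.encode_date(cube_t1), self.encode_date(cube_t2)])  # (2, B)
        fused = self.temporal(feats).reshape(-1)    # (2B,)
        hidden = np.maximum(fused @ self.w1, 0.0)   # ReLU hidden layer of the MLP
        logits = hidden @ self.w2                   # (2,): index 0 = no change, 1 = change
        return int(np.argmax(logits)), logits

# Usage on random 5x5 patches with 32 bands (synthetic data, untrained weights).
model = SSTFormerSketch(height=5, width=5, bands=32)
t1 = rng.normal(size=(5, 5, 32))
t2 = rng.normal(size=(5, 5, 32))
label, logits = model(t1, t2)
print(label, logits)
```

Keeping the class-token output as the per-date feature mirrors the abstract's description of the class token storing single-temporal class information. A real model would stack several full encoder layers per stage and train all weights end to end on labeled change maps.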
Yaxiong Chen, Zhipeng Zhang, Le Dong, Shengwu Xiong, Xiaoqiang Lu
Haoyang Yu, Hao Yang, Lianru Gao, Jiaochan Hu, Antonio Plaza, Bing Zhang