Wei Huang, Dazhan Zhou, Le Sun, Qiqiang Chen, Junru Yin
Significant progress has been achieved in hyperspectral image (HSI) classification research through the application of transformer blocks. Although transformers possess strong long-range dependency modeling capabilities, they primarily extract nonlocal information from patches and often fail to fully capture global information, leading to incomplete spectral–spatial feature extraction. In contrast, graph convolutional networks (GCNs) can effectively extract features from the global structure. This article proposes an adaptive pixel-level and superpixel-level feature fusion transformer (APSFFT). The network comprises two branches: a CNN-and-transformer network (CNTN) and a GCN-and-transformer network (GNTN), which extract pixel-level and superpixel-level feature information from the HSI, respectively. The CNTN leverages the strength of convolutional neural networks (CNNs) in extracting spectral–spatial information, combined with the transformer's ability to establish long-range dependencies through self-attention (SA). The GNTN fully extracts superpixel-level features while also establishing long-range dependencies. To adaptively fuse the features from these two branches, an adaptive cross-token attention fusion (ACTAF) encoder is employed. The ACTAF encoder fuses the classification tokens from both branches through SA, thereby enhancing the model's ability to capture interactions between pixel-level and superpixel-level features. We compared APSFFT with seven advanced HSI classification algorithms, and the experiments show that it outperforms these state-of-the-art methods.
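The abstract describes, but does not specify, how the ACTAF encoder fuses the two branches' classification tokens via self-attention. Below is a minimal, non-authoritative PyTorch sketch of this kind of cross-token attention fusion; the class name, dimensions, class count, and the two-way query arrangement are all assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class CrossTokenAttentionFusion(nn.Module):
    """Hypothetical ACTAF-style fusion: each branch's classification token
    attends over the other branch's full token sequence via attention."""
    def __init__(self, dim: int = 64, num_heads: int = 4, num_classes: int = 16):
        super().__init__()
        self.attn_p2s = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.attn_s2p = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.classifier = nn.Linear(2 * dim, num_classes)  # class count is a placeholder

    def forward(self, pixel_tokens, superpixel_tokens):
        # Both inputs: (B, N+1, dim), with token 0 as the classification token
        cls_p = pixel_tokens[:, :1]        # pixel-level classification token
        cls_s = superpixel_tokens[:, :1]   # superpixel-level classification token
        # Cross attention: the pixel cls token queries the superpixel sequence,
        # and the superpixel cls token queries the pixel sequence
        fused_p, _ = self.attn_p2s(cls_p, superpixel_tokens, superpixel_tokens)
        fused_s, _ = self.attn_s2p(cls_s, pixel_tokens, pixel_tokens)
        fused = torch.cat([fused_p.squeeze(1), fused_s.squeeze(1)], dim=-1)
        return self.classifier(fused)

# Usage with random features standing in for the two branch outputs
model = CrossTokenAttentionFusion(dim=64, num_heads=4, num_classes=16)
pix = torch.randn(8, 17, 64)   # batch of 8: 16 patch tokens + 1 cls token
spx = torch.randn(8, 9, 64)    # 8 superpixel tokens + 1 cls token
logits = model(pix, spx)
print(logits.shape)            # torch.Size([8, 16])
```

Querying each branch's classification token against the other branch's sequence is one plausible way to let pixel-level and superpixel-level features interact before classification, which is the interaction the abstract attributes to the ACTAF encoder.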