Junjie Zhang, Zhe Meng, Feng Zhao, Hanqiang Liu, Zhenhui Chang
Hyperspectral images (HSIs) provide rich spectral information that supports accurate classification in many applications, and incorporating spatial information into the classification process can improve accuracy further. Existing convolutional neural networks (CNNs) usually focus only on local features in hyperspectral cubes, whereas the emerging vision transformer (ViT) emphasizes global features in HSIs. In this letter, we propose a deep aggregated framework for HSI classification, called the convolution transformer mixer (CTMixer), to combine the advantages of these two paradigms effectively. First, a group parallel residual block is applied to capture local spectral-spatial features in the HSI patches. Second, a double-branch structure, consisting of a CNN branch and a transformer branch, is developed to capture local-global hyperspectral features. Finally, to combine CNN and ViT more tightly, a novel local-global multi-head self-attention mechanism is proposed, which introduces convolution operations into the multi-head self-attention mechanism to further improve classification accuracy. Extensive experiments demonstrate that CTMixer achieves competitive classification results on several common HSI datasets compared with other state-of-the-art networks. The source code for this work will be available at https://github.com/ZJier/CTMixer.
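The idea of introducing convolution operations into multi-head self-attention can be illustrated with a minimal NumPy sketch. This is not the CTMixer implementation: the abstract does not specify where the convolution enters, so the sketch assumes one plausible arrangement, in which a depthwise 3x3 convolution over the value map (the local branch) is added to the standard attention output (the global branch). All names here (`local_global_attention`, `depthwise_conv3x3`, `w_q`, `w_k`, `w_v`, `kernel`) are hypothetical.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def depthwise_conv3x3(x, kernel):
    """Zero-padded 'same' depthwise 3x3 filter (cross-correlation, as in
    most deep learning libraries). x: (H, W, C) float, kernel: (3, 3, C)."""
    h, w, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += padded[i:i + h, j:j + w, :] * kernel[i, j, :]
    return out


def local_global_attention(x, w_q, w_k, w_v, kernel, num_heads):
    """Assumed local-global attention on one patch x of shape (H, W, C).

    Global branch: standard multi-head self-attention over the H*W tokens.
    Local branch (an assumption, not taken from the paper): a depthwise
    3x3 convolution over the value map, added to the attention output.
    """
    h, w, c = x.shape
    n, d = h * w, c // num_heads
    tokens = x.reshape(n, c)

    def split_heads(t):
        # (N, C) -> (heads, N, d)
        return t.reshape(n, num_heads, d).transpose(1, 0, 2)

    values = tokens @ w_v
    q, k, v = split_heads(tokens @ w_q), split_heads(tokens @ w_k), split_heads(values)

    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)       # (heads, N, N)
    global_out = softmax(scores) @ v                     # (heads, N, d)
    global_out = global_out.transpose(1, 0, 2).reshape(h, w, c)

    local_out = depthwise_conv3x3(values.reshape(h, w, c), kernel)
    return global_out + local_out


# Usage on a random 5x5 patch with 8 feature channels and 2 heads.
rng = np.random.default_rng(0)
patch = rng.standard_normal((5, 5, 8))
w_q, w_k, w_v = (rng.standard_normal((8, 8)) for _ in range(3))
kernel = rng.standard_normal((3, 3, 8))
out = local_global_attention(patch, w_q, w_k, w_v, kernel, num_heads=2)
```

Summing the two branches keeps the output the same shape as the input patch, so the block could be stacked; other fusion choices, such as concatenation or applying the convolution to the queries and keys, would be equally consistent with the abstract.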
Zihan Chen, Miaomiao Liang, Weiwei Wu, Siyu Yang, Zhe Meng, Lingjuan Yu
Tahir Arshad, Junping Zhang, Inam Ullah
Wei Liu, Saurabh Prasad, Melba M. Crawford
Feng Zhao, Shijie Li, Junjie Zhang, Hanqiang Liu
Jiaju Li, Hanfa Xing, Zurui Ao, Hefeng Wang, Wenkai Liu, Anbing Zhang