Xiaohui Li, Chenyang Jin, Yuntao Tang, Kai Xing, Xiaodong Yu
Cross-scene classification of hyperspectral images poses significant challenges due to the lack of a priori knowledge and differences in data distribution across scenes. Traditional studies have made limited use of a priori knowledge from other modalities, whereas recent advances in large-scale pre-trained vision-language models have shown strong performance on various downstream tasks, highlighting the potential of cross-modal assisted learning. In this paper, we propose a Semantic-aware Collaborative Parallel Network (SCPNet) that mitigates the impact of data distribution differences by incorporating the linguistic modality to assist in learning cross-domain invariant representations of hyperspectral images. SCPNet uses a parallel architecture consisting of a spatial–spectral feature extraction module and a multiscale feature extraction module, designed to capture rich image information during the feature extraction phase. The extracted features are then mapped into an optimized semantic space, where an improved supervised contrastive learning objective clusters image features from the same category together while separating those from different categories. The semantic space bridges the gap between the visual and linguistic modalities, enabling the model to mine cross-domain invariant representations from the linguistic modality. Experimental results demonstrate that SCPNet significantly outperforms existing methods on three publicly available datasets, confirming its effectiveness for cross-scene hyperspectral image classification.
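The abstract does not specify the exact form of the improved supervised contrastive objective. For orientation, the sketch below shows the standard supervised contrastive loss (Khosla et al., 2020) in PyTorch, which pulls same-class embeddings together and pushes different-class embeddings apart, as described above. The function name, the temperature value, and the assumption of L2-normalized semantic-space projections are illustrative choices, not details taken from the paper.

```python
import torch


def supervised_contrastive_loss(features: torch.Tensor,
                                labels: torch.Tensor,
                                temperature: float = 0.07) -> torch.Tensor:
    """Standard supervised contrastive loss over one batch.

    features: (N, D) L2-normalized embeddings (assumed projected into the
              semantic space); labels: (N,) integer class labels.
    """
    n = features.size(0)
    device = features.device

    # Pairwise cosine similarities scaled by the temperature.
    sim = features @ features.T / temperature
    # Subtract the per-row max for numerical stability of exp().
    sim = sim - sim.max(dim=1, keepdim=True).values.detach()

    # Positive pairs: same label, excluding each sample paired with itself.
    same_class = torch.eq(labels.view(-1, 1), labels.view(1, -1)).float()
    eye = torch.eye(n, device=device)
    pos_mask = same_class - same_class * eye

    # Denominator sums over all pairs except the self-pair.
    exp_sim = torch.exp(sim) * (1.0 - eye)
    log_prob = sim - torch.log(exp_sim.sum(dim=1, keepdim=True) + 1e-12)

    # Average log-likelihood over the positives of each anchor;
    # anchors with no positives contribute zero.
    pos_count = pos_mask.sum(dim=1).clamp(min=1.0)
    loss = -(pos_mask * log_prob).sum(dim=1) / pos_count
    return loss.mean()


# Minimal usage example with random data.
feats = torch.nn.functional.normalize(torch.randn(8, 128), dim=1)
labs = torch.randint(0, 3, (8,))
print(supervised_contrastive_loss(feats, labs))
```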