Miaomiao Liang, Zuo Liu, Jian Dong, Lingjuan Yu, Xiangchun Yu, Jun Li, Licheng Jiao
Hyperspectral images provide plentiful latent information that requires exploration for ground-object recognition, where self-supervised learning is efficient and independent of manual labeling. However, severe spectral uncertainty poses a significant challenge to learning discriminative and generalizable representations via self-supervision. This letter proposes a variational generative transformer with momentum contrastive supervision (ConVaT) to alleviate this problem. ConVaT contains two branches: a variational generative branch and a contrastive learning branch. The former guides informative data representation via an encoder-decoder transformer with variational inference; the latter encourages discriminability in the representation by distinguishing positive anchors from negative ones. Notably, to facilitate a more generalizable latent representation, we reconstruct the data from reparameterized tokens sampled multiple times from the global anchor, rather than from the latent representation of the unmasked data. Extensive experiments on three public datasets show that ConVaT is superior in data representation, with intra-class clustering and inter-class distinction, and it achieves considerable improvements over existing methods under linear probing, especially on the Indian Pines dataset with its intense spectral uncertainty. Our code will be available at https://github.com/liuzuo-byte/ConVaT.
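The two ingredients the abstract names, reparameterized sampling from a latent token and momentum-contrastive supervision, can be illustrated with a minimal sketch. This is not the authors' implementation (their code is at the linked repository); it is a generic NumPy illustration of the VAE reparameterization trick and an InfoNCE-style contrastive loss, with all function names and shapes chosen here for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, logvar, rng):
    """VAE reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I).

    Drawing this multiple times from one (mu, logvar) pair mimics sampling
    several reconstruction tokens from a single global anchor."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def info_nce(anchor, positive, negatives, tau=0.1):
    """InfoNCE-style contrastive loss: pull the anchor toward its positive
    and push it away from the negatives (cosine similarity, temperature tau)."""
    def sim(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    logits = np.array([sim(anchor, positive)]
                      + [sim(anchor, n) for n in negatives]) / tau
    logits -= logits.max()                      # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                    # positive sits at index 0

# Draw several reparameterized tokens from one hypothetical global anchor.
mu, logvar = np.zeros(8), np.zeros(8)
tokens = [reparameterize(mu, logvar, rng) for _ in range(4)]
```

The loss is smallest when the anchor and positive align, which is the behavior that drives intra-class clustering and inter-class separation in contrastive training.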
Heng Zhou, Xin Zhang, Chunlei Zhang, Qiaoyu Ma
Xiang Hu, Teng Li, Tong Zhou, Yü Liu, Yuanxi Peng
Jie Feng, Zizhuo Gao, Ronghua Shang, Xiangrong Zhang, Licheng Jiao
Siyuan Hao, Yufeng Xia, Yuanxin Ye
Bobo Xi, Yun Zhang, Jiaojiao Li, Tie Zheng, Xunfeng Zhao, Haitao Xu, Changbin Xue, Yunsong Li, Jocelyn Chanussot