Recently, fine-tuning pre-trained models has emerged as a promising paradigm for speech-processing tasks. In this study, we present a novel strategy for unsupervised speaker verification using a Sub-structure of a Pre-Trained Model (Sub-PTM), which consists of a CNN-based feature extractor and several Transformer blocks. To obtain the initial pseudo labels, we use Infomap to cluster the representations extracted from the Sub-PTM. The generated pseudo labels are then used to train a speaker verification model comprising a Sub-PTM and a downstream network. We also propose an Online and Offline Label Correction (OAO-LC) method to mitigate the effect of incorrect pseudo labels. By incorporating these techniques, our system achieves results competitive with the supervised baseline.
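As a rough illustration of the pseudo-labeling step, the sketch below assigns cluster IDs to utterance embeddings by linking each embedding to its nearest neighbours and taking connected components. This is a simplified stand-in and not Infomap itself; the function name `pseudo_labels_from_embeddings` and the parameters `k` and `threshold` are hypothetical, and the embeddings are assumed to have already been extracted by the Sub-PTM.

```python
import numpy as np

def pseudo_labels_from_embeddings(emb, k=2, threshold=0.6):
    """Assign pseudo speaker labels via a thresholded k-NN graph.

    Assumption: connected components replace Infomap community
    detection here, purely to keep the sketch dependency-free.
    """
    emb = np.asarray(emb, dtype=float)
    # L2-normalise so that dot products are cosine similarities.
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = emb @ emb.T
    np.fill_diagonal(sim, -np.inf)  # exclude self-matches

    n = len(emb)
    parent = list(range(n))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        # Link i to each of its k most similar neighbours above threshold.
        for j in np.argsort(-sim[i])[:k]:
            if sim[i, j] >= threshold:
                ri, rj = find(i), find(int(j))
                if ri != rj:
                    parent[rj] = ri

    # Relabel component roots as consecutive integers 0, 1, 2, ...
    mapping, labels = {}, []
    for i in range(n):
        root = find(i)
        mapping.setdefault(root, len(mapping))
        labels.append(mapping[root])
    return np.array(labels)
```

In the pipeline described above, labels of this kind would serve only as the initial training targets, which the label-correction stage is then expected to refine.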
Siqi Zheng, Hongbin Suo, Qian Chen
Yishuang Li, Hukai Huang, Zhicong Chen, Wenhao Guan, Jiayan Lin, Lin Li, Qingyang Hong
Chan-yeong Lim, Hyun-seo Shin, Ju-ho Kim, Jungwoo Heo, Kyo-Won Koo, Seung-bin Kim, Ha-Jin Yu
Bing Han, Zhengyang Chen, Yanmin Qian