JOURNAL ARTICLE

SR-HuBERT: An Efficient Pre-Trained Model for Speaker Verification

Abstract

Recently, pre-trained models (PTMs) have been extensively applied to speaker verification (SV) and have greatly boosted system performance. However, mainstream PTMs currently concentrate on frame-level universal representations. In this paper, we propose a novel pre-training framework that jointly models speaker information: Speaker-Related HuBERT, abbreviated as SR-HuBERT. This framework aims to further exploit the speaker-related information inherent in universal speech representations. The proposed SR-HuBERT utilizes an unsupervised clustering algorithm based on graph structures to generate speaker pseudo-labels and promotes the learning of segment-level speaker-related representations through a multi-task pre-training framework. Experimental results on the VoxCeleb1 test set demonstrate the effectiveness of the proposed SR-HuBERT. Even in scenarios with limited fine-tuning data, SR-HuBERT outperforms other existing PTMs on SV tasks. Additionally, SR-HuBERT also performs well on the speaker-related tasks of the SUPERB benchmark.
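The graph-based pseudo-labeling step mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: the cosine-similarity graph, the fixed threshold, the connected-components grouping, and all function names here are assumptions introduced for this example.

```python
import numpy as np

def graph_cluster_pseudo_labels(embeddings, threshold=0.7):
    """Assign speaker pseudo-labels as connected components of a
    cosine-similarity graph over utterance embeddings (illustrative
    stand-in for an unsupervised graph-based clustering step)."""
    # L2-normalize so the dot product equals cosine similarity
    X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = X @ X.T                      # pairwise cosine similarities
    adj = sim >= threshold             # edge iff utterances are similar enough

    n = len(X)
    labels = -np.ones(n, dtype=int)    # -1 means "not yet labeled"
    current = 0
    for i in range(n):
        if labels[i] != -1:
            continue
        # Depth-first search over the similarity graph: every node
        # reachable from i gets the same pseudo-speaker label.
        stack = [i]
        labels[i] = current
        while stack:
            u = stack.pop()
            for v in np.nonzero(adj[u])[0]:
                if labels[v] == -1:
                    labels[v] = current
                    stack.append(v)
        current += 1
    return labels

# Toy usage: two tight groups of fake "utterance embeddings"
rng = np.random.default_rng(0)
group_a = rng.normal([5.0, 0.0, 0.0], 0.01, size=(3, 3))
group_b = rng.normal([0.0, 5.0, 0.0], 0.01, size=(3, 3))
labels = graph_cluster_pseudo_labels(np.vstack([group_a, group_b]))
print(labels)  # same label within each group, different labels across groups
```

In a pre-training pipeline, such pseudo-labels would supply the targets for a segment-level speaker-classification head alongside the standard HuBERT masked-prediction objective; the exact multi-task weighting used by the paper is not specified here.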

Keywords:
Speaker verification, speaker recognition, speaker diarisation, cluster analysis, speech recognition, natural language processing, machine learning, artificial intelligence

Metrics

Cited by: 5
FWCI (Field-Weighted Citation Impact): 3.19
References: 28
Citation Normalized Percentile: 0.88

Topics

Speech Recognition and Synthesis
Natural Language Processing Techniques
Topic Modeling
(all under Physical Sciences → Computer Science → Artificial Intelligence)