JOURNAL ARTICLE

Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning

Abstract

Recent years have witnessed significant advancements in self-supervised learning (SSL) methods for speech-processing tasks. Various speech-based SSL models have been developed and achieve promising performance on a range of downstream tasks, including speech recognition. However, existing speech-based SSL models share a common drawback of high computational cost, which can hinder their practical application and in-depth academic study. To address this issue, we first analyze the computational cost of the different modules during HuBERT pre-training and then introduce a stack of efficiency optimizations, named Fast-HuBERT in this paper. The proposed Fast-HuBERT can be trained in 1.1 days with 8 V100 GPUs on the LibriSpeech 960h benchmark without performance degradation, a 5.2x speedup over the original implementation. Moreover, we explore two well-studied techniques within Fast-HuBERT and demonstrate consistent improvements, as reported in previous work. The code for Fast-HuBERT training is available at https://github.com/yanghaha0908/FastHuBERT
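
As a minimal illustration of the kind of per-module cost analysis the abstract describes, the sketch below profiles a toy HuBERT-like model with PyTorch's built-in profiler. Everything here is an assumption for illustration: the layer sizes, the two-layer encoder, and the module names are simplified stand-ins, not the paper's actual configuration, which is built on the full fairseq HuBERT implementation.

# Hedged sketch: measure where forward-pass time goes in a toy
# HuBERT-like model. All sizes/names are illustrative assumptions.
import torch
from torch.profiler import profile, record_function, ProfilerActivity

class ToyHuBERT(torch.nn.Module):
    def __init__(self, dim=768):
        super().__init__()
        # Strided 1-D convolutions downsample the raw waveform, loosely
        # mimicking the HuBERT feature extractor (the real one is deeper).
        self.feature_extractor = torch.nn.Sequential(
            torch.nn.Conv1d(1, dim, kernel_size=10, stride=5),
            torch.nn.GELU(),
            torch.nn.Conv1d(dim, dim, kernel_size=8, stride=4),
            torch.nn.GELU(),
        )
        # Two Transformer layers instead of twelve, to keep the sketch light.
        layer = torch.nn.TransformerEncoderLayer(
            d_model=dim, nhead=12, batch_first=True)
        self.encoder = torch.nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, wav):
        # record_function labels let the profiler attribute time per module.
        with record_function("feature_extractor"):
            feats = self.feature_extractor(wav).transpose(1, 2)
        with record_function("transformer_encoder"):
            return self.encoder(feats)

model = ToyHuBERT().eval()
wav = torch.randn(2, 1, 16000)  # two one-second utterances at 16 kHz

with torch.no_grad(), profile(activities=[ProfilerActivity.CPU]) as prof:
    model(wav)

# Per-label timing shows which module dominates a step; this is the kind
# of breakdown that motivates the efficiency optimizations in the paper.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))

On a GPU, the same pattern applies with ProfilerActivity.CUDA added to the activities list; profiling a full training step (forward plus backward) rather than a single forward pass gives a more faithful picture of pre-training cost.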

Keywords:
Computer science; benchmark; speedup; artificial intelligence; representation learning; machine learning; parallel computing

Metrics

Cited by: 5
FWCI (Field-Weighted Citation Impact): 1.28
References: 30
Citation Normalized Percentile: 0.81

Topics

Speech Recognition and Synthesis; Speech and dialogue systems; Natural Language Processing Techniques (all under Physical Sciences → Computer Science → Artificial Intelligence)