JOURNAL ARTICLE

Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning

Abstract

Recent years have witnessed significant advancements in self-supervised learning (SSL) methods for speech-processing tasks. Various speech-based SSL models have been developed and achieve promising performance on a range of downstream tasks, including speech recognition. However, existing speech-based SSL models share a common drawback of high computational cost, which can hinder their practical application and in-depth academic study. To address this issue, we first analyze the computational cost of the different modules during HuBERT pre-training and then introduce a stack of efficiency optimizations, named Fast-HuBERT in this paper. The proposed Fast-HuBERT can be trained in 1.1 days with 8 V100 GPUs on the LibriSpeech 960h benchmark without performance degradation, a 5.2x speedup over the original implementation. Moreover, we explore two well-studied techniques within Fast-HuBERT and demonstrate consistent improvements, as reported in previous work. The code for Fast-HuBERT training is available at https://github.com/yanghaha0908/FastHuBERT
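
As a minimal illustration of the kind of per-module cost analysis the abstract describes, the sketch below profiles a toy HuBERT-like model with PyTorch's built-in profiler. Everything here is an assumption for illustration: the layer sizes, the two-layer encoder, and the module names are simplified stand-ins, not the paper's actual configuration, which is built on the full fairseq HuBERT implementation.

# Hedged sketch: measure where forward-pass time goes in a toy
# HuBERT-like model. All sizes/names are illustrative assumptions.
import torch
from torch.profiler import profile, record_function, ProfilerActivity

class ToyHuBERT(torch.nn.Module):
    def __init__(self, dim=768):
        super().__init__()
        # Strided 1-D convolutions downsample the raw waveform, loosely
        # mimicking the HuBERT feature extractor (the real one is deeper).
        self.feature_extractor = torch.nn.Sequential(
            torch.nn.Conv1d(1, dim, kernel_size=10, stride=5),
            torch.nn.GELU(),
            torch.nn.Conv1d(dim, dim, kernel_size=8, stride=4),
            torch.nn.GELU(),
        )
        # Two Transformer layers instead of twelve, to keep the sketch light.
        layer = torch.nn.TransformerEncoderLayer(
            d_model=dim, nhead=12, batch_first=True)
        self.encoder = torch.nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, wav):
        # record_function labels let the profiler attribute time per module.
        with record_function("feature_extractor"):
            feats = self.feature_extractor(wav).transpose(1, 2)
        with record_function("transformer_encoder"):
            return self.encoder(feats)

model = ToyHuBERT().eval()
wav = torch.randn(2, 1, 16000)  # two one-second utterances at 16 kHz

with torch.no_grad(), profile(activities=[ProfilerActivity.CPU]) as prof:
    model(wav)

# Per-label timing shows which module dominates a step; this is the kind
# of breakdown that motivates the efficiency optimizations in the paper.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))

On a GPU, the same pattern applies with ProfilerActivity.CUDA added to the activities list; profiling a full training step (forward plus backward) rather than a single forward pass gives a more faithful picture of pre-training cost.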

Keywords:
Computer science; benchmark; speedup; artificial intelligence; representation learning; machine learning; parallel computing

Metrics

Cited by: 5
FWCI (Field-Weighted Citation Impact): 1.28
References: 30
Citation Normalized Percentile: 0.81

Topics

Speech Recognition and Synthesis; Speech and dialogue systems; Natural Language Processing Techniques (all under Physical Sciences → Computer Science → Artificial Intelligence)