JOURNAL ARTICLE

Audio ALBERT: A Lite BERT for Self-Supervised Learning of Audio Representation

Abstract

Self-supervised speech models are powerful speech representation extractors for downstream applications. Recently, larger models have been utilized in acoustic model training to achieve better performance. We propose Audio ALBERT, a lite version of the self-supervised speech representation model. We apply the lightweight representation extractor to two downstream tasks, speaker classification and phoneme classification. We show that Audio ALBERT achieves performance comparable with massive pre-trained networks in the downstream tasks while having 91% fewer parameters. Moreover, we design probing models to measure how much the latent representations can encode the speaker's and phoneme's information. We find that the representations encoded in internal layers of Audio ALBERT contain more information for both phoneme and speaker than the last layer, which is generally used for downstream tasks. Our findings provide a new avenue for using self-supervised networks to achieve better performance and efficiency.
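The probing models mentioned in the abstract are small classifiers trained on frozen representations from a single internal layer; how well they classify indicates how much speaker or phoneme information that layer encodes. A minimal sketch of a frame-level linear probe is shown below, with dummy features standing in for one layer's output; the dimensions (`HIDDEN_DIM`, `NUM_CLASSES`) are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

HIDDEN_DIM = 768   # assumed width of one internal layer's output
NUM_CLASSES = 70   # assumed size of the phoneme label set

# A linear probe is just an affine map trained on frozen features;
# only the forward pass is shown here.
W = rng.standard_normal((HIDDEN_DIM, NUM_CLASSES)) * 0.01
b = np.zeros(NUM_CLASSES)

def probe(hidden_states: np.ndarray) -> np.ndarray:
    """Score each frame; hidden_states has shape (frames, HIDDEN_DIM)."""
    return hidden_states @ W + b

# Dummy features standing in for one internal layer's representations.
features = rng.standard_normal((100, HIDDEN_DIM))   # 100 frames
logits = probe(features)                            # (100, NUM_CLASSES)
predictions = logits.argmax(axis=1)                 # one phoneme id per frame
```

Because the extractor stays frozen and the probe has so few parameters, differences in probe accuracy across layers can be attributed to the representations themselves, which is how the abstract's layer-wise comparison is made.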

Keywords:
Computer science, Speech recognition, Representation learning, Feature extraction, Downstream tasks, Self-supervised learning, Encoding, Artificial intelligence, Acoustic model, Speech processing, Natural language processing

Metrics

Cited By: 157
FWCI (Field-Weighted Citation Impact): 20.60
References: 40
Citation Normalized Percentile: 1.00 (in top 1% and top 10%)

Topics

Speech Recognition and Synthesis (Physical Sciences → Computer Science → Artificial Intelligence)
Music and Audio Processing (Physical Sciences → Computer Science → Signal Processing)
Topic Modeling (Physical Sciences → Computer Science → Artificial Intelligence)