Joel Shor, Aren Jansen, Wei Han, Daniel Park, Yu Zhang
Many speech applications require understanding aspects beyond the words being spoken, such as recognizing emotion, detecting whether the speaker is wearing a mask, or distinguishing real from synthetic speech. In this work, we introduce a new state-of-the-art paralinguistic representation derived from large-scale, fully self-supervised training of a 600M+ parameter Conformer-based architecture. We benchmark on a diverse set of speech tasks and demonstrate that simple linear classifiers trained on top of our time-averaged representation outperform nearly all previous results, in some cases by large margins. Our analyses of context-window size demonstrate that, surprisingly, 2-second context windows achieve 96% of the performance of Conformers that use the full long-term context on 7 out of 9 tasks. Furthermore, while the best per-task representations are extracted internally in the network, stable performance across several layers allows a single universal representation to reach near-optimal performance on all tasks.
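The evaluation protocol described above (time-averaging frame-level embeddings, then training a simple linear classifier on the pooled vectors) can be sketched as follows. This is a minimal illustration with synthetic data; the embedding dimension, number of utterances, and the use of a least-squares linear probe are all assumptions, not the paper's actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical frame-level embeddings: 40 utterances of varying length,
# each of shape (num_frames, embedding_dim). A real pipeline would get
# these from the pretrained Conformer.
embedding_dim = 8
utterances = [rng.normal(size=(rng.integers(50, 200), embedding_dim))
              for _ in range(40)]
labels = rng.integers(0, 2, size=40).astype(float)  # e.g. binary emotion label

# Time-average each utterance into a single fixed-size vector.
pooled = np.stack([u.mean(axis=0) for u in utterances])  # shape (40, 8)

# Linear probe fit by least squares (a stand-in for whatever linear
# classifier the authors actually used, e.g. logistic regression).
X = np.hstack([pooled, np.ones((len(pooled), 1))])  # append bias column
w, *_ = np.linalg.lstsq(X, labels, rcond=None)
preds = (X @ w > 0.5).astype(float)
accuracy = (preds == labels).mean()
```

The key point is that the probe sees only one vector per utterance, so all task performance must come from the quality of the pretrained representation.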