JOURNAL ARTICLE

Universal Paralinguistic Speech Representations Using Self-Supervised Conformers

Joel Shor, Aren Jansen, Wei Han, Daniel Park, Yu Zhang

Year: 2022
Journal: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Pages: 3169-3173

Abstract

Many speech applications require understanding aspects beyond the words being spoken, such as recognizing emotion, detecting whether the speaker is wearing a mask, or distinguishing real from synthetic speech. In this work, we introduce a new state-of-the-art paralinguistic representation derived from large-scale, fully self-supervised training of a 600M+ parameter Conformer-based architecture. We benchmark on a diverse set of speech tasks and demonstrate that simple linear classifiers trained on top of our time-averaged representation outperform nearly all previous results, in some cases by large margins. Our analyses of context-window size demonstrate that, surprisingly, 2-second context windows achieve 96% of the performance of the Conformers that use the full long-term context on 7 out of 9 tasks. Furthermore, while the best per-task representations are extracted internally in the network, stable performance across several layers allows a single universal representation to reach near-optimal performance on all tasks.
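
The evaluation recipe named in the abstract, a linear classifier on a time-averaged representation, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the frame embeddings are random synthetic data rather than Conformer outputs, the embedding size of 16 is arbitrary, and the helper names (`time_average`, `train_linear_classifier`, `predict`, `fake_clip`) are hypothetical and do not come from the paper.

```python
import numpy as np

def time_average(frames: np.ndarray) -> np.ndarray:
    """Collapse (num_frames, dim) frame embeddings into one (dim,) clip vector."""
    return frames.mean(axis=0)

def train_linear_classifier(X, y, lr=0.5, steps=500):
    """Fit binary logistic regression with plain batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        grad = p - y                             # d(loss)/d(logit) per example
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def predict(X, w, b):
    """Return hard 0/1 predictions from the linear decision function."""
    return (X @ w + b > 0).astype(int)

# Synthetic stand-in for frame-level embeddings (NOT real model outputs):
# clips of varying length whose frames are drawn around a class-dependent mean.
rng = np.random.default_rng(0)

def fake_clip(label: int, dim: int = 16) -> np.ndarray:
    n_frames = rng.integers(20, 60)
    mean = 1.0 if label == 1 else -1.0
    return rng.normal(loc=mean, scale=1.0, size=(n_frames, dim))

labels = np.array([0, 1] * 20)
X = np.stack([time_average(fake_clip(label)) for label in labels])

w, b = train_linear_classifier(X, labels)
accuracy = (predict(X, w, b) == labels).mean()
print(f"training accuracy on synthetic clips: {accuracy:.2f}")
```

Averaging over the time axis gives every clip a fixed-size vector regardless of its duration, which is what lets a single linear layer serve as the downstream classifier in this kind of setup.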

Keywords:
Paralanguage, Computer science, Benchmark (surveying), Representation (politics), Context (archaeology), Set (abstract data type), Task (project management), Speech recognition, Artificial intelligence, Natural language processing, Pattern recognition (psychology), Machine learning, Communication

Metrics

Cited By: 32
FWCI (Field Weighted Citation Impact): 3.76
Refs: 59
Citation Normalized Percentile: 0.94

Topics

Speech Recognition and Synthesis
Physical Sciences →  Computer Science →  Artificial Intelligence
Music and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing