JOURNAL ARTICLE

Advancing Speech Emotion Recognition with Interpretable Neural Networks and Self-Supervised Paralinguistic Representations

Abstract

This research focuses on novel approaches for speech-based emotion recognition (SER). SER technologies can be applied in various contexts, such as assessing customer satisfaction in call centers, tracking personal moods, and monitoring emotions in healthcare settings. Numerous machine learning methods have been proposed, ranging from traditional feature-based models to end-to-end interpretable neural networks and self-supervised learning techniques. These methods have produced explainable representations and identifiable features related to vocal cues, which we refer to as paralinguistic representations (i.e., beyond linguistics). By incorporating a pre-trained paralinguistic representation, our method achieved accuracy comparable to state-of-the-art techniques while maintaining high efficiency. A detailed analysis of errors and metadata indicated that our proposed method reduces gender bias and generalizes well to unseen speakers and spontaneous emotions, extending beyond recordings of scripted utterances.

Keywords:
Paralanguage; Artificial neural network; Metadata; Emotion recognition; Representation (politics); Emotion detection; Deep learning

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 0
Citation Normalized Percentile: 0.56

Topics

Digital Filter Design and Implementation
Physical Sciences →  Computer Science →  Signal Processing
Blind Source Separation Techniques
Physical Sciences →  Computer Science →  Signal Processing
Numerical Methods and Algorithms
Physical Sciences →  Computer Science →  Computational Theory and Mathematics