JOURNAL ARTICLE

Enhancing Dimensional Emotion Recognition from Speech through Modulation-Filtered Cochleagram and Parallel Attention Recurrent Network

Zhichao PengZeng HuaYongwei LiYegang DuJianwu Dang

Year: 2023 Journal:   Electronics Vol: 12 (22)Pages: 4620-4620   Publisher: Multidisciplinary Digital Publishing Institute

Abstract

Dimensional emotion can better describe rich and fine-grained emotional states than categorical emotion. In the realm of human–robot interaction, the ability to continuously recognize dimensional emotions from speech empowers robots to capture the temporal dynamics of a speaker’s emotional state and adjust their interaction strategies in real-time. In this study, we present an approach to enhance dimensional emotion recognition through modulation-filtered cochleagram and parallel attention recurrent neural network (PA-net). Firstly, the multi-resolution modulation-filtered cochleagram is derived from speech signals through auditory signal processing. Subsequently, the PA-net is employed to establish multi-temporal dependencies from diverse scales of features, enabling the tracking of the dynamic variations in dimensional emotion within auditory modulation sequences. The results obtained from experiments conducted on the RECOLA dataset demonstrate that, at the feature level, the modulation-filtered cochleagram surpasses other assessed features in its efficacy to forecast valence and arousal. Particularly noteworthy is its pronounced superiority in scenarios characterized by a high signal-to-noise ratio. At the model level, the PA-net attains the highest predictive performance for both valence and arousal, clearly outperforming alternative regression models. Furthermore, the experiments carried out on the SEWA dataset demonstrate the substantial enhancements brought about by the proposed method in valence and arousal prediction. These results collectively highlight the potency and effectiveness of our approach in advancing the field of dimensional speech emotion recognition.

Keywords:
Speech recognition Valence (chemistry) Computer science Arousal Categorical variable Artificial neural network Artificial intelligence Pattern recognition (psychology) Machine learning Psychology

Metrics

0
Cited By
0.00
FWCI (Field Weighted Citation Impact)
50
Refs
0.20
Citation Normalized Percentile
Is in top 1%
Is in top 10%

Topics

Emotion and Mood Recognition
Social Sciences →  Psychology →  Experimental and Cognitive Psychology
Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
EEG and Brain-Computer Interfaces
Life Sciences →  Neuroscience →  Cognitive Neuroscience
© 2026 ScienceGate Book Chapters — All rights reserved.