JOURNAL ARTICLE

Deep convolutional recurrent neural network with attention mechanism for robust speech emotion recognition

Abstract

We present a deep convolutional recurrent neural network for speech emotion recognition based on log-Mel filterbank energies, where the convolutional layers are responsible for discriminative feature learning. Based on the hypothesis that a better understanding of the internal configuration within an utterance would help reduce misclassification, we further propose a convolutional attention mechanism to learn the utterance structure relevant to the task. In addition, we quantitatively measure the performance gain contributed by each module in our model in order to characterize the nature of emotion expressed in speech. The experimental results on the eNTERFACE'05 emotion database validate our hypothesis and demonstrate an absolute improvement of 4.62% over the state-of-the-art approach.
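The idea of weighting frames within an utterance by their relevance can be illustrated with a generic attention-pooling step. The sketch below is not the convolutional attention mechanism proposed in the article, whose details are not given in the abstract; it is a minimal example that assumes frame-level features are already available as a (frames, features) array and that relevance is scored with a single weight vector. The names `attention_pool`, `frames`, and `score_weights` are hypothetical.

```python
import numpy as np

def attention_pool(frames, score_weights):
    """Collapse frame-level features (T, D) into one utterance vector (D,).

    Assumption: each frame is scored by a dot product with score_weights;
    the article's own scoring function may differ.
    """
    scores = frames @ score_weights      # one relevance score per frame, shape (T,)
    scores = scores - scores.max()       # shift for numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum()    # softmax over time, sums to 1
    pooled = weights @ frames            # attention-weighted average, shape (D,)
    return pooled, weights

# Illustrative random data standing in for recurrent-layer outputs.
rng = np.random.default_rng(0)
frames = rng.standard_normal((50, 8))    # 50 frames, 8 features each
score_weights = rng.standard_normal(8)
utterance_vec, weights = attention_pool(frames, score_weights)
```

With an all-zero `score_weights`, every frame receives the same weight and the result reduces to plain mean pooling, which makes the role of the learned scores easy to see.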

Keywords:
Computer science; Convolutional neural network; Discriminative model; Speech recognition; Utterance; Artificial intelligence; Feature (linguistics); Recurrent neural network; Emotion recognition; Task (project management); Deep learning; Mechanism (biology); Feature extraction; Pattern recognition (psychology); Artificial neural network

Metrics

Cited By: 124
FWCI (Field Weighted Citation Impact): 14.97
Refs: 31
Citation Normalized Percentile: 0.99
Is in top 1%
Is in top 10%

Topics

Emotion and Mood Recognition
Social Sciences →  Psychology →  Experimental and Cognitive Psychology
Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Speech Recognition and Synthesis
Physical Sciences →  Computer Science →  Artificial Intelligence