JOURNAL ARTICLE

Multiple attention convolutional-recurrent neural networks for speech emotion recognition

Abstract

Speech emotion recognition (SER) is of great significance in human-computer interaction and affective computing. One of the major challenges for SER lies in extracting effective emotional features from lengthy utterances. Most existing deep-learning-based SER systems use Log-Mel spectrograms as the model input, which cannot fully convey the emotional information in speech. Furthermore, the limited extraction ability of these models can make it difficult to capture key emotional representations. To address these issues, we propose a new convolutional recurrent network based on multiple attention mechanisms. It comprises convolutional neural network (CNN) and bidirectional long short-term memory (BiLSTM) modules, which take extracted Mel-spectrums and Fourier coefficient features as input, respectively, so that the two inputs complement each other's emotional information. The attention mechanisms work as follows. Spatial attention and channel attention are added to the CNN module to focus on key emotional regions and locate more effective features. Temporal attention assigns weights to the features of different time segments after the BiLSTM extracts sequence information. Experimental results show that the model achieves a weighted accuracy (WA) of 87.9%, 76.5%, and 75.2% and an unweighted accuracy (UA) of 87.6%, 73.5%, and 70.1% on the EMODB, IEMOCAP, and EESDB speech datasets, respectively, which is better than most state-of-the-art methods.
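The abstract reports WA and UA without spelling out how they are computed. The sketch below assumes the convention common in SER work: WA is the overall fraction of correctly classified utterances, and UA is the mean of the per-class recalls, so each emotion class counts equally regardless of its size. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: correctly classified utterances / all utterances."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every emotion class counts equally."""
    total = defaultdict(int)  # utterances per true class
    hit = defaultdict(int)    # correctly classified utterances per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        hit[t] += int(t == p)
    return sum(hit[c] / total[c] for c in total) / len(total)


# Hypothetical, imbalanced toy labels (0 = neutral, 1 = angry, 2 = sad).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 0, 0, 1, 0, 2, 0]

wa = weighted_accuracy(y_true, y_pred)
ua = unweighted_accuracy(y_true, y_pred)
print(f"WA = {wa:.3f}, UA = {ua:.3f}")
```

On imbalanced data the two metrics diverge: a model that favors the majority class scores higher on WA than on UA, which is consistent with UA being the lower figure on all three datasets above.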

Keywords:
Computer science; Convolutional neural network; Spectrogram; Artificial intelligence; Feature extraction; Speech recognition; Deep learning; Recurrent neural network; Complement (music); Focus (optics); Field (mathematics); Key (lock); Pattern recognition (psychology); Artificial neural network

Metrics

Cited By: 2
FWCI (Field Weighted Citation Impact): 0.49
Refs: 35
Citation Normalized Percentile: 0.66

Topics

Emotion and Mood Recognition
Social Sciences →  Psychology →  Experimental and Cognitive Psychology
Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
EEG and Brain-Computer Interfaces
Life Sciences →  Neuroscience →  Cognitive Neuroscience
