Speech emotion recognition systems can significantly improve the efficiency of human-computer interaction by accurately recognizing the emotional information carried in speech. Such a system typically involves two main steps: speech feature extraction and emotion classification. To improve accuracy, this paper uses MFCC features, short-term energy, and the short-term average zero-crossing rate as model inputs, and introduces a convolutional neural network with an attention mechanism combined with a bidirectional gated recurrent unit (BiGRU). The attention mechanism allows the model to focus on the informative parts of the speech features. In experiments on the Chinese emotion corpus CASIA, the proposed method achieves higher speech emotion recognition accuracy than both the attention-based CNN-BiLSTM network and the attention-based CNN-GRU network.
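Of the three input features named above, short-term energy and the short-term average zero-crossing rate are simple frame-level statistics. The sketch below shows one common way to compute them with NumPy; the frame length and hop size are illustrative assumptions (the paper does not specify them here), and MFCC extraction, which requires a filter-bank pipeline, is omitted.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames.

    frame_len=400 and hop=160 correspond to 25 ms windows with a
    10 ms hop at a 16 kHz sampling rate (assumed values).
    """
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def short_term_energy(frames):
    # Sum of squared samples in each frame.
    return np.sum(frames.astype(np.float64) ** 2, axis=1)

def zero_crossing_rate(frames):
    # Fraction of adjacent sample pairs whose signs differ.
    signs = np.sign(frames)
    return np.mean(np.abs(np.diff(signs, axis=1)) > 0, axis=1)
```

In a full pipeline, these frame-level sequences would be stacked with the MFCC matrix and fed to the attention-based CNN-BiGRU classifier.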
Changjiang Jiang, Junliang Liu, Rong Mao, Sifan Sun