JOURNAL ARTICLE

Speech emotion recognition based on ResNet-BiGRU network

Abstract

To improve the anthropomorphic quality of intelligent speech products, academic interest in speech emotion recognition has been growing rapidly. Current speech emotion recognition systems consist mainly of two steps: speech feature extraction and feature classification. To improve recognition accuracy, the Mel Frequency Cepstral Coefficients (MFCC) of the speech signal, which currently provide strong feature representation in the speech domain, are chosen as the input to the deep learning network, and the MFCC information is extracted and classified by a ResNet-BiGRU network based on the attention mechanism. The experimental results show that introducing the attention mechanism allows the model to focus effectively on useful information and reduces interference from redundant information. The accuracy on the Chinese emotion corpus CASIA reached 84.83%.
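The MFCC front end described above can be sketched in plain NumPy. This is an illustrative implementation of the standard MFCC pipeline (framing, windowing, power spectrum, mel filterbank, log compression, DCT), not the authors' exact feature configuration; the frame size, hop, and coefficient counts below are common defaults chosen for illustration.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # triangular filters spaced evenly on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        lo, c, hi = bins[i - 1], bins[i], bins[i + 1]
        for k in range(lo, c):
            fb[i - 1, k] = (k - lo) / max(c - lo, 1)
        for k in range(c, hi):
            fb[i - 1, k] = (hi - k) / max(hi - c, 1)
    return fb

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_ceps=13):
    # 1. split into overlapping frames and apply a Hamming window
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(n_fft)
    # 2. power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # 3. mel filterbank energies, then log compression
    log_mel = np.log(power @ mel_filterbank(sr, n_fft, n_mels).T + 1e-10)
    # 4. DCT-II to decorrelate, keeping the first n_ceps coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_mels))
    return log_mel @ dct.T

# toy example: one second of a 440 Hz tone at 16 kHz
sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
feats = mfcc(sig)
print(feats.shape)  # (97, 13): 97 frames x 13 cepstral coefficients
```

The output is a (frames × coefficients) matrix, which is the natural input shape for the sequence model that follows.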
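The classifier side can likewise be sketched as a minimal PyTorch model: a small residual convolutional stem over the MFCC time–frequency matrix, a bidirectional GRU over the time axis, and a learned attention pooling that weights the GRU outputs before classification. This is a sketch of the architecture family the abstract names, not the paper's exact layer sizes; the channel counts and hidden width are assumptions, and the six output classes assume CASIA's six emotion categories.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    # basic ResNet block: two 3x3 convolutions with an identity skip connection
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(ch)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(ch)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)

class ResNetBiGRUAttn(nn.Module):
    def __init__(self, n_mfcc=13, hidden=64, n_classes=6):
        super().__init__()
        self.stem = nn.Conv2d(1, 16, 3, padding=1)
        self.res = ResBlock(16)
        self.gru = nn.GRU(16 * n_mfcc, hidden,
                          batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)   # one score per time step
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                      # x: (batch, time, n_mfcc)
        h = x.unsqueeze(1)                     # (B, 1, T, F) for the conv stem
        h = self.res(torch.relu(self.stem(h))) # (B, 16, T, F)
        b, c, t, f = h.shape
        h = h.permute(0, 2, 1, 3).reshape(b, t, c * f)
        h, _ = self.gru(h)                     # (B, T, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1) # attention weights over time
        ctx = (w * h).sum(dim=1)               # attention-pooled utterance vector
        return self.fc(ctx)

model = ResNetBiGRUAttn()
logits = model(torch.randn(4, 97, 13))         # batch of 4 MFCC sequences
print(logits.shape)  # torch.Size([4, 6])
```

The attention layer realizes the mechanism the abstract credits for the accuracy gain: instead of taking only the GRU's final state, every time step contributes to the utterance embedding in proportion to its learned relevance.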

Keywords:
Speech emotion recognition, Mel-frequency cepstrum (MFCC), Speech recognition, Feature extraction, Emotion recognition, Pattern recognition, Artificial intelligence

Metrics

Cited By: 3
FWCI (Field Weighted Citation Impact): 0.77
References: 0
Citation Normalized Percentile: 0.73


Topics

Educational Technology and Pedagogy