To make intelligent speech products more anthropomorphic, academic interest in speech emotion recognition has grown rapidly. Current speech emotion recognition systems typically consist of two steps: speech feature extraction and feature classification. To improve recognition accuracy, the Mel Frequency Cepstral Coefficients (MFCC) of the speech signal, which currently offer strong feature-representation capability in the speech domain, are chosen as the input to the deep learning network, and a ResNet-BiGRU network based on the attention mechanism is used to extract information from the MFCC features. The experimental results show that introducing the attention mechanism into the model effectively focuses on useful information and reduces the interference of redundant information. The accuracy on the Chinese emotion corpus CASIA reached 84.83%.
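The pipeline described above (MFCC frames → residual convolution blocks → bidirectional GRU → attention pooling → emotion classifier) can be sketched in PyTorch. This is an illustrative sketch, not the authors' exact architecture: the layer sizes (`n_mfcc=40`, `hidden=128`) and the six emotion classes (matching CASIA) are assumptions, and the attention here is a simple learned soft weighting over time steps.

```python
# Hedged sketch of an attention-based ResNet-BiGRU classifier over MFCC
# features. All hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """1-D residual block applied along the time axis of the MFCC sequence."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm1d(channels)
        self.bn2 = nn.BatchNorm1d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        y = self.relu(self.bn1(self.conv1(x)))
        y = self.bn2(self.conv2(y))
        return self.relu(x + y)  # identity shortcut (the "ResNet" part)

class ResNetBiGRUAttn(nn.Module):
    def __init__(self, n_mfcc=40, hidden=128, n_classes=6):
        super().__init__()
        self.res = nn.Sequential(ResBlock(n_mfcc), ResBlock(n_mfcc))
        self.gru = nn.GRU(n_mfcc, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)    # scores each time step
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, mfcc):                    # mfcc: (batch, time, n_mfcc)
        # Conv1d expects (batch, channels, time), so transpose around the blocks.
        x = self.res(mfcc.transpose(1, 2)).transpose(1, 2)
        h, _ = self.gru(x)                      # (batch, time, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)  # attention weights over time
        ctx = (w * h).sum(dim=1)                # attention-weighted context vector
        return self.fc(ctx)                     # emotion logits

model = ResNetBiGRUAttn()
logits = model(torch.randn(2, 100, 40))         # 2 clips, 100 frames, 40 MFCCs
print(logits.shape)                             # torch.Size([2, 6])
```

The attention layer lets the model weight emotionally salient frames more heavily before pooling, which is the mechanism the abstract credits for suppressing redundant information.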