Multimodal emotion recognition has a wide range of applications in intelligent recommendation and human-computer interaction. In recent emotion recognition research, models based on recurrent neural networks can observe contextual semantic information and jointly infer emotion labels from that context, but they fail to capture key information and do not address the problem of network degradation. This paper therefore proposes a model that fuses Bi-LSTM, a multi-head attention mechanism, and residual connections (Att-BiLSTM). The Bi-LSTM structure performs contextual semantic inference, the multi-head attention mechanism emphasizes key information, and the residual connections alleviate the network degradation caused by overfitting. Att-BiLSTM achieves 62.1% precision and 61.8% recall on the IEMOCAP dataset, outperforming the existing comparison algorithms.
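The pipeline described above (Bi-LSTM for context encoding, multi-head self-attention for emphasizing key information, and a residual connection around the attention block) can be sketched roughly in PyTorch. This is an illustrative sketch only: the feature dimension, hidden size, head count, class count, and mean-pooling readout are assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

class AttBiLSTM(nn.Module):
    """Illustrative sketch of a Bi-LSTM + multi-head attention + residual model.
    All hyperparameters below are assumed for demonstration."""

    def __init__(self, input_dim=100, hidden_dim=64, num_heads=4, num_classes=6):
        super().__init__()
        # Bi-LSTM encodes each utterance feature sequence with bidirectional context
        self.bilstm = nn.LSTM(input_dim, hidden_dim,
                              batch_first=True, bidirectional=True)
        # Multi-head self-attention re-weights time steps to emphasize key information
        self.attn = nn.MultiheadAttention(2 * hidden_dim, num_heads,
                                          batch_first=True)
        self.norm = nn.LayerNorm(2 * hidden_dim)
        self.fc = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, x):
        h, _ = self.bilstm(x)        # (batch, seq, 2*hidden): contextual encoding
        a, _ = self.attn(h, h, h)    # self-attention over the encoded sequence
        h = self.norm(h + a)         # residual connection around the attention block
        return self.fc(h.mean(dim=1))  # pooled logits over the (assumed) classes

model = AttBiLSTM()
logits = model(torch.randn(2, 10, 100))  # batch of 2 sequences, 10 steps, 100-dim
print(logits.shape)                      # torch.Size([2, 6])
```

The residual connection `h + a` lets the attention block learn a refinement of the Bi-LSTM encoding rather than replace it, which is the standard way such skip connections mitigate degradation in deeper stacks.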
Junfeng Zhang, Lining Xing, Zhen Tan, Hongsen Wang, Kesheng Wang
Yadi Wang, Xiaoding Guo, Xianhong Hou, Zhijun Miao, Xiaojin Yang, Jinkai Guo