Peisong Liu, Manqiang Che, Jiangchuan Luo
Emotion recognition is an important part of human-computer interaction (HCI) with a wide range of applications, and it has become a research hotspot in recent years. This paper proposes a lightweight emotion recognition network based on mode generation. Specifically, the network takes audio as its raw input and converts the audio into text through a mode generation algorithm, yielding a bimodal emotion recognition model based on audio and text. In addition, the input audio is converted into Mel-frequency cepstral coefficients (MFCCs) to enrich the audio features. A lightweight network is then used for feature extraction, and finally an attention mechanism fuses the features of the two modalities at the feature level. Experimental results show that the proposed network greatly reduces the number of model parameters, allowing the model to be deployed on mobile terminals, while maintaining emotion recognition accuracy.
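The feature-level attention fusion mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' network: the abstract does not specify the attention form, so scaled dot-product scoring is assumed here, and the names `attention_fuse` and `query` (standing in for a learned parameter vector) are hypothetical. The two input vectors stand in for already-extracted audio and text features of equal length.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(audio_feat, text_feat, query):
    """Fuse two equal-length feature vectors into one.

    Assumption: each modality is scored by a scaled dot product with a
    hypothetical learned vector `query`; the scores are normalized with
    softmax and used as weights in a weighted sum of the two vectors.
    """
    if not (len(audio_feat) == len(text_feat) == len(query)):
        raise ValueError("all vectors must have the same length")
    scale = math.sqrt(len(query))
    scores = [
        sum(q * x for q, x in zip(query, feat)) / scale
        for feat in (audio_feat, text_feat)
    ]
    weights = softmax(scores)
    fused = [weights[0] * a + weights[1] * t
             for a, t in zip(audio_feat, text_feat)]
    return fused, weights


# Toy usage: the query is aligned with the audio feature, so the audio
# modality receives the larger attention weight.
fused, weights = attention_fuse([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])
```

In a trained model the query (or an equivalent scoring layer) would be learned jointly with the feature extractors; here it is fixed by hand only to show how the weights shift the fused representation toward one modality.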
Yichen Feng, Xinfeng Ye, Sathiamoorthy Manoharan
Eun Hee Kim, Lim MyungJin, Juhyun Shin