Emotions are powerful messengers that convey our inner experiences, wants, and aspirations. Correctly understanding emotions allows us to navigate life's problems, make informed decisions, build meaningful connections with others, and develop emotional intelligence. This research aims to determine a person's emotion automatically and accurately using a multimodal emotion recognition strategy that fuses acoustic and visual modalities. The RAVDESS dataset was used for emotion detection. Machine learning algorithms such as SVM, Random Forest, KNN, Gradient Boosting, MLP, Decision Tree, Naïve Bayes, and ensemble learning techniques were trained and tested to identify emotion from the auditory components, while a LeNet-5 model was used to identify emotion from visual imagery. Metrics such as accuracy, the confusion matrix, and training/validation loss were used to evaluate the performance of these models. The proposed technique uses high-quality audio and video data, with the acoustic ensemble method attaining 65% accuracy and the video CNN model attaining 86%. The recognition accuracy rises to 94.5% when the acoustic and visual components are combined at the model level.
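To make the fusion step concrete, the following is a minimal sketch of combining per-modality emotion predictions. It assumes each branch (the acoustic ensemble and the visual CNN) outputs a probability vector over the eight RAVDESS emotion classes; the fusion weights shown here are illustrative assumptions, not values reported in the paper, and the paper's actual model-level fusion may differ from this simple weighted score averaging.

```python
# Sketch: late (score-level) fusion of per-modality emotion probabilities.
# The class list matches RAVDESS's eight emotion labels; the weights
# w_audio and w_video are arbitrary illustrative values.

EMOTIONS = ["neutral", "calm", "happy", "sad",
            "angry", "fearful", "disgust", "surprised"]

def fuse_probabilities(audio_probs, video_probs, w_audio=0.4, w_video=0.6):
    """Weighted average of two probability vectors.

    Returns the normalized fused vector and the predicted emotion label.
    """
    assert len(audio_probs) == len(video_probs) == len(EMOTIONS)
    fused = [w_audio * a + w_video * v
             for a, v in zip(audio_probs, video_probs)]
    total = sum(fused)                      # renormalize to sum to 1
    fused = [p / total for p in fused]
    best = max(range(len(fused)), key=fused.__getitem__)
    return fused, EMOTIONS[best]

# Example: the video branch is more confident about "angry" than the
# audio branch; fusion resolves the disagreement.
audio = [0.05, 0.05, 0.30, 0.10, 0.35, 0.05, 0.05, 0.05]
video = [0.02, 0.03, 0.10, 0.05, 0.70, 0.04, 0.03, 0.03]
fused, label = fuse_probabilities(audio, video)
```

Weighting the stronger modality more heavily (here the visual branch, which achieved higher standalone accuracy) is a common heuristic; in practice the weights can be tuned on a validation set.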