JOURNAL ARTICLE

Multimodal Emotion Recognition using Deep Learning Architectures

Abstract

Emotions are an essential part of effective human communication. The purpose of this research work is to classify six basic human emotions, namely anger, disgust, fear, happiness, sadness, and surprise. The proposed method uses a sequential deep convolutional neural network for each of the audio and visual modalities. Audio classification is performed by fine-tuning a pre-trained AlexNet model, whereas visual classification uses a hybrid deep network combining a CNN and an LSTM. Both decision-level and score-level fusion are implemented for combining the modalities. SVM, random forest, k-NN, and logistic regression classifiers were used to classify emotion from the fused audio-visual data. Experiments were performed on the RML and BAUM-1s datasets with LOSO and LOSGO cross-validation, respectively. The recognition rates obtained demonstrate the validity of the proposed methodology.
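The abstract names two fusion strategies for combining the audio and visual streams. The sketch below illustrates both in a minimal form, assuming each modality's network emits per-class softmax scores over the six basic emotions; the weights, example scores, and function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Hypothetical class order for the six basic emotions from the abstract.
EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]

def score_level_fusion(audio_scores, visual_scores, w_audio=0.5):
    """Score-level fusion: weighted average of per-class scores,
    then argmax over classes for the final decision."""
    a = np.asarray(audio_scores, dtype=float)
    v = np.asarray(visual_scores, dtype=float)
    fused = w_audio * a + (1.0 - w_audio) * v
    return fused, fused.argmax(axis=-1)

def decision_level_fusion(audio_scores, visual_scores):
    """Decision-level fusion: each modality votes with its argmax;
    disagreements fall back to the modality with higher peak confidence."""
    a = np.asarray(audio_scores, dtype=float)
    v = np.asarray(visual_scores, dtype=float)
    a_pred, v_pred = a.argmax(axis=-1), v.argmax(axis=-1)
    audio_wins = a.max(axis=-1) >= v.max(axis=-1)
    return np.where(a_pred == v_pred, a_pred,
                    np.where(audio_wins, a_pred, v_pred))

# Hypothetical softmax outputs for one audio-visual sample.
audio = [[0.10, 0.05, 0.05, 0.60, 0.10, 0.10]]
visual = [[0.05, 0.05, 0.10, 0.40, 0.30, 0.10]]

fused, score_pred = score_level_fusion(audio, visual)
decision_pred = decision_level_fusion(audio, visual)
print(EMOTIONS[score_pred[0]], EMOTIONS[decision_pred[0]])  # → happiness happiness
```

In the paper's pipeline the fused scores are not classified by a plain argmax but are passed to SVM, random forest, k-NN, and logistic regression classifiers; the averaging step above stands in for that final stage.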

Keywords:
Disgust, Sadness, Computer science, Artificial intelligence, Convolutional neural network, Deep learning, Surprise, Support vector machine, Random forest, Emotion classification, Speech recognition, Pattern recognition (psychology), Anger, Machine learning, Psychology

Metrics

Cited By: 26
FWCI (Field Weighted Citation Impact): 3.31
References: 20
Citation Normalized Percentile: 0.90 (in top 10%)

Topics

Music and Audio Processing (Physical Sciences → Computer Science → Signal Processing)
Emotion and Mood Recognition (Social Sciences → Psychology → Experimental and Cognitive Psychology)
Speech and Audio Processing (Physical Sciences → Computer Science → Signal Processing)
