JOURNAL ARTICLE

Multi-modal Dimensional Emotion Recognition using Recurrent Neural Networks

Abstract

Emotion recognition is an active research area with wide applications and significant challenges. This paper presents our submission to the Audio/Visual Emotion Challenge (AVEC 2015), whose goal is to exploit audio, visual, and physiological signals to continuously predict the values of the emotion dimensions arousal and valence. Our system applies recurrent neural networks (RNNs) to model temporal information. We explore several aspects to improve prediction performance, including: the dominant modalities for arousal and valence prediction, the duration of features, novel loss functions, the directions of Long Short-Term Memory (LSTM), multi-task learning, and different structures for early feature fusion and late fusion. The best settings are chosen according to performance on the development set. Experimental results competitive with the baseline demonstrate the effectiveness of the proposed methods.
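The abstract describes an LSTM-based recurrent model that reads per-frame multi-modal features and regresses both emotion dimensions at every time step (multi-task learning). The paper's actual architecture and hyperparameters are not given here; the following is a minimal numpy sketch of that general idea — a single forward-direction LSTM with a two-output linear head for (arousal, valence) — with illustrative dimensions and random initialization that are assumptions, not details from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    """Minimal LSTM cell (forward pass only), illustrating how a recurrent
    model can carry temporal context across frames. Dimensions and
    initialization are illustrative, not taken from the paper."""
    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        # One stacked weight matrix covering the input, forget, candidate,
        # and output gates, applied to [x; h].
        self.W = rng.normal(0.0, 0.1, (4 * hidden_dim, input_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.hidden_dim = hidden_dim

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        H = self.hidden_dim
        i = sigmoid(z[0:H])        # input gate
        f = sigmoid(z[H:2*H])      # forget gate
        g = np.tanh(z[2*H:3*H])    # candidate cell state
        o = sigmoid(z[3*H:4*H])    # output gate
        c = f * c + i * g          # new cell state
        h = o * np.tanh(c)         # new hidden state
        return h, c

def predict_dimensions(features, cell, W_out, b_out):
    """Run the LSTM over a sequence of per-frame feature vectors and emit a
    two-dimensional (arousal, valence) prediction for every frame, mirroring
    the continuous multi-task output described in the abstract."""
    h = np.zeros(cell.hidden_dim)
    c = np.zeros(cell.hidden_dim)
    outputs = []
    for x in features:
        h, c = cell.step(x, h, c)
        outputs.append(W_out @ h + b_out)  # linear regression head
    return np.array(outputs)  # shape: (num_frames, 2)
```

A bidirectional variant — one of the design choices the paper explores — would run a second cell over the reversed sequence and concatenate the two hidden states before the output head.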

Keywords:
Emotion recognition; Arousal; Valence; Recurrent neural network; Long short-term memory; Multi-modal fusion; Multi-task learning; Machine learning; Pattern recognition

Metrics

- Cited by: 107
- FWCI (Field-Weighted Citation Impact): 10.78
- References: 30
- Citation Normalized Percentile: 0.98 (top 1%)

Topics

Emotion and Mood Recognition
Social Sciences → Psychology → Experimental and Cognitive Psychology
Music and Audio Processing
Physical Sciences → Computer Science → Signal Processing
Speech and Audio Processing
Physical Sciences → Computer Science → Signal Processing