JOURNAL ARTICLE

Multi-modal Fusion for Continuous Emotion Recognition by Using Auto-Encoders

Abstract

Human stress detection is of great importance for monitoring mental health. The Multimodal Sentiment Analysis Challenge (MuSe) 2021 focuses on emotion, physiological-emotion, and stress recognition, as well as sentiment classification, by exploiting several modalities. In this paper, we present our solution for the MuSe-Stress sub-challenge. The target of this sub-challenge is the continuous prediction of arousal and valence for people under stressful conditions, for which text transcripts and audio and video recordings are provided. To this end, we use bidirectional Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks to explore high-level and low-level features from the different modalities. We employ the Concordance Correlation Coefficient (CCC) as both the loss function and the evaluation metric for our model. To improve the unimodal predictions, we add difficulty indicators of the data, obtained by using Auto-Encoders. Finally, we perform late fusion on the unimodal predictions together with the difficulty indicators to obtain our final predictions. With this approach, we achieve a CCC of 0.4278 for arousal and 0.5951 for valence on the test set. Our submission to MuSe 2021 ranks in the top three for arousal, fourth for valence, and in the top three for the combined results.
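As an illustration of the metric named in the abstract, the following is a minimal sketch of the Concordance Correlation Coefficient in its standard form, 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²), together with the usual way of turning it into a loss (1 − CCC). It is not the authors' implementation: the function names `ccc` and `ccc_loss` are hypothetical, NumPy is assumed, and population (biased) variance is assumed.

```python
import numpy as np


def ccc(pred, gold):
    """Concordance Correlation Coefficient between two 1-D sequences.

    Assumption: population statistics (ddof=0) are used throughout.
    The result is undefined (division by zero) if both inputs are
    constant and have the same mean.
    """
    pred = np.asarray(pred, dtype=float)
    gold = np.asarray(gold, dtype=float)
    mean_p, mean_g = pred.mean(), gold.mean()
    var_p, var_g = pred.var(), gold.var()
    # Population covariance between prediction and gold standard.
    cov = ((pred - mean_p) * (gold - mean_g)).mean()
    return 2.0 * cov / (var_p + var_g + (mean_p - mean_g) ** 2)


def ccc_loss(pred, gold):
    """Hypothetical loss wrapper: 0 at perfect agreement, larger otherwise."""
    return 1.0 - ccc(pred, gold)
```

Unlike Pearson correlation, CCC penalizes a constant offset: a prediction sequence [2, 3, 4] against a gold sequence [1, 2, 3] is perfectly correlated but yields a CCC of 4/7, because the means differ by 1.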

Keywords:
Arousal, Valence (chemistry), Computer science, Speech recognition, Encoder, Modalities, Artificial intelligence, Metric (unit), Emotion recognition, Pattern recognition (psychology), Machine learning, Psychology

Metrics

Cited By: 10
FWCI (Field Weighted Citation Impact): 2.03
Refs: 37
Citation Normalized Percentile: 0.85

Topics

Emotion and Mood Recognition
Social Sciences →  Psychology →  Experimental and Cognitive Psychology
Sentiment Analysis and Opinion Mining
Physical Sciences →  Computer Science →  Artificial Intelligence
Mental Health via Writing
Social Sciences →  Psychology →  Social Psychology

Related Documents

JOURNAL ARTICLE

Emotion Recognition Using Multi-Scale Auto-Encoders with Cross Session Adoption

G ChennaKesava Reddy et al.

Journal:   Zenodo (CERN European Organization for Nuclear Research) Year: 2025
JOURNAL ARTICLE

Branch-Fusion-Net for Multi-Modal Continuous Dimensional Emotion Recognition

Chiqin Li, Lun Xie, Hang Pan

Journal:   IEEE Signal Processing Letters Year: 2022 Vol: 29 Pages: 942-946
JOURNAL ARTICLE

Estimating Multi-Modal Dense Multipath Components using Auto-Encoders

Steffen Schieler, Michael Döbereiner, Sebastian Semper, Martin Landmann

Journal:   2022 30th European Signal Processing Conference (EUSIPCO) Year: 2022 Pages: 1716-1720