Fine-Grained Disentangled Representation Learning For Multimodal Emotion Recognition

Haoqin Sun; Shiwan Zhao; Xuechen Wang; Wenjia Zeng; Yong Chen; Yong Qin

doi:10.1109/icassp48485.2024.10447667

JOURNAL ARTICLE

Year: 2024 Pages: 11051-11055

Abstract

Multimodal emotion recognition (MMER) is an active research field that aims to accurately recognize human emotions by fusing multiple perceptual modalities. However, inherent heterogeneity across modalities introduces distribution gaps and information redundancy, posing significant challenges for MMER. In this paper, we propose a novel fine-grained disentangled representation learning (FDRL) framework to address these challenges. Specifically, we design modality-shared and modality-private encoders to project each modality into modality-shared and modality-private subspaces, respectively. In the shared subspace, we introduce a fine-grained alignment component to learn modality-shared representations, thus capturing modal consistency. Subsequently, we tailor a fine-grained disparity component to constrain the private subspaces, thereby learning modality-private representations and enhancing their diversity. Lastly, we introduce a fine-grained predictor component to ensure that the labels of the output representations from the encoders remain unchanged. Experimental results on the IEMOCAP dataset show that FDRL outperforms state-of-the-art methods, achieving a weighted average recall (WAR) of 78.34% and an unweighted average recall (UAR) of 79.44%.
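
The two reported metrics can be illustrated with a short sketch. It assumes the usual definitions: WAR is recall averaged over classes weighted by class support (equal to overall accuracy), and UAR is the plain mean of per-class recalls. The function name `war_uar` and the toy labels are hypothetical and are not taken from the paper or from IEMOCAP.

```python
from collections import Counter


def war_uar(y_true, y_pred):
    """Return (WAR, UAR) as fractions in [0, 1].

    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    # Number of samples per true class (the class "support").
    support = Counter(y_true)
    # Number of correctly predicted samples per true class.
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    # Per-class recall = correct predictions / samples of that class.
    recalls = {c: hits[c] / n for c, n in support.items()}

    # WAR: support-weighted mean of recalls, which reduces to accuracy.
    war = sum(hits.values()) / len(y_true)
    # UAR: unweighted mean of recalls, so every class counts equally.
    uar = sum(recalls.values()) / len(recalls)
    return war, uar


# Toy, made-up labels (NOT IEMOCAP data).
y_true = ["angry", "angry", "angry", "angry", "happy", "happy",
          "sad", "sad", "sad", "neutral"]
y_pred = ["angry", "angry", "angry", "happy", "happy", "sad",
          "sad", "sad", "sad", "neutral"]

war, uar = war_uar(y_true, y_pred)
print(f"WAR = {war:.4f}, UAR = {uar:.4f}")
```

Because UAR gives each emotion class equal weight regardless of how many utterances it has, it can differ noticeably from WAR on class-imbalanced data, which is why both are commonly reported together.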

Keywords:

Modality (human–computer interaction); Computer science; Modalities; Linear subspace; Feature learning; Encoder; Subspace topology; Redundancy (engineering); Artificial intelligence; Component (thermodynamics); Representation (politics); Multimodal learning; Machine learning; Natural language processing; Mathematics

Metrics

FWCI (Field Weighted Citation Impact): 21.93

Citation Normalized Percentile: 0.99

Topics

Emotion and Mood Recognition

Social Sciences → Psychology → Experimental and Cognitive Psychology

Sentiment Analysis and Opinion Mining

Physical Sciences → Computer Science → Artificial Intelligence

Human Pose and Action Recognition

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

Disentangled Representation Learning for Multimodal Emotion Recognition

Learning Disentangled Representation for Fine-Grained Visual Categorization

Fine-Grained Emotion Comprehension: Semisupervised Multimodal Emotion and Intensity Recognition

Disentangled Representation and Contrastive Learning with Adaptive Affinity Squeeze-Excitation for Multimodal Emotion Recognition

Disentangled Feature Network for Fine-Grained Recognition