JOURNAL ARTICLE

Semi-supervised Multi-modal Emotion Recognition with Cross-Modal Distribution Matching

Abstract

Automatic emotion recognition is an active research topic with a wide range of applications. Because manual annotation is costly and label ambiguity is inevitable, emotion recognition datasets remain limited in both scale and quality. A key challenge is therefore how to build effective models with limited data resources. Previous work has explored different approaches to this challenge, including data enhancement, transfer learning, and semi-supervised learning. However, these approaches suffer from weaknesses such as training instability, large performance loss during transfer, or only marginal improvement.

In this work, we propose a novel semi-supervised multi-modal emotion recognition model based on cross-modality distribution matching, which leverages abundant unlabeled data to enhance model training under the assumption that the inner emotional state is consistent across modalities at the utterance level.

We conduct extensive experiments to evaluate the proposed model on two benchmark datasets, IEMOCAP and MELD. The experimental results show that the proposed semi-supervised learning model can effectively utilize unlabeled data and combine multiple modalities to boost emotion recognition performance, outperforming other state-of-the-art approaches under the same conditions. The proposed model also achieves competitive performance compared with existing approaches that take advantage of additional auxiliary information such as speaker and interaction context.
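The abstract does not spell out the training objective, so the following is only an illustrative sketch of one simple way to encode the consistency assumption, not the authors' formulation. It assumes each modality has its own classifier producing logits, adds a Jensen-Shannon term that pulls the per-modality predicted emotion distributions together on unlabeled utterances, and combines it with cross-entropy on labeled ones. The function names, the modality keys ("audio", "text"), and the `weight` parameter are all hypothetical.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels, eps=1e-12):
    """Mean negative log-likelihood of integer class labels."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.log(picked + eps).mean())

def js_divergence(p, q, eps=1e-12):
    """Mean Jensen-Shannon divergence between paired rows of p and q."""
    m = 0.5 * (p + q)
    kl_pm = (p * (np.log(p + eps) - np.log(m + eps))).sum(axis=1)
    kl_qm = (q * (np.log(q + eps) - np.log(m + eps))).sum(axis=1)
    return float((0.5 * (kl_pm + kl_qm)).mean())

def semi_supervised_loss(lab_logits, labels, unlab_logits, weight=1.0):
    """Supervised cross-entropy plus a cross-modal matching penalty.

    lab_logits / unlab_logits: dict mapping a modality name to an array of
    shape (n_utterances, n_classes). Rows are aligned across modalities,
    i.e. row i in every modality belongs to the same utterance.
    `weight` (assumed hyperparameter) scales the unlabeled term.
    """
    # Labeled utterances: each modality is trained against the true label.
    supervised = sum(
        cross_entropy(softmax(z), labels) for z in lab_logits.values()
    )
    # Unlabeled utterances: every pair of modalities should agree.
    probs = [softmax(z) for z in unlab_logits.values()]
    matching = 0.0
    for i in range(len(probs)):
        for j in range(i + 1, len(probs)):
            matching += js_divergence(probs[i], probs[j])
    return supervised + weight * matching

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labeled = {"audio": rng.normal(size=(4, 3)), "text": rng.normal(size=(4, 3))}
    unlabeled = {"audio": rng.normal(size=(6, 3)), "text": rng.normal(size=(6, 3))}
    print(semi_supervised_loss(labeled, np.array([0, 1, 2, 1]), unlabeled))
```

The matching term is zero when the modalities already agree on every unlabeled utterance and grows as their predictions diverge, so unlabeled data contributes a training signal without needing any labels.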

Keywords:
Computer science; Artificial intelligence; Modalities; Machine learning; Benchmark (surveying); Matching (statistics); Ambiguity; Context (archaeology); Modality (human–computer interaction); Key (lock); Resource (disambiguation)

Metrics

Cited By: 57
FWCI (Field Weighted Citation Impact): 7.21
Refs: 52
Citation Normalized Percentile: 0.97

Topics

Emotion and Mood Recognition (Social Sciences → Psychology → Experimental and Cognitive Psychology)
Sentiment Analysis and Opinion Mining (Physical Sciences → Computer Science → Artificial Intelligence)
Text and Document Classification Technologies (Physical Sciences → Computer Science → Artificial Intelligence)

Related Documents

JOURNAL ARTICLE

SMIN: Semi-Supervised Multi-Modal Interaction Network for Conversational Emotion Recognition

Zheng Lian, Bin Liu, Jianhua Tao

Journal: IEEE Transactions on Affective Computing, Year: 2022, Vol: 14 (3), Pages: 2415-2429
JOURNAL ARTICLE

Cyclic Data Distillation Semi-Supervised Learning for Multi-Modal Emotion Recognition

Shuzhen Li, Tong Zhang, C. L. Philip Chen

Journal: IEEE Transactions on Knowledge and Data Engineering, Year: 2025, Vol: 37 (9), Pages: 5078-5092
JOURNAL ARTICLE

Cross-modal dynamic convolution for multi-modal emotion recognition

Huanglu Wen, Shaodi You, Ying Fu

Journal: Journal of Visual Communication and Image Representation, Year: 2021, Vol: 78, Pages: 103178-103178