Fan Qi, Huaiwen Zhang, Xiaoshan Yang, Changsheng Xu
Multi-modal Emotion Recognition (MER) aims to identify various human emotions from heterogeneous modalities. As emotion theories evolve, increasingly novel and fine-grained concepts are introduced to describe human emotional states, so real-world recognition systems often encounter emotion labels unseen during training. To address this challenge, we propose a versatile zero-shot MER framework that refines emotion label embeddings to capture inter-label relationships and improve discrimination between labels. Specifically, we integrate prior knowledge into a novel affective graph space that generates tailored label embeddings. To obtain multimodal representations, we disentangle the features of each modality into egocentric and altruistic components using adversarial learning, and then hierarchically fuse these components with a hybrid co-attention mechanism. Furthermore, an emotion-guided decoder exploits label-modal dependencies to generate adaptive multimodal representations conditioned on the emotion embeddings. We conduct extensive experiments with different multimodal combinations, including visual-acoustic and visual-textual inputs, on four datasets under both single-label and multi-label zero-shot settings. The results demonstrate the superiority of our framework over state-of-the-art methods.
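To make the adversarial disentanglement step concrete, the following is a minimal sketch, assuming PyTorch, of splitting one modality's features into an egocentric (modality-specific) and an altruistic (modality-shared) component, with a gradient-reversal discriminator supplying the adversarial signal. All names here (Disentangler, GradReverse, grad_reverse, dim, num_modalities) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of adversarial modality disentanglement (not the
# paper's code): two encoders split a feature into egocentric and
# altruistic parts; a discriminator behind a gradient-reversal layer
# pushes the altruistic part to be modality-invariant.
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Gradient reversal layer, as used in adversarial feature learning."""

    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Flip and scale gradients flowing back into the encoder.
        return -ctx.lamb * grad_output, None


def grad_reverse(x, lamb=1.0):
    return GradReverse.apply(x, lamb)


class Disentangler(nn.Module):
    """Splits one modality's features into egocentric (modality-specific)
    and altruistic (modality-shared) components."""

    def __init__(self, dim, num_modalities):
        super().__init__()
        self.ego = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.alt = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.discriminator = nn.Linear(dim, num_modalities)

    def forward(self, feat, lamb=1.0):
        ego = self.ego(feat)  # modality-specific component
        alt = self.alt(feat)  # modality-shared component
        # Reversed gradients make `alt` hard to classify by modality.
        modality_logits = self.discriminator(grad_reverse(alt, lamb))
        return ego, alt, modality_logits


if __name__ == "__main__":
    # Usage: disentangle a batch of 256-d visual features (modality id 0).
    model = Disentangler(dim=256, num_modalities=2)
    visual = torch.randn(8, 256)
    ego_v, alt_v, logits_v = model(visual)
    # Training would add cross-entropy on `logits_v` against the modality id,
    # plus, e.g., an orthogonality penalty between `ego_v` and `alt_v`.
```

The disentangled components would then feed the fusion stage; the hybrid co-attention mechanism and emotion-guided decoder described above are omitted from this sketch.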