Yinggang Xie, Nannan Zhou, Shijuan Zhu
Multimodal emotion recognition presents two major challenges: the limited capacity to model higher-order interactions among modalities, and the difficulty of achieving effective fusion when data quality is imbalanced across modalities. To address these issues, this paper proposes a novel model based on hierarchical feature fusion. The model adopts a three-level fusion framework. First, it integrates static fusion with a dynamic weighting mechanism informed by Bayesian uncertainty estimation to achieve initial alignment and importance modeling of modality-specific features. Second, a multi-head cross-modal attention mechanism is introduced to capture contextual dependencies and complementary information across modalities. Finally, gated recurrent units are employed to model temporal dynamics, thereby enriching the semantic-level fusion representation. Experimental results demonstrate that the proposed method achieves 84.6% accuracy on the binary classification task of the MOSEI dataset and a weighted F1 score of 69.7% on the IEMOCAP dataset, a 2.1% improvement over the representative baseline COGMEN. Ablation studies further confirm that the multi-head attention mechanism, the dynamic weighting strategy, and the gated fusion module each contribute substantially to the overall performance gains.
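The three-level framework described above can be made concrete with a minimal sketch. The following PyTorch module is an illustrative assumption, not the authors' released code: the HierarchicalFusion class, its layer sizes, the per-modality log-variance head standing in for Bayesian uncertainty estimation, and the sigmoid gate are hypothetical choices used only to show how the three stages could compose.

```python
import torch
import torch.nn as nn

class HierarchicalFusion(nn.Module):
    """Sketch of the three-level fusion outlined in the abstract.

    Stage 1: uncertainty-informed dynamic weighting of modality features.
    Stage 2: multi-head cross-modal attention.
    Stage 3: GRU temporal modeling with a gated fusion of its output.
    All dimensions and the variance head are illustrative assumptions.
    """
    def __init__(self, dim=128, n_modalities=3, n_heads=4):
        super().__init__()
        # Per-modality head predicting a log-variance, used here as a
        # simple proxy for Bayesian uncertainty estimation.
        self.log_var = nn.ModuleList(
            [nn.Linear(dim, 1) for _ in range(n_modalities)]
        )
        self.cross_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.gru = nn.GRU(dim, dim, batch_first=True)
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, feats):
        # feats: list of per-modality tensors, each (batch, seq_len, dim),
        # assumed to be pre-aligned to a common length and dimension.
        # Stage 1: weight each modality by its estimated certainty
        # (lower predicted variance -> larger softmax weight).
        weights = torch.softmax(
            torch.cat([-lv(f).mean(1) for lv, f in zip(self.log_var, feats)],
                      dim=-1),
            dim=-1,
        )                                            # (batch, n_modalities)
        x = torch.stack(feats, dim=1)                # (batch, M, seq, dim)
        x = (weights[:, :, None, None] * x).sum(1)   # (batch, seq, dim)

        # Stage 2: cross-modal attention, with the concatenated modality
        # sequences as the key/value memory.
        memory = torch.cat(feats, dim=1)             # (batch, M*seq, dim)
        attn_out, _ = self.cross_attn(x, memory, memory)

        # Stage 3: GRU over the attended sequence, then a sigmoid gate
        # blends the recurrent and attention representations.
        seq_out, _ = self.gru(attn_out)
        g = torch.sigmoid(self.gate(torch.cat([attn_out, seq_out], dim=-1)))
        return g * seq_out + (1 - g) * attn_out
```

Under these assumptions, the module would be called with one feature tensor per modality (e.g. text, audio, video encodings of equal length) and would return a fused sequence representation for a downstream classifier head.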