JOURNAL ARTICLE

Multimodal Emotion Recognition Based on Hierarchical Feature Fusion

Abstract

Multimodal emotion recognition presents two major challenges: the limited capacity to model higher-order interactions among modalities, and the difficulty of achieving effective fusion when data quality is imbalanced across modalities. To address these issues, this paper proposes a novel model based on hierarchical feature fusion. The model adopts a three-level fusion framework. First, it integrates static fusion with a dynamic weighting mechanism informed by Bayesian uncertainty estimation to achieve initial alignment and importance modeling of modality-specific features. Second, a multi-head cross-modal attention mechanism is introduced to capture contextual dependencies and complementary information across modalities. Finally, gated recurrent units are employed to model temporal dynamics, thereby enhancing the semantic-level fusion representation. Experimental results demonstrate that the proposed method achieves 84.6% accuracy on binary classification with the MOSEI dataset and a weighted F1 score of 69.7% on the IEMOCAP dataset, a 2.1% improvement over the representative baseline, COGMEN. Ablation studies further validate the essential contributions of the multi-head attention mechanism, dynamic weighting strategy, and gated fusion module to the overall performance gains.
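The first two fusion levels described above can be sketched in a few lines. The snippet below is a minimal NumPy illustration, not the paper's implementation: it assumes the dynamic weighting maps each modality's estimated uncertainty (a scalar variance) to a fusion weight via a softmax over negative variances, and it shows single-head scaled dot-product cross-modal attention in place of the paper's multi-head variant. All function names are illustrative.

```python
import numpy as np

def uncertainty_weighted_fusion(features, variances):
    """Fuse per-modality feature vectors with weights that decrease as the
    modality's estimated (Bayesian) uncertainty grows.

    features:  list of (d,) arrays, one per modality
    variances: list of scalar uncertainty estimates, one per modality
    Returns (fused_vector, weights).
    """
    scores = -np.asarray(variances, dtype=float)   # lower variance -> higher score
    weights = np.exp(scores - scores.max())        # stable softmax
    weights /= weights.sum()
    fused = sum(w * f for w, f in zip(weights, features))
    return fused, weights

def cross_modal_attention(query, context):
    """Single-head scaled dot-product attention: tokens of one modality
    (query, shape (Tq, d)) attend over tokens of another (context, (Tc, d))."""
    d_k = query.shape[-1]
    scores = query @ context.T / np.sqrt(d_k)      # (Tq, Tc) similarity
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)       # row-wise softmax
    return attn @ context                          # (Tq, d) attended features
```

As a sanity check, a modality with variance 0.1 receives a larger weight than one with variance 1.0, and the weights always sum to 1; stacking a multi-head version and feeding the attended sequence to a GRU would complete the three-level pipeline.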

Keywords:
Feature (linguistics); Fusion; Emotion recognition; Pattern recognition (psychology); Computer science; Artificial intelligence; Multimodal therapy; Psychology; Psychotherapist; Linguistics

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 32
Citation Normalized Percentile: 0.16

Topics

Advanced Computing and Algorithms (Social Sciences → Urban Studies)
Advanced Algorithms and Applications (Physical Sciences → Engineering → Control and Systems Engineering)
Remote Sensing and Land Use (Physical Sciences → Earth and Planetary Sciences → Atmospheric Science)

Related Documents

JOURNAL ARTICLE

Multimodal Emotion Recognition Based on Feature Fusion

Yurui Xu, Xiao Wu, Hang Su, Xiaorui Liu

Journal: 2022 International Conference on Advanced Robotics and Mechatronics (ICARM), Year: 2022, Pages: 7-11
JOURNAL ARTICLE

Multimodal Emotion Recognition Based on Cascaded Multichannel and Hierarchical Fusion

Xia Liu, Zhijing Xu, Kan Huang

Journal: Computational Intelligence and Neuroscience, Year: 2023, Vol: 2023 (1), Pages: 9645611-9645611
JOURNAL ARTICLE

Hierarchical Attention‐Based Multimodal Fusion Network for Video Emotion Recognition

Xiaodong Liu, Songyang Li, Miao Wang

Journal: Computational Intelligence and Neuroscience, Year: 2021, Vol: 2021 (1), Pages: 5585041-5585041