JOURNAL ARTICLE

Multimodal Emotion Recognition Based on Cascaded Multichannel and Hierarchical Fusion

Xia Liu, Zhijing Xu, Kan Huang

Year: 2023 Journal: Computational Intelligence and Neuroscience Vol: 2023 (1) Pages: 9645611-9645611 Publisher: Hindawi Publishing Corporation

Abstract

Humans express their emotions in a variety of ways, which inspires research on multimodal fusion-based emotion recognition that uses different modalities to complement one another. However, extracting deep emotional features from different modalities and fusing them remains challenging. It is essential to exploit the advantages of different extraction and fusion approaches to capture the emotional information contained within and across modalities. In this paper, we present a novel multimodal emotion recognition framework, multimodal emotion recognition based on cascaded multichannel and hierarchical fusion (CMC-HF), in which visual, speech, and text signals are used simultaneously as multimodal inputs. First, three cascaded channels based on deep learning perform feature extraction for the three modalities separately, enhancing deep information extraction within each modality and improving recognition performance. Second, an improved hierarchical fusion module promotes intermodality interactions among the three modalities and further improves recognition and classification accuracy. Finally, to validate the effectiveness of the designed CMC-HF model, experiments are conducted on two benchmark datasets, IEMOCAP and CMU-MOSI. Compared with existing state-of-the-art methods, the model achieves an increase of almost 2%–3.2% in four-class accuracy on IEMOCAP and an improvement of 0.9%–2.5% in average class accuracy on CMU-MOSI. The ablation results indicate that both the cascaded feature extraction method and the hierarchical fusion method contribute significantly to multimodal emotion recognition, suggesting that the three modalities contain deeper intermodality and intramodality interactions. Hence, the proposed model has better overall performance, higher recognition efficiency, and better robustness.
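The abstract describes per-modality feature extraction followed by hierarchical (pairwise, then global) fusion. The following is a minimal NumPy sketch of that general pattern only, not the authors' CMC-HF implementation: the feature dimensions, random projection weights, and the pairwise-then-global fusion order are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse(a, b, w):
    # Stand-in for a learned fusion step: concatenate, project, squash.
    return np.tanh(np.concatenate([a, b]) @ w)

d = 8  # per-modality feature size (illustrative)

# Stand-ins for features produced by the three per-modality channels.
visual = rng.standard_normal(d)
speech = rng.standard_normal(d)
text = rng.standard_normal(d)

# Hierarchical fusion: fuse each modality pair first...
w_pair = rng.standard_normal((2 * d, d))
vs = fuse(visual, speech, w_pair)  # visual + speech
vt = fuse(visual, text, w_pair)    # visual + text
st = fuse(speech, text, w_pair)    # speech + text

# ...then fuse the pairwise representations into one joint vector.
w_final = rng.standard_normal((3 * d, d))
joint = np.tanh(np.concatenate([vs, vt, st]) @ w_final)

# A classifier head over the fused representation
# (four emotion classes, as in the four-class IEMOCAP setup).
w_cls = rng.standard_normal((d, 4))
logits = joint @ w_cls
print(logits.shape)  # (4,)
```

The point of the hierarchy is that each pairwise step can model cross-modal interactions before everything is mixed, rather than concatenating all three modalities in a single step.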

Keywords:
Computer science, Modalities, Benchmark (surveying), Artificial intelligence, Feature extraction, Modality (human–computer interaction), Emotion recognition, Feature (linguistics), Pattern recognition (psychology), Machine learning, Speech recognition

Metrics

Cited By: 34
FWCI (Field Weighted Citation Impact): 14.17
Refs: 48
Citation Normalized Percentile: 0.98 (in top 1%)

Topics

Emotion and Mood Recognition
Social Sciences →  Psychology →  Experimental and Cognitive Psychology
Sentiment Analysis and Opinion Mining
Physical Sciences →  Computer Science →  Artificial Intelligence
Music and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing

Related Documents

JOURNAL ARTICLE

Multimodal Emotion Recognition Based on Hierarchical Feature Fusion

Yinggang Xie, Nannan Zhou, Shijuan Zhu

Journal: 電腦學刊 (Journal of Computers) Year: 2025 Vol: 36 (2) Pages: 281-296
JOURNAL ARTICLE

Hierarchical Attention‐Based Multimodal Fusion Network for Video Emotion Recognition

Xiaodong Liu, Songyang Li, Miao Wang

Journal: Computational Intelligence and Neuroscience Year: 2021 Vol: 2021 (1) Pages: 5585041-5585041
CONFERENCE PAPER

Fusion with Hierarchical Graphs for Multimodal Emotion Recognition

Shuyun Tang, Zhaojie Luo, Guoshun Nan, Jun Baba, Yuichiro Yoshikawa, Hiroshi Ishiguro

Conference: 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) Year: 2022
CONFERENCE PAPER

Multimodal Emotion Recognition Based on Feature Fusion

Yurui Xu, Xiao Wu, Hang Su, Xiaorui Liu

Conference: 2022 International Conference on Advanced Robotics and Mechatronics (ICARM) Year: 2022 Pages: 7-11