JOURNAL ARTICLE

Hierarchical Attention‐Based Multimodal Fusion Network for Video Emotion Recognition

Xiaodong Liu, Songyang Li, Miao Wang

Year: 2021 Journal: Computational Intelligence and Neuroscience Vol: 2021 (1) Pages: 5585041-5585041 Publisher: Hindawi Publishing Corporation

Abstract

Context, such as scenes and objects, plays an important role in video emotion recognition, and incorporating context information can further improve recognition accuracy. Although previous research has considered context information, it often ignores that different images may contain different emotional clues. To address these differences in emotional content across modalities and across images, this paper proposes a hierarchical attention‐based multimodal fusion network for video emotion recognition, which consists of a multimodal feature extraction module and a multimodal feature fusion module. The multimodal feature extraction module has three subnetworks that extract features from facial, scene, and global images. Each subnetwork consists of two branches: the first extracts the features of its modality, and the second generates an emotion score for each image. The features and emotion scores of all images in a modality are aggregated to generate the emotion feature of that modality. The multimodal feature fusion module takes these multimodal features as input and generates an emotion score for each modality. Finally, the features and emotion scores of the modalities are aggregated to produce the final emotion representation of the video. Experimental results show that the proposed method is effective on the emotion recognition dataset.
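The two-level aggregation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract says only that features and emotion scores are "aggregated", so the softmax-weighted sum used here is an assumption, and the function names (`softmax`, `attention_pool`, `fuse_video`) are hypothetical. In the paper the scores are produced by network branches; in this sketch they are simply passed in as numbers.

```python
import math


def softmax(scores):
    """Normalize raw scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, scores):
    """Weighted sum of feature vectors, weighted by softmax(scores).

    Assumption: the abstract does not state the aggregation rule;
    a softmax-weighted sum is one common choice.
    """
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]


def fuse_video(modalities, modality_scores):
    """Two-level fusion: images -> modality feature -> video representation.

    modalities: dict mapping a modality name to a pair
        (per-image feature vectors, per-image emotion scores).
    modality_scores: dict mapping a modality name to its emotion score.
    """
    names = sorted(modalities)
    # Level 1: pool the images within each modality.
    modality_features = [attention_pool(*modalities[n]) for n in names]
    # Level 2: pool the modality features into one video-level vector.
    scores = [modality_scores[n] for n in names]
    return attention_pool(modality_features, scores)


# Toy input with 2-dimensional features for the three modalities
# named in the abstract (facial, scene, and global images).
video = {
    "face": ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
    "scene": ([[2.0, 2.0]], [0.5]),
    "global": ([[0.0, 4.0]], [-1.0]),
}
representation = fuse_video(video, {"face": 0.0, "scene": 0.0, "global": 0.0})
print(representation)
```

With equal modality scores, the second level reduces to a plain average of the three modality features; unequal scores shift the video representation toward the modality judged more emotionally informative.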

Keywords:
Computer science; Artificial intelligence; Feature (linguistics); Context (archaeology); Emotion recognition; Feature extraction; Pattern recognition (psychology); Modal; Representation (politics); Facial expression

Metrics

Cited By: 7
FWCI (Field Weighted Citation Impact): 1.22
Refs: 38
Citation Normalized Percentile: 0.77

Topics

Emotion and Mood Recognition
Social Sciences →  Psychology →  Experimental and Cognitive Psychology
Human Pose and Action Recognition
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Video Surveillance and Tracking Methods
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

Video multimodal emotion recognition based on Bi-GRU and attention fusion

Ruohong Huan, Jia Shu, Sheng-Lin Bao, Ronghua Liang, Peng Chen, Kaikai Chi

Journal: Multimedia Tools and Applications Year: 2020 Vol: 80 (6) Pages: 8213-8240
JOURNAL ARTICLE

Multimodal Emotion Recognition Based on Hierarchical Feature Fusion

Yinggang Xie, Nannan Zhou, Shijuan Zhu

Journal: 電腦學刊 Year: 2025 Vol: 36 (2) Pages: 281-296
JOURNAL ARTICLE

Multimodal Music Emotion Recognition with Hierarchical Cross-Modal Attention Network

Jiahao Zhao, Ganghui Ru, Yi Yu, Yulun Wu, Dichucheng Li, Wei Li

Journal: 2022 IEEE International Conference on Multimedia and Expo (ICME) Year: 2022 Pages: 1-6