JOURNAL ARTICLE

Multimodal Music Emotion Recognition with Hierarchical Cross-Modal Attention Network

Jiahao Zhao, Ganghui Ru, Yi Yu, Yulun Wu, Dichucheng Li, Wei Li

Year: 2022 Journal: 2022 IEEE International Conference on Multimedia and Expo (ICME) Pages: 1-6

Abstract

Computational music emotion recognition aims to recognize the emotional content of music tracks. Studies in this area have focused mainly on the audio content of the tracks. Although lyrics and music context contribute greatly to the perceived emotion, these sources of emotional information are usually ignored. Motivated by this observation, we propose a multimodal music emotion recognition method that jointly predicts valence and arousal values by combining the audio, lyrics, track name, and artist of a given track. Audio, lyrics, and context features are extracted separately and fused by a cross-modal attention mechanism, forming a hierarchical structure. Our proposed model outperforms two baselines by a large margin and achieves state-of-the-art performance on two public datasets.
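
The fusion idea in the abstract can be illustrated with a minimal, dependency-free sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the abstract does not state the fusion order, so this sketch lets the lyrics attend over the context (track name, artist) first and then lets the audio attend over that result. It also uses plain scaled dot-product attention without learned projections, mean pooling, and a hypothetical linear `head` with random weights. The function names (`cross_attention`, `mean_pool`, `predict_valence_arousal`) are likewise hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context):
    """Scaled dot-product attention: every query vector attends over `context`.

    No learned projections are used (an assumption made to keep the sketch short),
    so each output is a weighted average of the context vectors.
    """
    dim = len(queries[0])
    attended = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in context]
        weights = softmax(scores)
        attended.append(
            [sum(w * v[i] for w, v in zip(weights, context)) for i in range(dim)]
        )
    return attended


def mean_pool(sequence):
    """Average a sequence of vectors into a single vector."""
    return [sum(column) / len(sequence) for column in zip(*sequence)]


def predict_valence_arousal(audio, lyrics, context, head):
    # Level 1 (assumed order): lyrics tokens attend over the context tokens.
    text = cross_attention(lyrics, context)
    # Level 2 (assumed order): audio frames attend over the fused text features.
    fused = cross_attention(audio, text)
    pooled = mean_pool(fused)
    # Hypothetical linear head: one weight row each for valence and arousal.
    return tuple(sum(w * x for w, x in zip(row, pooled)) for row in head)


rng = random.Random(0)


def random_sequence(length, dim):
    """Stand-in for extracted features; real features would come from encoders."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(length)]


DIM = 4
audio = random_sequence(5, DIM)    # e.g. 5 audio frames
lyrics = random_sequence(3, DIM)   # e.g. 3 lyric tokens
context = random_sequence(2, DIM)  # e.g. track name and artist embeddings
head = random_sequence(2, DIM)     # untrained weights, for shape only

valence, arousal = predict_valence_arousal(audio, lyrics, context, head)
print(valence, arousal)
```

Because the weights are random and untrained, the printed numbers carry no meaning; the sketch only shows how two stacked cross-attention steps can combine three feature streams into a joint valence and arousal prediction.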

Keywords:
Lyrics, Computer science, Speech recognition, Audio analyzer, Modal, Emotion recognition, Context (archaeology), Valence (chemistry), Artificial intelligence, Audio signal processing, Audio signal, Speech coding

Metrics

Cited By: 21
FWCI (Field Weighted Citation Impact): 2.95
Refs: 25
Citation Normalized Percentile: 0.92

Topics

Music and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Emotion and Mood Recognition
Social Sciences →  Psychology →  Experimental and Cognitive Psychology

Related Documents

JOURNAL ARTICLE

A Hierarchical Cross-Modal Spatial Fusion Network for Multimodal Emotion Recognition

Ming Xu, Tuo Shi, Hao Zhang, Zeyi Liu, Xiao He

Journal: IEEE Transactions on Artificial Intelligence Year: 2025 Vol: 6 (5) Pages: 1429-1438

JOURNAL ARTICLE

Speaker-aware cognitive network with cross-modal attention for multimodal emotion recognition in conversation

Lili Guo, Yikang Song, Shifei Ding

Journal: Knowledge-Based Systems Year: 2024 Vol: 296 Pages: 111969-111969

JOURNAL ARTICLE

Multi modal music emotion recognition based on hierarchical attention network and knowledge distillation

Shuqi Li

Journal: CCF Transactions on Pervasive Computing and Interaction Year: 2026

JOURNAL ARTICLE

Emotion-aware cross-modal music generation based on multimodal emotion recognition

Xueer Sun, Xiao Han, Fei Yao, Jiawei Xu

Journal: Alexandria Engineering Journal Year: 2025 Vol: 133 Pages: 254-270