Zhongwen Tu, Raoxin Yan, Shizhuang Weng, Jiatong Li, Wei Zhao
Emotion recognition remains a challenging task in human–computer interaction, and with advances in multimodal computing, multimodal emotion recognition has become increasingly important. To address existing limitations in multimodal fusion efficiency, emotional–semantic association mining, and long-range context modeling, we propose a graph neural network (GNN)-based framework. Our method integrates three key components: (1) a hierarchical sequential fusion (HSF) approach for multimodal integration, (2) a sentiment–emotion enhanced joint learning framework, and (3) a context–similarity dual-layer graph architecture (CS-BiGraph). Experimental results show that our method achieves 69.1% accuracy on the IEMOCAP dataset, establishing new state-of-the-art performance. In future work, we will explore robust extensions of the framework to real-world scenarios with higher noise levels and investigate the integration of emerging modalities for broader applicability.
Tijana Đurkić, Nikola Simić, Siniša Suzić, Dragana Bajović, Zoran Perić, Vlado Delić
Jinbao Xie, Yu‐Long Wang, Tianxin Meng, Tai Jin, Yuhua Zheng, Yury I. Varatnitski
Junjie Zhang, Guangmin Sun, Kun Zheng, Sarah Mazhar, Xiaohui Fu, Dong Yang
Ming Li, Jiandong Shi, Lu Bai, Changqin Huang, Yunliang Jiang, Ke Lü, Shijin Wang, Edwin R. Hancock
Hua Jin, Yang Tian, Lulu Yan, Changda Wang, Xuehua Song