JOURNAL ARTICLE

Multilevel Transformer for Multimodal Emotion Recognition

Abstract

Multimodal emotion recognition has attracted much attention recently, but fusing multiple modalities effectively with limited labeled data remains challenging. Given the success of pre-trained models and the fine-grained nature of emotion expression, we argue that both aspects should be taken into account. Unlike previous methods, which mainly focus on one of the two, we introduce a novel multi-granularity framework that combines fine-grained representations with pre-trained utterance-level representations. Inspired by Transformer TTS, we propose a multilevel transformer model for fine-grained multimodal emotion recognition. Specifically, we explore different methods of incorporating phoneme-level embeddings with word-level embeddings. To perform multi-granularity learning, we simply combine the multilevel transformer model with BERT. Extensive experimental results show that the multilevel transformer model outperforms previous state-of-the-art approaches on the IEMOCAP dataset, and that the multi-granularity model achieves a further performance improvement.
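
The abstract does not say which methods of combining phoneme-level and word-level embeddings were explored. As a purely illustrative sketch, one simple possibility is to add each phoneme vector to the vector of the word that contains it. The function name `fuse_phoneme_word`, the alignment list `phoneme_to_word`, and the toy values are all assumptions made for this example, not details from the paper.

```python
from typing import List

Vector = List[float]


def fuse_phoneme_word(
    phoneme_emb: List[Vector],
    word_emb: List[Vector],
    phoneme_to_word: List[int],
) -> List[Vector]:
    """Add each phoneme vector to the vector of the word that contains it.

    Hypothetical fusion rule chosen for illustration only; the paper may
    use a different scheme (e.g. concatenation or attention).
    """
    if len(phoneme_emb) != len(phoneme_to_word):
        raise ValueError("one word index is required per phoneme")
    fused = []
    for p_vec, w_idx in zip(phoneme_emb, phoneme_to_word):
        w_vec = word_emb[w_idx]
        if len(p_vec) != len(w_vec):
            raise ValueError("phoneme and word embeddings must share a dimension")
        # Element-wise sum keeps the sequence at phoneme resolution.
        fused.append([p + w for p, w in zip(p_vec, w_vec)])
    return fused


# Toy example: two words, three phonemes, embedding dimension 2.
word_emb = [[1.0, 0.0], [0.0, 1.0]]
phoneme_emb = [[0.5, 0.5], [0.25, 0.25], [2.0, 3.0]]
phoneme_to_word = [0, 0, 1]  # the first two phonemes belong to word 0
fused = fuse_phoneme_word(phoneme_emb, word_emb, phoneme_to_word)
```

The fused sequence has one vector per phoneme, so it could in principle be passed to a fine-grained encoder while still carrying word-level information.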

Keywords:
Computer science, Transformer, Embedding, Granularity, Utterance, Artificial intelligence, Modalities, Natural language processing, Speech recognition, Machine learning, Engineering

Metrics

Cited By: 7
FWCI (Field Weighted Citation Impact): 2.92
Refs: 34
Citation Normalized Percentile: 0.86

Topics

Emotion and Mood Recognition
Social Sciences → Psychology → Experimental and Cognitive Psychology
Sentiment Analysis and Opinion Mining
Physical Sciences → Computer Science → Artificial Intelligence
Speech Recognition and Synthesis
Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Multimodal Neurophysiological Transformer for Emotion Recognition

Sharath Koorathota, Zain Ahmad Khan, Pawan Lapborisuth, Paul Sajda

Journal: 2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC) Year: 2022 Vol: 2022 Pages: 3563-3567

JOURNAL ARTICLE

Noise-Resistant Multimodal Transformer for Emotion Recognition

Yuanyuan Liu, Haoyu Zhang, Yibing Zhan, Zijing Chen, Guanghao Yin, Lin Wei, Zhe Chen

Journal: International Journal of Computer Vision Year: 2024 Vol: 133 (5) Pages: 3020-3040

JOURNAL ARTICLE

Token-disentangling Mutual Transformer for multimodal emotion recognition

Guanghao Yin, Yuanyuan Liu, Tengfei Liu, Haoyu Zhang, Fang Fang, Chang Tang, Liangxiao Jiang

Journal: Engineering Applications of Artificial Intelligence Year: 2024 Vol: 133 Pages: 108348-108348

JOURNAL ARTICLE

Multimodal transformer augmented fusion for speech emotion recognition

Yuanyuan Wang, Yu Gu, Yifei Yin, Yingping Han, He Zhang, Shuang Wang, Chenyu Li, Dou Quan

Journal: Frontiers in Neurorobotics Year: 2023 Vol: 17 Pages: 1181598-1181598