JOURNAL ARTICLE

Topic and Style-aware Transformer for Multimodal Emotion Recognition

Abstract

Understanding how emotion is expressed in multimodal signals is key for machines to better understand human communication. While the language, visual, and acoustic modalities each provide clues from a different perspective, the visual modality has been shown to contribute little to emotion recognition performance because of its high dimensionality. We therefore first leverage the strong multimodal backbone VATT to project the visual signal into a common space with the language and acoustic signals. On top of it, we propose two content-oriented features, Topic and Speaking style, to address subjectivity issues. Experiments on the benchmark dataset MOSEI show that our model outperforms SOTA results, effectively incorporates visual signals, and handles subjectivity issues, with the Topic and Speaking style features serving as content "normalization".
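
The idea of mapping each modality into a common space and then attaching topic and speaking-style features can be illustrated with a small sketch. This is not the authors' model: VATT is a pretrained transformer backbone and is not reproduced here. Random linear projections stand in for it, and mean pooling plus concatenation is an assumed fusion step chosen only for illustration. Every name, dimension, and weight below is hypothetical.

```python
import numpy as np


def project(features, weight):
    """Map one modality's features (T, d_in) into the shared space (T, d_common).

    Assumption: a single linear map stands in for a learned backbone.
    """
    return features @ weight


def fuse(text, audio, video, topic, style, weights):
    """Stack the projected modalities along time, mean-pool them, and append
    the utterance-level topic and style vectors (an assumed fusion scheme)."""
    shared = np.concatenate(
        [
            project(text, weights["text"]),
            project(audio, weights["audio"]),
            project(video, weights["video"]),
        ],
        axis=0,
    )
    pooled = shared.mean(axis=0)
    return np.concatenate([pooled, topic, style])


rng = np.random.default_rng(0)
d_common = 8  # hypothetical size of the shared space

# Hypothetical per-modality sequences: (time steps, feature dimension).
text = rng.normal(size=(5, 16))
audio = rng.normal(size=(7, 12))
video = rng.normal(size=(4, 32))

# Hypothetical projection weights, one per modality.
weights = {
    "text": rng.normal(size=(16, d_common)),
    "audio": rng.normal(size=(12, d_common)),
    "video": rng.normal(size=(32, d_common)),
}

# Hypothetical utterance-level topic and speaking-style vectors.
topic = rng.normal(size=3)
style = rng.normal(size=2)

fused = fuse(text, audio, video, topic, style, weights)
print(fused.shape)  # (13,): 8 shared dimensions + 3 topic + 2 style
```

In this sketch the high-dimensional visual input (32 features per step) ends up in the same 8-dimensional space as the other two modalities, so a downstream classifier sees all three on equal footing before the topic and style vectors are appended.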

Keywords:
Computer science, Modalities, Normalization (sociology), Leverage (statistics), Subjectivity, Transformer, Speech recognition, Multimodality, Modality (human–computer interaction), Multimodal learning, Benchmark (surveying), Emotion recognition, Artificial intelligence, Natural language processing, Engineering

Metrics

Cited By: 5
FWCI (Field Weighted Citation Impact): 1.28
Refs: 28
Citation Normalized Percentile: 0.79

Topics

Sentiment Analysis and Opinion Mining
Physical Sciences →  Computer Science →  Artificial Intelligence
Emotion and Mood Recognition
Social Sciences →  Psychology →  Experimental and Cognitive Psychology
Multimodal Machine Learning Applications
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

BOOK-CHAPTER

Contextually Aware Multimodal Emotion Recognition

Preet Shah, Patnala Prudhvi Raj, P. Suresh, Bhaskarjyoti Das

Advances in intelligent systems and computing Year: 2020 Pages: 745-753
BOOK-CHAPTER

Context-Aware Multimodal Emotion Recognition

Aaishwarya Khalane, Talal Shaikh

Lecture notes in networks and systems Year: 2022 Pages: 51-61
JOURNAL ARTICLE

Topic-aware video summarization using multimodal transformer

Yubo Zhu, Wentian Zhao, Rui Hua, Xinxiao Wu

Journal: Pattern Recognition Year: 2023 Vol: 140 Pages: 109578
JOURNAL ARTICLE

Multimodal Neurophysiological Transformer for Emotion Recognition

Sharath Koorathota, Zain Ahmad Khan, Pawan Lapborisuth, Paul Sajda

Journal: 2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC) Year: 2022 Vol: 2022 Pages: 3563-3567