JOURNAL ARTICLE

MGAT: Multi-Granularity Attention Based Transformers for Multi-Modal Emotion Recognition

Abstract

Multi-modal emotion recognition is crucial for human-computer interaction. Many existing algorithms attempt to achieve multi-modal interaction through a cross-attention mechanism. Because the original attention mechanism introduces noise and is computationally heavy, window attention has become a new trend. However, emotions are expressed asynchronously across modalities, which makes it difficult for emotional information to interact between windows. Furthermore, multi-modal data are temporally misaligned, so a single fixed window size can hardly capture cross-modal information. In this paper, we address both issues in a unified framework and propose multi-granularity attention based Transformers (MGAT). MGAT handles emotional asynchrony and modality misalignment through a multi-granularity attention mechanism. Experimental results confirm the effectiveness of our method, which achieves state-of-the-art performance on IEMOCAP.
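
The abstract does not specify how the multi-granularity attention is computed, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that each query frame from one modality attends to a window of frames in the other modality, centered on its proportionally aligned position, and that the results for several window sizes are simply averaged. The function names, the `half_widths` parameter, the averaging step, and the toy data are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def window_cross_attention(queries, keys, values, half_width):
    """Each query attends only to keys near its time-aligned position."""
    t_q, t_k = len(queries), len(keys)
    dim = len(queries[0])
    d_v = len(values[0])
    outputs = []
    for i, q in enumerate(queries):
        # Map query index i onto the key timeline, since the two
        # modalities may have different sequence lengths.
        center = round(i * (t_k - 1) / max(t_q - 1, 1))
        lo = max(0, center - half_width)
        hi = min(t_k, center + half_width + 1)
        idx = range(lo, hi)
        # Scaled dot-product scores restricted to the window.
        scores = [
            sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(dim)
            for j in idx
        ]
        weights = softmax(scores)
        outputs.append(
            [sum(w * values[j][d] for w, j in zip(weights, idx))
             for d in range(d_v)]
        )
    return outputs


def multi_granularity_attention(queries, keys, values, half_widths=(1, 2, 4)):
    """Average windowed cross-attention over several window sizes.

    Averaging is an assumption made for illustration; the abstract
    does not say how the granularities are combined.
    """
    per_scale = [
        window_cross_attention(queries, keys, values, w) for w in half_widths
    ]
    n = len(per_scale)
    d_v = len(values[0])
    return [
        [sum(scale[i][d] for scale in per_scale) / n for d in range(d_v)]
        for i in range(len(queries))
    ]


# Toy data: 3 "audio" frames attend over 5 "text" tokens.
audio = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.2, 0.8]]
fused = multi_granularity_attention(audio, text, text)
```

Small windows keep attention local and cheap, while larger windows let a query reach frames that are displaced in time. Combining both is one plausible way to tolerate the asynchrony and misalignment described above.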

Keywords:
Granularity; Computer science; Modal; Modalities; Modality (human–computer interaction); Computation; Transformer; Artificial intelligence; Algorithm; Engineering; Voltage

Metrics

Cited By: 9
FWCI (Field Weighted Citation Impact): 3.75
Refs: 35
Citation Normalized Percentile: 0.89

Topics

Emotion and Mood Recognition
Social Sciences →  Psychology →  Experimental and Cognitive Psychology
Sentiment Analysis and Opinion Mining
Physical Sciences →  Computer Science →  Artificial Intelligence
EEG and Brain-Computer Interfaces
Life Sciences →  Neuroscience →  Cognitive Neuroscience