JOURNAL ARTICLE

Domain-Separated Bottleneck Attention Fusion Framework for Multimodal Emotion Recognition

Peng He, Jun Yu, Chengjie Ge, Wei Jia, W. L. Xu, Lei Wang, Tianyu Liu, Zhen Kan

Year: 2025
Journal: ACM Transactions on Multimedia Computing, Communications, and Applications
Vol: 21 (4)
Pages: 1-21
Publisher: Association for Computing Machinery

Abstract

As a focal point of research in various fields, human body language understanding has long been a subject of intense interest. Within this realm, the exploration of emotion recognition through the analysis of facial expressions, voice patterns, and physiological signals holds significant practical value. Compared with unimodal approaches, multimodal emotion recognition models leverage complementary information from the visual, acoustic, and language modalities to robustly perceive human sentiment and attitudes. However, the heterogeneity among modality signals leads to significant domain shifts, posing challenges for achieving balanced fusion. In this article, we propose a Domain-Separated Bottleneck Attention (DBA) Fusion Framework for human multimodal emotion recognition with lower computational complexity. Specifically, we partition each modality into two distinct domains: an invariant domain and a private domain. The invariant domain contains crucial shared information, while the private domain aims to capture modality-specific representations. For the decomposed features, we introduce two sets of bottleneck cross-attention modules that exploit the complementarity between domains and reduce redundant information. In each module, we interweave two Fusion Adapter blocks into the Self-Attention Transformer backbone. Each Fusion Adapter block integrates a small group of latent tokens as bridges for inter-modal and inter-domain interactions, mitigating the adverse effects of modality distribution differences and lowering computational costs. Extensive experimental results demonstrate that our method outperforms State-of-the-Art (SOTA) approaches across three widely used benchmark datasets.
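
The latent-token bridging described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' Fusion Adapter: it assumes single-head attention with no learned projections, and the function names (`cross_attention`, `bottleneck_fusion`, `score_counts`), the token counts, and the feature width are all hypothetical. It only shows the general idea of routing information between two token streams through a few shared latent tokens, and why that needs fewer attention scores than attending between the streams directly.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, keys_values):
    # Scaled dot-product attention; the same array serves as keys and values.
    # Assumption: no learned query/key/value projections, for brevity.
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ keys_values


def bottleneck_fusion(stream_a, stream_b, latents):
    # Step 1 (collect): the few latent tokens read from both streams.
    both = np.concatenate([stream_a, stream_b], axis=0)
    latents = latents + cross_attention(latents, both)
    # Step 2 (broadcast): each stream reads back from the latents only,
    # so the two streams never attend to each other directly.
    fused_a = stream_a + cross_attention(stream_a, latents)
    fused_b = stream_b + cross_attention(stream_b, latents)
    return fused_a, fused_b, latents


def score_counts(n_a, n_b, n_latents):
    # Attention-score entries for direct two-way cross-attention
    # versus the collect-and-broadcast scheme above.
    direct = 2 * n_a * n_b
    bottleneck = 2 * n_latents * (n_a + n_b)
    return direct, bottleneck


rng = np.random.default_rng(0)
audio_tokens = rng.normal(size=(50, 16))   # hypothetical acoustic features
video_tokens = rng.normal(size=(80, 16))   # hypothetical visual features
latent_tokens = rng.normal(size=(4, 16))   # small bottleneck

fused_audio, fused_video, new_latents = bottleneck_fusion(
    audio_tokens, video_tokens, latent_tokens
)
direct, bottleneck = score_counts(50, 80, 4)
print(fused_audio.shape, fused_video.shape, new_latents.shape)
print(direct, bottleneck)  # 8000 1040
```

With these illustrative sizes, direct two-way cross-attention would compute 8000 score entries, while the bottleneck route computes 1040. The count grows with the sum of the sequence lengths rather than their product, which is consistent with the lower computational cost the abstract attributes to latent tokens.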

Keywords:
Computer science, Bottleneck, Domain (mathematical analysis), Human–computer interaction, Emotion recognition, Fusion, Artificial intelligence, Embedded system

Metrics

Cited By: 4
FWCI (Field Weighted Citation Impact): 25.10
Refs: 71
Citation Normalized Percentile: 0.98
Is in top 1%
Is in top 10%

Topics

Emotion and Mood Recognition
Social Sciences →  Psychology →  Experimental and Cognitive Psychology
Human Pose and Action Recognition
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Video Surveillance and Tracking Methods
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition