Abstract

As technology advances, the field of human-machine interaction increasingly needs robust automatic emotion recognition systems. Building machines that interact with humans by comprehending their emotions paves the way for systems with human-like intelligence. Previous architectures in this field are often based on recurrent neural network (RNN) models. However, these models struggle to learn in-depth contextual features. This paper proposes a transformer-based model that uses speech data, as in previous works, alongside text and motion-capture (mocap) data to improve emotion recognition performance. Our experimental results show that the proposed model outperforms the previous state of the art. All experiments were conducted on the IEMOCAP dataset.

Keywords:
Computer science, Transformer, Emotion recognition, Speech recognition, Recurrent neural network, Artificial intelligence, Field (mathematics), Architecture, Artificial neural network, Deep learning, Human–computer interaction, Machine learning, Engineering

Metrics

Cited By: 9
FWCI (Field Weighted Citation Impact): 1.83
Refs: 43
Citation Normalized Percentile: 0.83

Topics

Emotion and Mood Recognition
Social Sciences →  Psychology →  Experimental and Cognitive Psychology
Music and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
