With the advancement of technology, the field of human-machine interaction is in growing need of robust automatic emotion recognition systems. Building machines that comprehend emotions when interacting with humans paves the way for systems equipped with human-like intelligence. Previous architectures in this field often rely on RNN models; however, these models struggle to learn in-depth contextual features. This paper proposes a transformer-based model that combines the speech features used in previous works with text and motion-capture (mocap) data to improve the performance of our emotion recognition system. Our experimental results show that the proposed model outperforms the previous state of the art. All experiments were conducted on the IEMOCAP dataset.
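The abstract contrasts transformers with RNNs on contextual modeling: self-attention lets every timestep attend to every other, across modalities, in one step. Below is a minimal NumPy sketch of that core operation under an assumed early-fusion setup (concatenating the three modality sequences along the time axis); the feature dimensions and fusion strategy are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Core transformer operation: softmax(Q K^T / sqrt(d)) V."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    # Numerically stable softmax over each row of the score matrix.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
# Hypothetical per-modality feature sequences (timesteps x feature_dim);
# shapes are made up for illustration.
speech = rng.standard_normal((10, 16))
text = rng.standard_normal((8, 16))
mocap = rng.standard_normal((12, 16))

# Assumed early fusion: concatenate along the time axis, then let
# self-attention relate any speech frame to any text token or mocap
# frame directly, instead of an RNN's strictly sequential pass.
fused = np.concatenate([speech, text, mocap], axis=0)  # (30, 16)
context = scaled_dot_product_attention(fused, fused, fused)
print(context.shape)  # (30, 16)
```

In a full model this attention block would be stacked with feed-forward layers and a classification head over the emotion labels; the sketch only shows why attention captures cross-modal context that a recurrent pass cannot reach in one step.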