Fusion-ConvBERT: Parallel Convolution and BERT Fusion for Speech Emotion Recognition

Sanghyun Lee; David K. Han; Hanseok Ko

doi:10.3390/s20226688

JOURNAL ARTICLE

Year: 2020; Journal: Sensors; Vol: 20 (22); Pages: 6688-6688; Publisher: Multidisciplinary Digital Publishing Institute

Abstract

Speech emotion recognition predicts the emotional state of a speaker from the person’s speech, adding an element that makes human–computer interaction more natural. Earlier studies on emotion recognition relied primarily on handcrafted features and manual labels. With the advent of deep learning, there have been efforts to apply deep-network-based approaches to emotion recognition. Because deep learning automatically extracts salient features correlated with speaker emotion, it offers certain advantages over handcrafted-feature-based methods. Applying deep networks to emotion recognition remains challenging, however, because the data required to train them properly are often lacking. There is therefore a need for a new deep-learning-based approach that exploits the information available in given speech signals to the maximum extent possible. Our proposed method, called “Fusion-ConvBERT”, is a parallel fusion model consisting of bidirectional encoder representations from transformers (BERT) and convolutional neural networks (CNNs). Extensive experiments were conducted on the proposed model using the EMO-DB and Interactive Emotional Dyadic Motion Capture Database (IEMOCAP) emotion corpora, and the proposed method outperformed state-of-the-art techniques in most of the test configurations.
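The idea of a parallel fusion model can be illustrated with a toy sketch. The code below is not the authors' architecture: it assumes a generic input of T frames with D features each, runs a small convolutional branch and a single self-attention pass (a stand-in for a transformer encoder) side by side on the same input, and concatenates the two pooled outputs before a linear classifier. All names, sizes, and random weights are hypothetical and chosen only for illustration.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def conv_branch(frames, filters):
    """Toy CNN branch: 1-D convolution over time, ReLU, global max-pooling."""
    pooled = []
    for filt in filters:                      # filt: K rows of D weights
        k = len(filt)
        acts = []
        for t in range(len(frames) - k + 1):
            s = sum(dot(frames[t + i], filt[i]) for i in range(k))
            acts.append(max(0.0, s))          # ReLU
        pooled.append(max(acts))              # global max over time
    return pooled

def attention_branch(frames):
    """Toy transformer-style branch: one self-attention pass, then mean-pooling.
    No learned projections are used; this is an illustrative simplification."""
    d = len(frames[0])
    contextual = []
    for q in frames:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in frames])
        contextual.append(
            [sum(w * v[j] for w, v in zip(weights, frames)) for j in range(d)]
        )
    return [sum(row[j] for row in contextual) / len(contextual) for j in range(d)]

def fuse_and_classify(frames, filters, out_weights):
    """Run both branches on the same input, concatenate, and classify."""
    fused = conv_branch(frames, filters) + attention_branch(frames)
    logits = [dot(fused, w) for w in out_weights]
    return softmax(logits)

# Hypothetical sizes: 12 frames, 4 features, 3 filters of width 3, 4 emotion classes.
rng = random.Random(0)
T, D, C, K, N_CLASSES = 12, 4, 3, 3, 4
frames = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(T)]
filters = [[[rng.uniform(-1, 1) for _ in range(D)] for _ in range(K)] for _ in range(C)]
out_weights = [[rng.uniform(-1, 1) for _ in range(C + D)] for _ in range(N_CLASSES)]

probs = fuse_and_classify(frames, filters, out_weights)
print(probs)  # one probability per (hypothetical) emotion class
```

The point of the sketch is the data flow: both branches see the same frames, each summarizes them differently (local patterns versus sequence-wide context), and the classifier receives the concatenation of the two summaries.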

Keywords:

Computer science; Deep learning; Artificial intelligence; Speech recognition; Convolutional neural network; Emotion recognition; Salient; Feature (linguistics); Encoder; Artificial neural network; Pattern recognition (psychology)

Metrics

FWCI (Field Weighted Citation Impact): 4.01

Citation Normalized Percentile: 0.93

Topics

Emotion and Mood Recognition

Social Sciences → Psychology → Experimental and Cognitive Psychology

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

Speech Emotion Recognition via Parallel Dual-Branch Fusion Model

Emotion Speech Recognition Using Fusion Technique

Classifier fusion for speech emotion recognition

Speech emotion recognition via multiple fusion under spatial–temporal parallel network

Speech Emotion Recognition Based on Fusion Method