JOURNAL ARTICLE

Speech Emotion Recognition with a ResNet-CNN-Transformer Parallel Neural Network

Abstract

As a challenging pattern recognition task, speech emotion recognition has attracted increasing attention in recent years and is widely applied in medicine, affective computing, and other fields. In this paper, we propose a parallel ResNet-CNN-Transformer Encoder network. The ResNet branch alleviates the degradation problems caused by deepening the network. The CNN branch uses few parameters while increasing the fitting and expressive capacity of the network. Because traditional recurrent neural networks suffer from long-term dependence when extracting features from speech and text sequences, and their sequential nature prevents them from capturing long-distance features, the multi-head attention mechanism of the Transformer encoder layer is used to process the sequence in parallel, improving processing speed and extracting the emotional semantic information in the sequence. Experiments are carried out on the RAVDESS dataset. Our results demonstrate the effectiveness of the proposed method, which achieves a significant improvement over previous results.
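The abstract names three branches computed in parallel and fused for classification: a ResNet branch, a plain CNN branch, and a Transformer encoder branch. A minimal PyTorch sketch of that idea follows, assuming mel-spectrogram input and the eight emotion classes of RAVDESS; the layer counts, kernel sizes, and feature dimensions are illustrative placeholders, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic 1-D residual block: two convolutions with a skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(x + self.conv2(self.relu(self.conv1(x))))

class ParallelSER(nn.Module):
    """Illustrative parallel ResNet / CNN / Transformer-encoder classifier.

    All sizes (n_mels=40, d_model=64, 2 blocks/layers) are assumptions,
    chosen only to make the three-branch structure concrete.
    """
    def __init__(self, n_mels=40, n_classes=8, d_model=64):
        super().__init__()
        # ResNet branch: skip connections ease training of deeper stacks.
        self.res_in = nn.Conv1d(n_mels, d_model, kernel_size=3, padding=1)
        self.res_branch = nn.Sequential(ResidualBlock(d_model),
                                        ResidualBlock(d_model))
        # Plain CNN branch: a lightweight convolutional feature extractor.
        self.cnn_branch = nn.Sequential(
            nn.Conv1d(n_mels, d_model, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(d_model, d_model, kernel_size=5, padding=2), nn.ReLU())
        # Transformer branch: multi-head self-attention processes all
        # frames in parallel and captures long-distance dependencies.
        self.proj = nn.Linear(n_mels, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4,
                                               batch_first=True)
        self.trans_branch = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.classifier = nn.Linear(3 * d_model, n_classes)

    def forward(self, x):           # x: (batch, frames, n_mels)
        xc = x.transpose(1, 2)      # (batch, n_mels, frames) for Conv1d
        r = self.res_branch(self.res_in(xc)).mean(dim=2)  # global avg pool
        c = self.cnn_branch(xc).mean(dim=2)
        t = self.trans_branch(self.proj(x)).mean(dim=1)
        # Fuse the three branch embeddings and classify the emotion.
        return self.classifier(torch.cat([r, c, t], dim=1))
```

Running a batch of two 100-frame spectrograms through `ParallelSER()` yields logits of shape `(2, 8)`, one score per RAVDESS emotion class.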

Keywords:
Computer science, Transformer Encoder, Feature extraction, Speech recognition, Artificial neural network, Recurrent neural network, Artificial intelligence, Pattern recognition (psychology), Time delay neural network, Long short-term memory

Metrics

Cited By: 30
FWCI (Field-Weighted Citation Impact): 5.89
References: 12
Citation Normalized Percentile: 0.96 (in top 10%)

Topics

Emotion and Mood Recognition (Social Sciences → Psychology → Experimental and Cognitive Psychology)
Speech and Audio Processing (Physical Sciences → Computer Science → Signal Processing)
Face and Expression Recognition (Physical Sciences → Computer Science → Computer Vision and Pattern Recognition)
© 2026 ScienceGate Book Chapters — All rights reserved.