The core of a Speech Emotion Recognition (SER) system is to extract features that best represent speech emotion and to construct an acoustic model with strong robustness and generalization. In this study, a heterogeneous parallel Recurrent Neural Network (RNN) model based on the attention mechanism, termed AHPCL, is constructed for SER. A Long Short-Term Memory (LSTM) network extracts the temporal features of speech emotion, while convolution operations extract spatial spectral features. Combining temporal and spatial information to jointly represent speech emotion improves the accuracy of the predictions. The attention mechanism assigns weights according to the contribution of different time-series features to speech emotion, so as to select the time steps that best represent the emotion from a large amount of feature information. Low-level descriptor features such as pitch, Zero Crossing Rate (ZCR), and Mel-Frequency Cepstral Coefficients (MFCC) are extracted from three speech emotion databases, namely CASIA, EMODB, and SAVEE, and high-level statistical functions of these low-level descriptors are computed to obtain 219-dimensional features. The experimental results show that the proposed model achieves 86.02%, 84.03%, and 64.06% Unweighted Average Recall (UAR) on the CASIA, EMODB, and SAVEE databases, respectively. Compared with the LeNet, DNN-ELM, and TSFFCNN baseline models, the AHPCL model exhibits greater robustness and generalization.
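The low-level-descriptor-plus-functionals pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's exact configuration: the frame length, hop size, and the particular statistical functionals (mean, standard deviation, minimum, maximum) are assumptions for demonstration, and only ZCR is computed here; pitch and MFCC tracks would be processed the same way and their functionals concatenated into the final feature vector.

```python
import numpy as np

def zero_crossing_rate(frame: np.ndarray) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.sign(frame)
    signs[signs == 0] = 1  # treat exact zeros as positive
    return float(np.mean(signs[:-1] != signs[1:]))

def frame_signal(x: np.ndarray, frame_len: int = 400, hop: int = 160) -> np.ndarray:
    """Split a waveform into overlapping frames (illustrative sizes: 25 ms / 10 ms at 16 kHz)."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def functionals(lld_track: np.ndarray) -> np.ndarray:
    """High-level statistical functionals over a per-frame LLD track."""
    return np.array([lld_track.mean(), lld_track.std(),
                     lld_track.min(), lld_track.max()])

# Toy waveform standing in for one utterance (1 s of noise at 16 kHz).
rng = np.random.default_rng(0)
x = rng.standard_normal(16000)

frames = frame_signal(x)                                  # (num_frames, frame_len)
zcr_track = np.array([zero_crossing_rate(f) for f in frames])
feature_vector = functionals(zcr_track)                   # 4 functionals of one LLD
```

In the full system, applying a set of functionals to every low-level descriptor (pitch, ZCR, MFCCs, and so on) and concatenating the results is what yields a fixed-length utterance-level vector such as the 219-dimensional features used here, regardless of utterance duration.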