Hybrid Deep Neural Network--Hidden Markov Model (DNN-HMM) Based Speech Emotion Recognition

Longfei Li; Yong Zhao; Dongmei Jiang; Shuicheng Yan; Fengna Wang; Isabel González; Valentin Enescu; Hichem Sahli

doi:10.1109/acii.2013.58


JOURNAL ARTICLE


Year: 2013 Pages: 312-317


Abstract

Deep Neural Network Hidden Markov Models (DNN-HMMs) have recently emerged as promising acoustic models, achieving better speech recognition results than Gaussian mixture model based HMMs (GMM-HMMs). In this paper, we investigate two variants for emotion recognition from speech: DNN-HMMs with restricted Boltzmann machine (RBM) based unsupervised pre-training, and DNN-HMMs with discriminative pre-training. Emotion recognition experiments with both models are carried out on the eNTERFACE'05 database and the Berlin database, and the results are compared with those of GMM-HMMs, shallow-NN-HMMs with two layers, and multi-layer perceptron HMMs (MLP-HMMs). The experimental results show that when the numbers of hidden layers and hidden units are properly set, the DNN can extend the labeling ability of the GMM-HMM. Among all the models, the DNN-HMMs with discriminative pre-training obtain the best results. For example, on the eNTERFACE'05 database, their recognition accuracy is 12.22% higher than that of the DNN-HMMs with unsupervised pre-training, 11.67% higher than that of the GMM-HMMs, 10.56% higher than that of the MLP-HMMs, and 17.22% higher than that of the shallow-NN-HMMs.
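The abstract does not describe how the DNN and the HMMs are combined at decoding time. A common hybrid formulation, assumed here and not confirmed by the abstract, divides the DNN's per-frame state posteriors by the state priors to obtain scaled likelihoods, scores each emotion's HMM with Viterbi, and picks the best-scoring emotion. The sketch below illustrates that idea with NumPy only; the function names, the two toy emotion labels, the state layout, and all probabilities are hypothetical.

```python
import numpy as np


def scaled_log_likelihoods(log_posteriors, log_priors):
    """Hybrid conversion: log p(x_t | s) = log P(s | x_t) - log P(s) + const."""
    return log_posteriors - log_priors


def viterbi_log_score(log_obs, log_init, log_trans):
    """Best-path log score of one HMM.

    log_obs:   (T, S) per-frame, per-state log-likelihoods
    log_init:  (S,)   log initial-state probabilities
    log_trans: (S, S) log transition probabilities (row = from, column = to)
    """
    delta = log_init + log_obs[0]
    for t in range(1, log_obs.shape[0]):
        # For each destination state, keep only the best-scoring predecessor.
        delta = np.max(delta[:, None] + log_trans, axis=0) + log_obs[t]
    return float(np.max(delta))


def classify_utterance(log_posteriors, log_priors, hmms):
    """Return the emotion whose HMM has the highest Viterbi score.

    hmms maps a label to (state_indices, log_init, log_trans), where
    state_indices selects that emotion's columns in the DNN output.
    """
    log_obs = scaled_log_likelihoods(log_posteriors, log_priors)
    scores = {
        label: viterbi_log_score(log_obs[:, idx], log_init, log_trans)
        for label, (idx, log_init, log_trans) in hmms.items()
    }
    return max(scores, key=scores.get), scores


# Toy example (all values are made up): 3 frames, 4 DNN output states,
# two states per emotion. Each row of `posteriors` stands in for a
# softmax output of a trained DNN.
posteriors = np.array([
    [0.10, 0.10, 0.60, 0.20],
    [0.10, 0.10, 0.30, 0.50],
    [0.05, 0.05, 0.20, 0.70],
])
priors = np.full(4, 0.25)  # assumed uniform state priors
log_init = np.log(np.array([0.9, 0.1]))
log_trans = np.log(np.array([[0.7, 0.3], [0.1, 0.9]]))
hmms = {
    "anger": (np.array([0, 1]), log_init, log_trans),
    "happiness": (np.array([2, 3]), log_init, log_trans),
}

label, scores = classify_utterance(np.log(posteriors), np.log(priors), hmms)
print(label, scores)
```

Working in the log domain avoids underflow over long utterances. In a real system the priors would typically be estimated from state-level alignments of the training data instead of being set to uniform values.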

Keywords:

Hidden Markov model; Discriminative model; Computer science; Speech recognition; Artificial intelligence; Pattern recognition (psychology); Mixture model; Artificial neural network; Restricted Boltzmann machine; Multilayer perceptron; Perceptron

Metrics

Cited By: 157

FWCI (Field Weighted Citation Impact): 3.84

Citation Normalized Percentile: 0.94


Topics

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing

