JOURNAL ARTICLE

Hybrid Deep Neural Network–Hidden Markov Model (DNN-HMM) Based Speech Emotion Recognition

Abstract

Deep Neural Network Hidden Markov Models (DNN-HMMs) have recently emerged as very promising acoustic models, achieving better speech recognition results than Gaussian mixture model based HMMs (GMM-HMMs). In this paper, we investigate two DNN-HMM variants for emotion recognition from speech: DNN-HMMs with restricted Boltzmann machine (RBM) based unsupervised pre-training, and DNN-HMMs with discriminative pre-training. Emotion recognition experiments are carried out with these two models on the eNTERFACE'05 database and the Berlin database, and the results are compared with those of GMM-HMMs, shallow-NN-HMMs with two layers, and multi-layer perceptron HMMs (MLP-HMMs). The experimental results show that when the numbers of hidden layers and hidden units are properly set, the DNN can extend the labeling ability of the GMM-HMM. Among all the models, the DNN-HMMs with discriminative pre-training obtain the best results. For example, on the eNTERFACE'05 database, their recognition accuracy is 12.22% higher than that of the DNN-HMMs with unsupervised pre-training, 11.67% higher than that of the GMM-HMMs, 10.56% higher than that of the MLP-HMMs, and 17.22% higher than that of the shallow-NN-HMMs.
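The abstract does not spell out how the DNN is coupled to the HMM. As a rough illustration only, the sketch below follows the conventional hybrid recipe: DNN state posteriors are divided by state priors to obtain scaled likelihoods, and each emotion's HMM then scores the utterance with Viterbi. The function names, the one-HMM-per-emotion layout, and the `emotion_hmms` dictionary are assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def scaled_log_likelihoods(posteriors, state_priors, eps=1e-10):
    """Turn DNN posteriors p(s | x_t) into scaled log-likelihoods.

    By Bayes' rule p(x_t | s) is proportional to p(s | x_t) / p(s);
    the p(x_t) factor is shared by all states and is dropped.
    posteriors: (T, S) array, state_priors: (S,) array.
    """
    return np.log(posteriors + eps) - np.log(state_priors + eps)

def viterbi_log_score(log_obs, log_init, log_trans):
    """Best-path log score of one HMM for (T, S) frame-level log scores."""
    delta = log_init + log_obs[0]
    for t in range(1, log_obs.shape[0]):
        # delta[i] + log_trans[i, j], maximised over the previous state i
        delta = np.max(delta[:, None] + log_trans, axis=0) + log_obs[t]
    return float(np.max(delta))

def classify(posteriors, state_priors, emotion_hmms):
    """Pick the emotion whose HMM gives the highest Viterbi score.

    emotion_hmms (hypothetical layout): name -> (state_indices, log_init,
    log_trans), where state_indices selects that emotion's DNN outputs.
    """
    log_obs = scaled_log_likelihoods(posteriors, state_priors)
    scores = {
        name: viterbi_log_score(log_obs[:, idx], log_init, log_trans)
        for name, (idx, log_init, log_trans) in emotion_hmms.items()
    }
    return max(scores, key=scores.get), scores
```

In this layout the DNN replaces only the GMM emission model; the HMM topology and transition probabilities are used exactly as they would be in a GMM-HMM system.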

Keywords:
Hidden Markov model; Discriminative model; Computer science; Speech recognition; Artificial intelligence; Pattern recognition (psychology); Mixture model; Artificial neural network; Restricted Boltzmann machine; Multilayer perceptron; Perceptron

Metrics

Cited By: 157
FWCI (Field Weighted Citation Impact): 3.84
Refs: 16
Citation Normalized Percentile: 0.94

Topics

Speech and Audio Processing (Physical Sciences → Computer Science → Signal Processing)
Speech Recognition and Synthesis (Physical Sciences → Computer Science → Artificial Intelligence)
Music and Audio Processing (Physical Sciences → Computer Science → Signal Processing)
