JOURNAL ARTICLE

Speech emotion recognition in noisy and reverberant environments

Abstract

This study focuses on automatic speech emotion recognition, and in particular on the effect of additive noise and reverberation on recognition performance. The clean emotional speech is produced by four professional actors, who simulate the neutral, joy, anger, and sadness emotions. To produce noisy emotional speech data, white Gaussian noise is superimposed onto the clean speech at several signal-to-noise ratio (SNR) levels. To produce reverberant emotional speech data, the clean speech is convolved with impulse responses recorded in several environments with different reverberation times. The four emotions are recognized using i-vectors together with probabilistic linear discriminant analysis (PLDA), a combination widely used in speaker recognition and adapted here for speech emotion recognition. When noisy or reverberant test data are recognized using models trained on clean speech, the recognition rates decrease significantly compared with clean test data. To address this problem, multi-style training is applied: the models are trained either on data at several SNR levels, all different from the SNR level of the test data (so that training is SNR-independent), or on data with several different reverberation times. With multi-style training, the recognition rates in noisy and reverberant conditions increase significantly, and the remaining differences from the clean case are not statistically significant. Finally, the i-vector-based classification method is compared with a baseline method based on Gaussian mixture models (GMMs), over which it shows superior performance.
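The two degradation procedures and the SNR-independent multi-style training split described above can be illustrated with a short sketch. It uses only the Python standard library and is not the authors' implementation: the function names, the sampling rate, the example SNR grid, and the synthetic exponentially decaying impulse response (used here in place of the recorded impulse responses the study relies on) are all assumptions made for illustration.

```python
import math
import random


def mean_power(signal):
    """Average power of a signal: the mean of its squared samples."""
    return sum(x * x for x in signal) / len(signal)


def add_white_noise(clean, snr_db, rng):
    """Superimpose white Gaussian noise on `clean` at the requested SNR in dB.

    Unit-variance noise is drawn first and then rescaled so that
    10 * log10(P_clean / P_noise) equals snr_db for this particular draw.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in clean]
    target_noise_power = mean_power(clean) / (10.0 ** (snr_db / 10.0))
    scale = math.sqrt(target_noise_power / mean_power(noise))
    return [s + scale * n for s, n in zip(clean, noise)]


def measured_snr_db(clean, noisy):
    """SNR in dB of `noisy` relative to the clean reference."""
    residual = [y - s for s, y in zip(clean, noisy)]
    return 10.0 * math.log10(mean_power(clean) / mean_power(residual))


def synthetic_rir(rt60_s, fs, rng):
    """Hypothetical stand-in for a recorded room impulse response.

    Gaussian noise under an exponential envelope whose energy drops by
    60 dB after rt60_s seconds (amplitude factor 10**-3 at t = rt60_s).
    """
    decay = 3.0 * math.log(10.0) / rt60_s
    n_taps = max(1, int(rt60_s * fs))
    return [rng.gauss(0.0, 1.0) * math.exp(-decay * n / fs) for n in range(n_taps)]


def reverberate(clean, impulse_response):
    """Full linear convolution of clean speech with an impulse response."""
    out = [0.0] * (len(clean) + len(impulse_response) - 1)
    for i, s in enumerate(clean):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def multi_style_training_snrs(available_snrs_db, test_snr_db):
    """SNR-independent multi-style training: pool every SNR level except the test one."""
    return [snr for snr in available_snrs_db if snr != test_snr_db]


if __name__ == "__main__":
    rng = random.Random(0)
    fs = 8000  # hypothetical sampling rate in Hz
    # A 0.1 s sine tone stands in for a clean emotional utterance.
    clean = [math.sin(2.0 * math.pi * 220.0 * n / fs) for n in range(fs // 10)]

    noisy = add_white_noise(clean, snr_db=10.0, rng=rng)
    print(f"measured SNR: {measured_snr_db(clean, noisy):.3f} dB")

    rir = synthetic_rir(rt60_s=0.05, fs=fs, rng=rng)
    reverberant = reverberate(clean, rir)
    print(f"reverberant length: {len(reverberant)} samples")

    # Hypothetical SNR grid; the abstract does not list the levels used.
    print(multi_style_training_snrs([0, 5, 10, 15, 20], test_snr_db=10))
```

Rescaling the noise per utterance makes the achieved SNR match the requested level up to floating-point rounding, rather than only on average. Any measured impulse response can be passed to `reverberate` in place of the output of `synthetic_rir`; the reverberant signal is longer than the clean one by the impulse-response length minus one sample.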

Keywords:
Speech recognition, Reverberation, Computer science, Sadness, Test data, Anger, Artificial intelligence, Pattern recognition (psychology), Psychology, Acoustics

Metrics

Cited by: 14
FWCI (Field-Weighted Citation Impact): 1.01
References: 26
Citation Normalized Percentile: 0.77

Topics

Speech and Audio Processing (Physical Sciences → Computer Science → Signal Processing)
Speech Recognition and Synthesis (Physical Sciences → Computer Science → Artificial Intelligence)
Emotion and Mood Recognition (Social Sciences → Psychology → Experimental and Cognitive Psychology)