Speech emotion recognition in noisy and reverberant environments

Panikos Heracleous; Keiji Yasuda; Fumiaki Sugaya; Akio Yoneyama; Masayuki Hashimoto

doi:10.1109/acii.2017.8273610

JOURNAL ARTICLE

Year: 2017 Pages: 262-266

Abstract

This study addresses automatic speech emotion recognition, and in particular the effect of additive noise and reverberation on recognition performance. The clean emotional speech is produced by four professional actors, who simulate four emotions: neutral, joy, anger, and sadness. To produce noisy emotional speech, white Gaussian noise is superimposed onto the clean speech at several signal-to-noise ratio (SNR) levels. To produce reverberant emotional speech, the clean speech is convolved with impulse responses recorded in several environments with different reverberation times. The four emotions are recognized using i-vectors together with probabilistic linear discriminant analysis (PLDA), an approach widely used in speaker recognition and adapted here for speech emotion recognition. When noisy or reverberant emotional speech is recognized using models trained on clean speech, the recognition rates are significantly lower than for clean test data. To address this problem, multi-style training is applied: the models are trained on data at several SNR levels (different from the SNR level of the test data, and therefore SNR-independent) or on data with different reverberation times. With multi-style training, the recognition rates in noisy or reverberant environments increase significantly, and the differences from the clean case are not statistically significant. Furthermore, the i-vector-based classification method is compared with a baseline method based on Gaussian mixture models (GMMs) and demonstrates superior performance.
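The two data-generation steps described in the abstract, and the pooling step behind multi-style training, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (add_white_noise, add_reverberation, measure_snr_db, multi_style_noisy_set) are hypothetical, signals are assumed to be one-dimensional NumPy arrays, SNR is assumed to be defined from average power over the whole utterance, and the reverberant output is trimmed to the input length for convenience. Feature extraction, i-vectors, and PLDA scoring are not covered.

```python
import numpy as np


def add_white_noise(clean, snr_db, rng=None):
    """Superimpose white Gaussian noise on `clean` at the requested SNR in dB."""
    rng = np.random.default_rng() if rng is None else rng
    clean = np.asarray(clean, dtype=np.float64)
    noise = rng.standard_normal(clean.shape)
    signal_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # Choose the gain so that signal_power / (gain^2 * noise_power) = 10^(snr_db / 10).
    gain = np.sqrt(signal_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return clean + gain * noise


def add_reverberation(clean, impulse_response):
    """Convolve `clean` with a room impulse response, trimmed to the input length."""
    clean = np.asarray(clean, dtype=np.float64)
    impulse_response = np.asarray(impulse_response, dtype=np.float64)
    return np.convolve(clean, impulse_response, mode="full")[: clean.shape[0]]


def measure_snr_db(clean, degraded):
    """Measure the SNR in dB of `degraded` relative to `clean`."""
    clean = np.asarray(clean, dtype=np.float64)
    residual = np.asarray(degraded, dtype=np.float64) - clean
    return 10.0 * np.log10(np.mean(clean ** 2) / np.mean(residual ** 2))


def multi_style_noisy_set(utterances, labels, train_snrs_db, rng=None):
    """Build a training pool with one noisy copy of every utterance per SNR level."""
    rng = np.random.default_rng() if rng is None else rng
    pool = []
    for utterance, label in zip(utterances, labels):
        for level in train_snrs_db:
            # Keep the SNR level alongside the label so a test SNR can be held out.
            pool.append((add_white_noise(utterance, level, rng), label, level))
    return pool
```

A call such as multi_style_noisy_set(utterances, labels, [20, 10, 0]) would pool noisy copies at three illustrative SNR levels; the levels actually used in the study are not given in the abstract. The same pooling idea would apply to reverberant data by looping over several impulse responses with add_reverberation instead of over SNR levels.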

Keywords:

Speech recognition; Reverberation; Computer science; Sadness; Test data; Anger; Artificial intelligence; Pattern recognition (psychology); Psychology; Acoustics

Metrics

FWCI (Field Weighted Citation Impact): 1.01

Citation Normalized Percentile: 0.77

Topics

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Emotion and Mood Recognition

Social Sciences → Psychology → Experimental and Cognitive Psychology

Related Documents

Robust Speech Recognition in Noisy and Reverberant Environments

Speech Emotion Recognition Based on EMD in Noisy Environments

Statistical Acoustic Model Adaptation for Robust Speech Recognition in Noisy Reverberant Environments

Enhancement of reverberant speech in noisy acoustical environments

Contribution of Common Modulation Spectral Features to Vocal-Emotion Recognition of Noise-Vocoded Speech in Noisy Reverberant Environments