Contribution of Common Modulation Spectral Features to Vocal-Emotion Recognition of Noise-Vocoded Speech in Noisy Reverberant Environments

Taiyang Guo; Zhi Zhu; Shunsuke Kidani; Masashi Unoki

doi:10.3390/app12199979

JOURNAL ARTICLE

Year: 2022; Journal: Applied Sciences; Vol: 12 (19); Pages: 9979-9979; Publisher: Multidisciplinary Digital Publishing Institute

Abstract

In one study on vocal emotion recognition using noise-vocoded speech (NVS), the high similarities between modulation spectral features (MSFs) and the results of vocal-emotion-recognition experiments indicated that MSFs contribute to vocal emotion recognition in a clean environment (with no noise and no reverberation). Other studies showed that vocal emotion recognition using NVS is not affected by noisy reverberant environments when the signal-to-noise ratio is greater than 10 dB and the reverberation time is less than 1.0 s. However, the contribution of MSFs to vocal emotion recognition in noisy reverberant environments is still unclear. We aimed to clarify whether MSFs can be used to explain the vocal-emotion-recognition results in noisy reverberant environments. We analyzed the results of vocal-emotion-recognition experiments and used an auditory-based modulation filterbank to calculate the modulation spectrograms of NVS. We then extracted ten MSFs as higher-order statistics of the modulation spectrograms. The relationship between the MSFs and the vocal-emotion-recognition results showed that, except in extremely noisy reverberant environments, there were high similarities between the two, which indicates that MSFs can be used to explain such results in noisy reverberant environments. We also found that two common MSFs, MSKTk (modulation spectral kurtosis) and MSTLk (modulation spectral tilt), contribute to vocal emotion recognition in all daily environments.
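As a rough illustration of what "higher-order statistics of modulation spectrograms" can look like, the sketch below computes a kurtosis and a tilt for each row of a small energy matrix. It is not the authors' implementation. It assumes, without confirmation from the abstract, that rows are acoustic-frequency bands (index k), that columns are modulation-frequency channels, that kurtosis is the normalized fourth central moment of the row treated as a distribution over channel index, and that tilt is the least-squares slope of energy against channel index. The function names `spectral_kurtosis` and `spectral_tilt` are hypothetical.

```python
import numpy as np


def spectral_kurtosis(E):
    """Kurtosis of each row of E, treating the row as a distribution
    over channel index 1..M (assumed definition, see lead-in)."""
    E = np.asarray(E, dtype=float)
    m = np.arange(1, E.shape[1] + 1)              # channel indices
    p = E / E.sum(axis=1, keepdims=True)          # normalize each row to sum to 1
    centroid = (p * m).sum(axis=1)                # first moment
    dev = m - centroid[:, None]                   # deviation from centroid
    variance = (p * dev**2).sum(axis=1)           # second central moment
    fourth = (p * dev**4).sum(axis=1)             # fourth central moment
    return fourth / variance**2


def spectral_tilt(E):
    """Least-squares slope of each row of E against channel index
    (assumed definition, see lead-in)."""
    E = np.asarray(E, dtype=float)
    m = np.arange(1, E.shape[1] + 1)
    # polyfit fits every column of a 2-D y at once; row 0 holds the slopes
    return np.polyfit(m, E.T, 1)[0]


# Toy "modulation spectrogram": 3 acoustic bands x 4 modulation channels
E = np.array([
    [1.0, 2.0, 3.0, 4.0],   # energy rising with modulation channel
    [4.0, 3.0, 2.0, 1.0],   # energy falling with modulation channel
    [1.0, 1.0, 1.0, 1.0],   # flat
])
kurt = spectral_kurtosis(E)
tilt = spectral_tilt(E)
```

In this toy input the rising and falling rows are mirror images, so they share the same kurtosis while their tilts have opposite sign; the flat row has zero tilt. A real analysis would replace `E` with the output of an auditory-based modulation filterbank, which this sketch does not attempt to model.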

Keywords:

Speech recognition; Spectrogram; Modulation (music); Psychology; Noise (video); Reverberation; Background noise; Emotion recognition; Acoustics; Computer science; Artificial intelligence

Metrics

FWCI (Field Weighted Citation Impact): 0.78

Citation Normalized Percentile: 0.65

Topics

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Hearing Loss and Rehabilitation

Life Sciences → Neuroscience → Cognitive Neuroscience

Vehicle Noise and Vibration Control

Physical Sciences → Engineering → Automotive Engineering

Related Documents

Contribution of modulation spectral features on the perception of vocal-emotion using noise-vocoded speech

Contribution of modulation spectral features for cross-lingual speech emotion recognition under noisy reverberant conditions

Speech emotion recognition in noisy and reverberant environments

Study on the relationship between modulation spectral features and the perception of vocal emotion with noise-vocoded speech

Amplitude modulation spectrogram based features for robust speech recognition in noisy and reverberant environments