Recognizing and classifying emotions in speech is one of the most relevant and significant research topics; however, for a large number of languages, hardly any studies to date have achieved the required accuracy. Expressing and recognizing emotions from the human speech signal is a complex problem that differs across languages. This paper proposes a systematic and robust approach to implementing a speech emotion recognition (SER) system for low-resource languages such as Persian. To the best of our knowledge, this is the first SER work on the Persian language using deep learning techniques. The Sharif Emotional Speech Database (ShEMO), which covers five basic emotions (anger, fear, happiness, sadness, and surprise) as well as a neutral state, is identified as a suitable candidate for evaluating a 1D Convolutional Neural Network (1DCNN) architecture. The data are first processed with the Mel-Frequency Cepstral Coefficients (MFCC) feature extraction method, and the resulting MFCCs are then fed to our neural network as input features. Experimental results demonstrate that the proposed method achieves about 74% classification accuracy on the ShEMO dataset.
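The pipeline described above (a sequence of MFCC frames passed through a 1D convolution over time, then classified into six emotional states) can be illustrated with a minimal forward-pass sketch. The abstract does not specify the architecture, so everything concrete here is an assumption made for illustration: 13 coefficients per frame, 8 filters of width 5, global max pooling, a single dense softmax layer, and the helper names `conv1d`, `global_max_pool`, `dense_softmax`, `predict`, and `random_params`. The weights are random and untrained, and the input is synthetic rather than extracted from ShEMO audio, so the predicted label carries no meaning; the sketch only shows how the data flows.

```python
import math
import random

# The six ShEMO classes named in the abstract.
CLASSES = ["anger", "fear", "happiness", "sadness", "surprise", "neutral"]


def conv1d(x, kernels, bias):
    """Valid 1D convolution along the time axis, followed by ReLU.

    x:       T frames, each a list of C coefficients (MFCC-like input).
    kernels: F filters, each shaped [K][C].
    Returns (T - K + 1) frames, each holding F non-negative activations.
    """
    k = len(kernels[0])
    n_coeffs = len(x[0])
    out = []
    for t in range(len(x) - k + 1):
        frame = []
        for f, kern in enumerate(kernels):
            s = bias[f]
            for dt in range(k):
                for c in range(n_coeffs):
                    s += kern[dt][c] * x[t + dt][c]
            frame.append(max(0.0, s))  # ReLU
        out.append(frame)
    return out


def global_max_pool(h):
    """Collapse the time axis by taking each filter's maximum activation."""
    return [max(frame[f] for frame in h) for f in range(len(h[0]))]


def dense_softmax(v, weights, bias):
    """Fully connected layer followed by a numerically stable softmax."""
    logits = [sum(w * a for w, a in zip(row, v)) + b
              for row, b in zip(weights, bias)]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(mfcc, params):
    """Run the assumed conv -> pool -> dense stack on one utterance."""
    h = conv1d(mfcc, params["kernels"], params["conv_bias"])
    pooled = global_max_pool(h)
    probs = dense_softmax(pooled, params["dense_w"], params["dense_b"])
    return CLASSES[probs.index(max(probs))], probs


def random_params(n_coeffs, n_filters=8, kernel_size=5, seed=0):
    """Untrained placeholder weights; all sizes are illustrative assumptions."""
    rng = random.Random(seed)

    def u():
        return rng.uniform(-0.1, 0.1)

    return {
        "kernels": [[[u() for _ in range(n_coeffs)]
                     for _ in range(kernel_size)]
                    for _ in range(n_filters)],
        "conv_bias": [0.0] * n_filters,
        "dense_w": [[u() for _ in range(n_filters)]
                    for _ in range(len(CLASSES))],
        "dense_b": [0.0] * len(CLASSES),
    }


# Synthetic stand-in for one utterance: 40 frames x 13 coefficients.
rng = random.Random(1)
mfcc = [[rng.gauss(0.0, 1.0) for _ in range(13)] for _ in range(40)]
params = random_params(n_coeffs=13)
label, probs = predict(mfcc, params)
print(label, [round(p, 3) for p in probs])
```

In practice the MFCC frames would come from an audio feature extraction library applied to the ShEMO recordings, and the weights would be learned by training in a deep learning framework; the plain-Python loops are used here only to keep the sketch dependency-free.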
Gurumayum Robert Michael, Dr Aditya Bihar Kandali
M. Nanda Kumar, Thokala Eswar Chand, Mettu Joseph Rithvik Reddy, Narra Pranay Reddy
GENG Lei, FU Hongliang, TAO Huawei, LU Yuan, GUO Xinying, ZHAO Li