This paper addresses speech emotion recognition on short voice messages, each lasting less than three seconds, using one-dimensional convolutional neural networks. The Ravee dataset, voiced by professional actors, is used. The proposed convolutional neural network architecture aims to improve accuracy and reduce the total processing cost of the speech emotion recognition model. Mel-frequency cepstral coefficients serve as the main features for recognition. Overfitting is avoided by using data augmentation techniques and feature extraction algorithms, which improve testing accuracy by increasing the number of training samples. Simulations show that the proposed model achieves a recognition accuracy of up to 83%.
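As an illustration of how data augmentation increases the number of training samples, the following minimal sketch expands each labeled waveform into three samples. The abstract does not name the augmentation techniques used, so additive Gaussian noise and circular time shifting are assumptions chosen for illustration, and the function names and parameters (`add_noise`, `time_shift`, `augment`, `noise_level`, `max_shift`) are hypothetical.

```python
import random


def add_noise(signal, noise_level=0.005, rng=None):
    """Return a copy of the waveform with Gaussian noise added to every sample."""
    rng = rng or random.Random()
    return [s + rng.gauss(0.0, noise_level) for s in signal]


def time_shift(signal, shift):
    """Return a copy of the waveform circularly shifted right by `shift` samples."""
    if not signal:
        return []
    k = shift % len(signal)
    return signal[-k:] + signal[:-k] if k else list(signal)


def augment(dataset, noise_level=0.005, max_shift=100, seed=0):
    """Expand each (waveform, label) pair into three samples:
    the original, a noisy copy, and a time-shifted copy.
    The label is unchanged (assumption: these transforms preserve the emotion).
    """
    rng = random.Random(seed)
    augmented = []
    for signal, label in dataset:
        augmented.append((list(signal), label))
        augmented.append((add_noise(signal, noise_level, rng), label))
        augmented.append((time_shift(signal, rng.randint(1, max_shift)), label))
    return augmented


# Toy waveforms standing in for short voice messages.
data = [([0.0, 0.1, 0.2, 0.3], "happy"), ([0.5, 0.4], "sad")]
aug = augment(data)
```

In this sketch the training set triples in size. In a full pipeline, features such as Mel-frequency cepstral coefficients would be extracted from each augmented waveform before training, and augmentation would be applied to the training split only.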
Anunya Sharma, Kiran Malik, Poonam Bansal
Bayan Mahfood, Ashraf Elnagar, Firuz Kamalov
Abhishek Gangani, Li Zhang, Ming Jiang