This study presents a speech emotion recognition technique based on deep learning and evaluated on the King Saud University Emotions Arabic dataset. The primary system is a convolutional recurrent neural network (CRNN) that combines a convolutional neural network (CNN) with a long short-term memory (LSTM) network. The study further investigates linearly spaced spectrograms as inputs to the emotional speech recognizers. The performance of the CRNN system is compared with the results of an experiment that evaluates the human ability to perceive emotion from speech; this human perceptual evaluation serves as the baseline. Overall, the CRNN system achieves accuracies of 84.55% at the file level and 77.51% at the segment level. These accuracies are close to the human emotion perception scores.
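The distinction between file-level and segment-level accuracy can be illustrated with a minimal sketch. The abstract does not state how segment predictions are combined into a file-level decision; the majority vote below is an assumption, and the function names and emotion labels are hypothetical.

```python
from collections import Counter


def file_level_label(segment_labels):
    """Combine one file's segment predictions by majority vote (assumed rule)."""
    counts = Counter(segment_labels)
    # most_common(1) returns the most frequent label; ties go to the label seen first.
    return counts.most_common(1)[0][0]


def accuracies(files):
    """Return (file_accuracy, segment_accuracy).

    files: list of (true_label, [predicted label for each segment]).
    """
    seg_total = seg_correct = file_correct = 0
    for true_label, seg_preds in files:
        seg_total += len(seg_preds)
        seg_correct += sum(p == true_label for p in seg_preds)
        file_correct += file_level_label(seg_preds) == true_label
    return file_correct / len(files), seg_correct / seg_total


# Hypothetical predictions for three files, not data from the study.
example = [
    ("anger", ["anger", "anger", "sadness"]),
    ("sadness", ["sadness", "neutral", "sadness", "sadness"]),
    ("neutral", ["anger", "anger", "neutral"]),
]
file_acc, seg_acc = accuracies(example)
```

Under this assumed rule, a file can be classified correctly even when some of its segments are not, which is one way a file-level accuracy can exceed the segment-level one.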