This paper presents a comparative study of two classifiers built for speech emotion recognition. Perceiving a person's emotions has always been an intriguing problem. Emotions can be expressed through facial expressions, speech, actions, and so forth, and speech is the most widely used form of communication. Speech is an elaborate signal that carries several kinds of information: the content of the message, the tone of the speaker, the language used, background noise, any musical sound, the speaker's emotions, and so on. Speech emotion recognition is becoming increasingly important with the advancement of voice user interface technology, which allows computers to interact with humans by analyzing speech to understand a person's instructions and carry out the requested tasks and commands. An emotion is always attached to a spoken utterance, but recognizing that emotion remains a difficult research problem, mainly because the way emotions are perceived from audio differs from person to person. I have created two models for speech emotion recognition, using Mel Frequency Cepstral Coefficients (MFCCs) for feature extraction from the audio files. The first model, built with a Multi-Layer Perceptron (MLP) classifier, achieved an accuracy of 57.29 percent. The second model, built with a Long Short-Term Memory (LSTM) network, achieved a higher accuracy of 92.88 percent. I used the RAVDESS dataset for classification.
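The MFCC-plus-MLP pipeline summarized above can be sketched as follows. This is an illustrative sketch, not the paper's actual code: the feature matrix here is synthetic random data standing in for MFCC vectors that, in the real pipeline, would be extracted from each RAVDESS audio file (e.g. with `librosa.feature.mfcc`), and the hyperparameters are assumptions for demonstration only.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-in for MFCC features: in the real pipeline, each RAVDESS audio
# file would be reduced to a fixed-length vector of MFCCs (e.g. 40
# coefficients averaged over time). Here we synthesize such vectors.
n_samples, n_mfcc, n_emotions = 200, 40, 8  # RAVDESS labels 8 emotions
X = rng.normal(size=(n_samples, n_mfcc))
y = rng.integers(0, n_emotions, size=n_samples)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# MLP classifier over the MFCC vectors; layer size and iteration count
# are illustrative, not the configuration used in the paper.
clf = MLPClassifier(hidden_layer_sizes=(300,), max_iter=300,
                    random_state=0)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)  # one predicted emotion label per utterance
```

The LSTM variant would instead keep the MFCCs as a time sequence per file and feed them to a recurrent network, which is what allows it to exploit the temporal structure the MLP discards.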