With the rapid development of deep learning, great progress has been made in many areas. Convolutional neural networks (CNNs) have achieved unprecedented success in computer vision, while recurrent neural networks (RNNs) and the attention mechanism work well for time-series tasks. This paper proposes a speech emotion recognition (SER) model based on a CNN, long short-term memory (LSTM), and the attention mechanism, without using any traditional hand-crafted features. In addition, to expand the data set, a new flipping method is proposed for data augmentation. When the proposed model and the new data augmentation method were applied to the emotional speech database, classification accuracy improved.
Ala Saleh Alluhaidan, Oumaima Saidani, Rashid Jahangir, Muhammad Asif Nauman, Omnia Saidani Neffati
Nupoor C. Khandelwal, Minakshi M. Wanjari, Bhushan Vidhale
Fatin B. Sofia, S. Ahmed, Abdul-basit K. Faeq
Deepali Sale, Amol P. Bhagat, Pranit Bhalekar, Rohit Gorde, Mahendra Gayakwad