JOURNAL ARTICLE

Speech Emotion Recognition through Hybrid Features and Convolutional Neural Network

Ala Saleh Alluhaidan, Oumaima Saidani, Rashid Jahangir, Muhammad Asif Nauman, Omnia Saidani Neffati

Year: 2023   Journal: Applied Sciences   Vol: 13 (8)   Pages: 4750-4750   Publisher: Multidisciplinary Digital Publishing Institute

Abstract

Speech emotion recognition (SER) is the process of predicting human emotions from audio signals using artificial intelligence (AI) techniques. SER technologies have a wide range of applications in areas such as psychology, medicine, education, and entertainment. Extracting relevant features from audio signals is a crucial step in the SER process for correctly identifying emotions. Several studies on SER have employed short-time features such as Mel-frequency cepstral coefficients (MFCCs), due to their efficiency in capturing the periodic nature of audio signals. However, these features are limited in their ability to correctly identify emotion representations. To solve this issue, this research combined MFCCs and time-domain features (MFCCT) to enhance the performance of SER systems. The proposed hybrid features were given to a convolutional neural network (CNN) to build the SER model. The hybrid MFCCT features together with the CNN outperformed both MFCCs and time-domain (t-domain) features on the Emo-DB, SAVEE, and RAVDESS datasets, achieving accuracies of 97%, 93%, and 92%, respectively. Additionally, the CNN achieved better performance than the machine learning (ML) classifiers that were recently used in SER. The proposed features have the potential to be widely applied to several types of SER datasets for identifying emotions.
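
The idea of combining MFCCs with time-domain features can be illustrated with a minimal Python sketch. It is not the authors' MFCCT pipeline: the abstract does not say which time-domain features were used or how they were merged, so the choice of zero-crossing rate and RMS energy, the per-frame concatenation, and all function names (`frame_signal`, `time_domain_features`, `hybrid_features`) are assumptions made for illustration. The MFCC vectors are assumed to be computed elsewhere and are replaced here by zero-filled placeholders.

```python
import math


def frame_signal(signal, frame_len, hop):
    """Split a 1-D sequence of samples into overlapping frames."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def time_domain_features(frame):
    """Return [zero-crossing rate, RMS energy] for one frame.

    These two features are an assumed, illustrative choice.
    """
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    zcr = crossings / (len(frame) - 1)
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return [zcr, rms]


def hybrid_features(frames, mfcc_per_frame):
    """Concatenate each frame's MFCC vector with its time-domain features."""
    if len(frames) != len(mfcc_per_frame):
        raise ValueError("need exactly one MFCC vector per frame")
    return [list(mfcc) + time_domain_features(frame)
            for frame, mfcc in zip(frames, mfcc_per_frame)]


# Usage: a synthetic 440 Hz tone sampled at 8 kHz stands in for speech.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(800)]
frames = frame_signal(signal, frame_len=200, hop=100)

# Placeholder MFCCs (13 zeros per frame); a real system would compute them.
mfccs = [[0.0] * 13 for _ in frames]

features = hybrid_features(frames, mfccs)  # 7 frames, 15 values each
```

Each row of `features` is a fixed-length vector per frame, so the rows can be stacked into a 2-D array and passed to a classifier such as a CNN.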

Keywords:
Computer science; Mel-frequency cepstrum; Convolutional neural network; Speech recognition; Artificial intelligence; Emotion recognition; Pattern recognition (psychology); Artificial neural network; Process (computing); Domain (mathematical analysis); Time domain; Feature extraction; Computer vision

Metrics

Cited By: 93
FWCI (Field Weighted Citation Impact): 38.75
Refs: 42
Citation Normalized Percentile: 1.00
Is in top 1%
Is in top 10%

Topics

Emotion and Mood Recognition
Social Sciences →  Psychology →  Experimental and Cognitive Psychology
Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Music and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing

Related Documents

JOURNAL ARTICLE

Speech emotion recognition using 2D-convolutional neural network

Fauzivy Reggiswarashari, Sari Widya Sihwi

Journal: International Journal of Power Electronics and Drive Systems/International Journal of Electrical and Computer Engineering   Year: 2022   Vol: 12 (6)   Pages: 6594-6594