Speech Emotion Recognition through Hybrid Features and Convolutional Neural Network

Ala Saleh Alluhaidan; Oumaima Saidani; Rashid Jahangir; Muhammad Asif Nauman; Omnia Saidani Neffati

doi:10.3390/app13084750

JOURNAL ARTICLE

Year: 2023; Journal: Applied Sciences; Vol: 13 (8); Pages: 4750; Publisher: Multidisciplinary Digital Publishing Institute

Abstract

Speech emotion recognition (SER) is the process of predicting human emotions from audio signals using artificial intelligence (AI) techniques. SER technologies have a wide range of applications in areas such as psychology, medicine, education, and entertainment. Extracting relevant features from audio signals is a crucial step in the SER process for correctly identifying emotions. Several studies on SER have employed short-time features such as Mel frequency cepstral coefficients (MFCCs) because they efficiently capture the periodic nature of audio signals. However, these features alone are limited in their ability to represent emotions correctly. To address this issue, this research combined MFCCs and time-domain features (MFCCT) to enhance the performance of SER systems. The proposed hybrid features were fed to a convolutional neural network (CNN) to build the SER model. The hybrid MFCCT features together with the CNN outperformed both MFCCs and time-domain (t-domain) features on the Emo-DB, SAVEE, and RAVDESS datasets, achieving accuracies of 97%, 93%, and 92%, respectively. Additionally, the CNN achieved better performance than the machine learning (ML) classifiers recently used in SER. The proposed features have the potential to be applied widely to several types of SER datasets for identifying emotions.
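The feature-level fusion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not list which time-domain features were used, so zero-crossing rate and RMS energy are assumed here as stand-ins, and the MFCC vector is a placeholder that would normally come from an audio library. The function names (`zero_crossing_rate`, `rms_energy`, `hybrid_features`) are hypothetical.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def rms_energy(frame):
    """Root-mean-square amplitude of the frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def hybrid_features(mfcc_vector, frame):
    """Concatenate precomputed MFCCs with time-domain descriptors.

    ASSUMPTION: ZCR and RMS stand in for the paper's time-domain
    features, which the abstract does not enumerate.
    """
    return list(mfcc_vector) + [zero_crossing_rate(frame), rms_energy(frame)]


# Example: a 100 Hz sine sampled at 8 kHz, one 25 ms frame (200 samples).
sample_rate = 8000
frame = [math.sin(2 * math.pi * 100 * n / sample_rate) for n in range(200)]
placeholder_mfcc = [0.0] * 13  # stand-in for real MFCCs (assumed 13 coefficients)
features = hybrid_features(placeholder_mfcc, frame)
print(len(features))  # 15
```

In a full pipeline, one such hybrid vector per frame (or per utterance) would form the input to the classifier; concatenation is only one possible way to combine the two feature families.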

Keywords:

Computer science; Mel-frequency cepstrum; Convolutional neural network; Speech recognition; Artificial intelligence; Emotion recognition; Pattern recognition (psychology); Artificial neural network; Process (computing); Domain (mathematical analysis); Time domain; Feature extraction; Computer vision

Metrics

FWCI (Field Weighted Citation Impact): 38.75

Citation Normalized Percentile: 1.00

Topics

Emotion and Mood Recognition

Social Sciences → Psychology → Experimental and Cognitive Psychology

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Related Documents

Improvement of Speech Emotion Recognition by Deep Convolutional Neural Network and Speech Features

Convolutional Neural Network Techniques for Speech Emotion Recognition

Speech emotion recognition based on convolutional neural network

Speech emotion recognition using 2D-convolutional neural network

Convolutional Neural Network (CNN) Based Speech-Emotion Recognition