Speech Emotion Recognition Using Deep Learning

Mohamed A. Gismelbari; Ilya I. Vixnin; Gregory M. Kovalev; Eugane E. Gogolev

doi:10.1109/scm62608.2024.10554077

JOURNAL ARTICLE

Year: 2024 Pages: 380-384

Abstract

This study explores the application of deep learning techniques in recognizing emotional states from spoken language. Specifically, we employ Convolutional Neural Networks (CNNs) and the HuBERT model to analyze the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). Our findings suggest that deep learning models, particularly the HuBERT model, exhibit significant potential in accurately identifying speech emotions. The models were trained and tested on a dataset containing various emotional expressions, including happiness, sadness, anger, and fear, among others. The experimentation involved preprocessing the audio data, feature extraction using Mel Frequency Cepstral Coefficients (MFCCs), and implementing deep learning architectures for emotion classification. The HuBERT model, with its advanced self-supervised learning mechanism, outperformed traditional CNNs in terms of accuracy and efficiency. This research highlights the importance of selecting appropriate deep learning models and feature sets for the task of speech emotion recognition. Our analysis demonstrates that the HuBERT model, by leveraging contextual information and temporal dynamics in speech, offers a promising approach for developing more sensitive and accurate SER systems. These systems have potential applications in various fields, including mental health assessment, interactive voice response systems, and educational software, by enabling machines to understand and respond to human emotions more effectively. The findings of this study contribute to the ongoing discussion in the field of artificial intelligence about the best practices for implementing deep learning techniques in speech processing tasks.
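The abstract names MFCC feature extraction as a step but gives no parameters. As a rough illustration of what that step computes, here is a minimal NumPy sketch of a textbook MFCC pipeline (framing, Hamming window, power spectrum, mel filterbank, log, DCT-II). This is not the authors' implementation: the function names and every default (16 kHz sample rate, 25 ms frames, 10 ms hop, 26 filters, 13 coefficients) are assumptions chosen for the example.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with centers spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def dct_ii(x, n_out):
    """Unnormalized DCT-II along the last axis, keeping the first n_out terms."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return x @ basis.T

def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_coeffs=13):
    """Return an array of shape (n_frames, n_coeffs). All defaults are assumed values."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Index matrix that slices the signal into overlapping frames
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum of each frame (zero-padded to n_fft)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(energies + 1e-10)  # small offset avoids log(0)
    return dct_ii(log_energies, n_coeffs)

# Usage on a synthetic 1-second, 440 Hz tone (stand-in for a real recording)
t = np.arange(16000) / 16000.0
features = mfcc(np.sin(2 * np.pi * 440.0 * t))
print(features.shape)  # (98, 13): 98 frames, 13 coefficients each
```

In practice an audio library would usually handle this step. The explicit version is shown only to make the individual stages visible, and a real pipeline would load RAVDESS recordings instead of a synthetic tone.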

Keywords: Computer science; Speech recognition; Emotion recognition; Deep learning; Artificial intelligence; Natural language processing

Metrics

FWCI (Field Weighted Citation Impact): 2.19

Citation Normalized Percentile: 0.79

Topics

Emotion and Mood Recognition

Social Sciences → Psychology → Experimental and Cognitive Psychology

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing
