Abstract

This study explores the application of deep learning techniques to recognizing emotional states from spoken language. Specifically, we apply Convolutional Neural Networks (CNNs) and the HuBERT model to the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). The models were trained and tested on recordings covering a range of emotional expressions, including happiness, sadness, anger, and fear. The experimental pipeline comprised preprocessing the audio data, extracting Mel Frequency Cepstral Coefficients (MFCCs) as features, and implementing deep learning architectures for emotion classification. Our findings suggest that deep learning models, and the HuBERT model in particular, show considerable potential for accurate speech emotion recognition (SER). HuBERT, which is pretrained with a self-supervised learning objective, outperformed the traditional CNNs in both accuracy and efficiency. This research highlights the importance of selecting appropriate deep learning models and feature sets for SER. Our analysis demonstrates that HuBERT, by leveraging contextual information and temporal dynamics in speech, offers a promising approach for developing more sensitive and accurate SER systems. Such systems have potential applications in fields including mental health assessment, interactive voice response systems, and educational software, where they could enable machines to understand and respond to human emotions more effectively. These findings contribute to the ongoing discussion in artificial intelligence about best practices for applying deep learning techniques to speech processing tasks.
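The abstract does not state the MFCC configuration used. As an illustration only, the following standard-library Python sketch computes MFCCs with commonly used defaults that are assumptions, not the authors' settings: 16 kHz audio, 25 ms Hamming-windowed frames (400 samples) with a 10 ms hop (160 samples), a 512-point DFT, 26 HTK-style mel filters, and 13 coefficients from an orthonormal DCT-II. It uses a naive DFT so that it needs no third-party packages; a real pipeline would more likely call an FFT-based routine such as `librosa.feature.mfcc`. All function names below are hypothetical.

```python
# Illustrative MFCC sketch; parameter defaults are assumptions, not the study's settings.
import cmath
import math


def hz_to_mel(f_hz: float) -> float:
    """HTK-style mel scale (one of several mel formulas in use)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels: int, n_fft: int, sample_rate: int) -> list[list[float]]:
    """Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist."""
    lo, hi = hz_to_mel(0.0), hz_to_mel(sample_rate / 2)
    mel_pts = [lo + i * (hi - lo) / (n_mels + 1) for i in range(n_mels + 2)]
    bins = [math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mel_pts]
    fb = [[0.0] * (n_fft // 2 + 1) for _ in range(n_mels)]
    for j in range(n_mels):
        left, center, right = bins[j], bins[j + 1], bins[j + 2]
        for k in range(left, center):  # rising edge of triangle j
            fb[j][k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge of triangle j
            fb[j][k] = (right - k) / (right - center)
    return fb


def power_spectrum(frame: list[float], n_fft: int, twiddle: list[complex]) -> list[float]:
    """Periodogram |X[k]|^2 / n_fft for k = 0..n_fft//2 via a naive O(n^2) DFT."""
    x = list(frame) + [0.0] * (n_fft - len(frame))  # zero-pad to n_fft
    return [
        abs(sum(x[n] * twiddle[(k * n) % n_fft] for n in range(n_fft))) ** 2 / n_fft
        for k in range(n_fft // 2 + 1)
    ]


def dct_ii(v: list[float], n_out: int) -> list[float]:
    """Orthonormal DCT-II, keeping only the first n_out coefficients."""
    n = len(v)
    out = []
    for k in range(n_out):
        s = sum(v[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i in range(n))
        out.append(s * math.sqrt((1.0 if k == 0 else 2.0) / n))
    return out


def mfcc(signal, sample_rate=16000, n_mfcc=13, n_mels=26,
         frame_len=400, hop=160, n_fft=512) -> list[list[float]]:
    """One n_mfcc-dim vector per frame; trailing samples that do not fill a frame are dropped."""
    assert n_fft >= frame_len and n_mfcc <= n_mels
    # Hamming window applied to every frame before the DFT.
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (frame_len - 1)) for i in range(frame_len)]
    fb = mel_filterbank(n_mels, n_fft, sample_rate)
    twiddle = [cmath.exp(-2j * math.pi * m / n_fft) for m in range(n_fft)]  # DFT roots of unity
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        ps = power_spectrum(frame, n_fft, twiddle)
        energies = [sum(f * p for f, p in zip(row, ps)) for row in fb]  # mel filterbank energies
        log_e = [math.log(max(e, 1e-10)) for e in energies]  # floor avoids log(0) on silence
        feats.append(dct_ii(log_e, n_mfcc))
    return feats


if __name__ == "__main__":
    sr = 16000
    tone = [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr // 10)]  # 0.1 s, 440 Hz
    feats = mfcc(tone, sample_rate=sr)
    print(len(feats), len(feats[0]))  # 8 13
```

The log step compresses the dynamic range of the filterbank energies, and the DCT decorrelates them so that the first few coefficients summarize the spectral envelope. Stacking the per-frame vectors gives a time-by-coefficient matrix, which is one typical input representation for a CNN classifier.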

Keywords:
Computer science; Speech recognition; Emotion recognition; Deep learning; Artificial intelligence; Natural language processing
