JOURNAL ARTICLE

Speech Emotion Recognition Using Deep Learning Hybrid Models

Abstract

Speech Emotion Recognition (SER) has become essential to Human-Computer Interaction (HCI) and other complex speech processing systems over the past decade. Because emotional expression varies from speaker to speaker, SER is a complex and challenging task. The features extracted from speech signals are crucial to the performance of SER systems, yet developing efficient feature extraction and classification models remains difficult. This study proposes hybrid deep learning models that accurately extract the crucial features and produce more confident predictions. First, the temporal features of the Mel spectrogram are learned with a combination of stacked Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks, and this model performs well. To enhance the speech data, the samples are preprocessed beforehand with data augmentation and dataset balancing techniques. The study uses the RAVDESS dataset, which contains 1440 audio samples spoken in a North American English accent. The CNN is used to extract spatial features and convert them into sequence encodings, and the model achieves an accuracy above 93.9% on this dataset when classifying emotions into one of eight categories. Additive White Gaussian Noise (AWGN) and Dropout are used to help the model generalize.
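
The abstract does not state how the Mel spectrogram is computed. The following is a minimal pure-Python sketch of the usual pipeline (framing, Hann window, power spectrum, triangular mel filterbank, log compression). It assumes an HTK-style mel scale, a 256-sample frame, a 128-sample hop and 40 mel bands; these values and all function names (`hz_to_mel`, `mel_filterbank`, `power_spectrum`, `log_mel_spectrogram`) are illustrative assumptions, not the authors' settings. The naive DFT is there only for readability; a real system would use an FFT library, and libraries differ in mel-scale variant and filter normalization.

```python
import math
from typing import List, Optional


def hz_to_mel(f_hz: float) -> float:
    # HTK-style mel scale (assumed): m = 2595 * log10(1 + f / 700)
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels: int, n_fft: int, sample_rate: int,
                   f_min: float = 0.0,
                   f_max: Optional[float] = None) -> List[List[float]]:
    """n_mels triangular filters, each defined over the n_fft // 2 + 1 FFT bins."""
    f_max = sample_rate / 2.0 if f_max is None else f_max
    bin_freqs = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    # n_mels + 2 edge frequencies, equally spaced on the mel scale
    edges = [mel_to_hz(lo + i * (hi - lo) / (n_mels + 1)) for i in range(n_mels + 2)]
    bank = []
    for m in range(n_mels):
        left, centre, right = edges[m], edges[m + 1], edges[m + 2]
        row = []
        for f in bin_freqs:
            if left <= f <= centre:
                row.append((f - left) / (centre - left))    # rising slope
            elif centre < f <= right:
                row.append((right - f) / (right - centre))  # falling slope
            else:
                row.append(0.0)                             # outside the triangle
        bank.append(row)
    return bank


def power_spectrum(frame: List[float]) -> List[float]:
    """|DFT|^2 of a Hann-windowed frame (naive O(N^2) DFT, for clarity only)."""
    n = len(frame)
    win = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n)) for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(win))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(win))
        spec.append(re * re + im * im)
    return spec


def log_mel_spectrogram(signal: List[float], sample_rate: int, n_fft: int = 256,
                        hop: int = 128, n_mels: int = 40,
                        eps: float = 1e-10) -> List[List[float]]:
    """Return a (frames x n_mels) log-Mel spectrogram as nested lists."""
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        spec = power_spectrum(signal[start:start + n_fft])
        # Weighted sum of bin powers under each triangle, then log compression
        frames.append([math.log(sum(w * p for w, p in zip(row, spec)) + eps)
                       for row in bank])
    return frames
```

For example, a 1600-sample clip at 16 kHz yields 11 frames of 40 log-energies each. The resulting (frames x mel bands) matrix is the kind of 2-D input a stacked CNN-LSTM typically consumes, with convolutions over the time-frequency plane and the LSTM running along the frame axis.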

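AWGN augmentation, named in the abstract as a generalization technique, adds zero-mean Gaussian noise whose variance is set from a target signal-to-noise ratio. Since SNR_dB = 10 log10(P_signal / P_noise), the required noise power is P_signal / 10^(SNR_dB / 10). The sketch below implements that relation in plain Python; the function names `add_awgn` and `measured_snr_db` and any SNR values used with them are hypothetical, because the abstract gives no noise levels.

```python
import math
import random
from typing import List, Optional


def add_awgn(signal: List[float], snr_db: float,
             rng: Optional[random.Random] = None) -> List[float]:
    """Return signal plus white Gaussian noise scaled to the requested SNR (dB)."""
    if rng is None:
        rng = random.Random()
    # Average signal power: P_s = mean(x^2)
    p_signal = sum(x * x for x in signal) / len(signal)
    # SNR_dB = 10 * log10(P_s / P_n)  =>  P_n = P_s / 10^(SNR_dB / 10)
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(p_noise)  # standard deviation of the zero-mean noise
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean: List[float], noisy: List[float]) -> float:
    """Empirical SNR of a noisy copy relative to the clean signal, in dB."""
    p_s = sum(x * x for x in clean) / len(clean)
    p_n = sum((y - x) ** 2 for x, y in zip(clean, noisy)) / len(clean)
    return 10.0 * math.log10(p_s / p_n)
```

In an augmentation loop one would typically draw `snr_db` at random for each noisy copy (for instance from an assumed 10 to 30 dB range) so that the network sees varied noise levels rather than a single fixed one.
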
Keywords:
Computer science, Spectrogram, Speech recognition, Artificial intelligence, Convolutional neural network, Feature, Deep learning, Emotive, Pattern recognition, Dropout (neural networks), Noise, Hidden Markov model, Feature extraction, Speech processing, Machine learning
