Speech Emotion Recognition Using Deep Learning Hybrid Models

Jamsher Bhanbhro; Shahnawaz Talpur; Asif Aziz Memon

doi:10.1109/icetecc56662.2022.10069212

JOURNAL ARTICLE

Year: 2022 Pages: 1-5

Abstract

Speech Emotion Recognition (SER) has been essential to Human-Computer Interaction (HCI) and other complex speech processing systems over the past decade. Because emotional expression varies between speakers, SER is a complex and challenging task. The features extracted from speech signals are crucial to the performance of SER systems, and developing efficient feature extraction and classification models remains difficult. This study proposes hybrid deep learning models that extract the crucial features accurately and produce predictions with higher probabilities. First, a combination of stacked Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) is trained on the temporal features of the Mel spectrogram; this model performs well. To enhance the speech, the samples are first preprocessed using data improvement and dataset balancing techniques. The study uses the RAVDESS dataset, which contains 1440 audio samples in a North American English accent. The strength of the CNN is used to obtain spatial features and sequence encoding conversion, giving the model an accuracy above 93.9% on this dataset when classifying emotions into one of eight categories. The model is generalized using additive white Gaussian noise (AWGN) and dropout techniques.
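The abstract names AWGN as one of the techniques used to generalize the model but does not describe how the noise is applied. The following is a minimal sketch of the general technique, not the authors' implementation: it adds zero-mean Gaussian noise to a waveform at a chosen signal-to-noise ratio. The function name `add_awgn`, the `snr_db` parameter, the 20 dB value, and the synthetic sine tone standing in for a speech clip are all assumptions made for illustration.

```python
import math
import random


def add_awgn(signal, snr_db, seed=None):
    """Return a copy of `signal` with additive white Gaussian noise.

    `snr_db` is the target signal-to-noise ratio in decibels. The name and
    the SNR-based parameterisation are illustrative assumptions; the
    abstract does not state the noise levels that were used.
    """
    if not signal:
        return []
    rng = random.Random(seed)
    # Average power of the clean signal.
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR (dB) = 10 * log10(P_signal / P_noise), solved here for P_noise.
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise_std = math.sqrt(noise_power)
    # White noise: independent zero-mean Gaussian samples.
    return [x + rng.gauss(0.0, noise_std) for x in signal]


if __name__ == "__main__":
    # One second of a 440 Hz tone at 16 kHz stands in for a speech clip.
    sample_rate = 16000
    clean = [
        math.sin(2 * math.pi * 440 * n / sample_rate)
        for n in range(sample_rate)
    ]
    noisy = add_awgn(clean, snr_db=20, seed=0)
    noise = [b - a for a, b in zip(clean, noisy)]
    measured = 10 * math.log10(
        sum(x * x for x in clean) / sum(x * x for x in noise)
    )
    print(f"measured SNR: {measured:.1f} dB")
```

In an augmentation setting, each training clip would typically be passed through such a function one or more times with different seeds, so the classifier sees several noisy variants of the same utterance.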

Keywords:

Computer science; Spectrogram; Speech recognition; Artificial intelligence; Convolutional neural network; Feature (linguistics); Deep learning; Emotive; Pattern recognition (psychology); Dropout (neural networks); Noise (video); Hidden Markov model; Feature extraction; Speech processing; Machine learning

Metrics

FWCI (Field Weighted Citation Impact): 1.48

Citation Normalized Percentile: 0.79

Topics

Emotion and Mood Recognition

Social Sciences → Psychology → Experimental and Cognitive Psychology

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Related Documents

Enhanced Speech Emotion Recognition Using Hybrid Machine Learning and Deep Learning Models

Speech Emotion Recognition Using Hybrid Deep Learning Models and Diverse Acoustic Features

Hybrid deep learning models based emotion recognition with speech signals

Emotion recognition using deep learning models in Chinese speech

Deep Learning Models for Speech Emotion Recognition