JOURNAL ARTICLE

Speech Emotion Recognition using Spectral Images and Convolutional Neural Network

Abstract

Automatic identification of emotion from speech is a difficult and intricate task. Speech emotion recognition (SER) has attracted sustained academic interest for over three decades because of its wide range of applications in areas such as medical treatment, marketing, customer service, driving, internet search, and education. Researchers have tried many approaches to improve the performance of emotion classification. In our work, we used images of the mel-frequency cepstral coefficients (MFCC), images of the mel-spectrogram, and a combination of both as feature input to a 2D convolutional neural network (2D-CNN) classifier. We trained the model on each feature image individually and on the combination of the two. The experimental results show that the proposed combination of MFCC and mel-spectrogram outperformed either feature alone for recognizing emotion in speech signals. To assess the efficacy of our features, we used three datasets: TESS, RAVDESS, and EMO-DB. The emotion classification accuracy was 88.89% on EMO-DB, 100% on TESS, and 81.2% on RAVDESS.
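
The feature pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the frame length, hop size, number of mel bands, number of cepstral coefficients, the min-max scaling, and the choice to combine the two features by stacking them along the feature axis are all assumptions made for illustration, and every function name below is hypothetical. The sketch computes a log mel-spectrogram, derives MFCCs from it with a DCT-II, and builds one combined image of the kind a 2D-CNN could take as input.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(y, sr, n_fft=512, hop=256, n_mels=40):
    """Log-power mel-spectrogram in dB, shape (n_mels, n_frames).
    All parameter values are assumed, not taken from the paper."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2    # (n_frames, n_fft // 2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T   # (n_frames, n_mels)
    return (10.0 * np.log10(mel + 1e-10)).T

def mfcc_from_log_mel(log_mel, n_mfcc=13):
    """Unnormalized DCT-II over the mel axis, shape (n_mfcc, n_frames)."""
    n_mels = log_mel.shape[0]
    k = np.arange(n_mfcc)[:, None]
    n = np.arange(n_mels)[None, :]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    return basis @ log_mel

def to_image(feat):
    """Min-max scale a feature matrix to [0, 1], like a grayscale image."""
    span = feat.max() - feat.min()
    return (feat - feat.min()) / (span if span > 0 else 1.0)

def combined_feature_image(y, sr):
    """Assumed combination: MFCC image stacked on top of the mel-spectrogram image."""
    log_mel = log_mel_spectrogram(y, sr)
    mfcc = mfcc_from_log_mel(log_mel)
    return np.vstack([to_image(mfcc), to_image(log_mel)])

# Synthetic one-second 440 Hz tone standing in for a speech recording.
sr = 16000
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440.0 * t)
image = combined_feature_image(y, sr)
print(image.shape)  # (feature rows, time frames)
```

In practice the resulting array would be given a channel dimension and resized to a fixed shape before being passed to the classifier; the abstract does not state how the authors did this.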

Keywords:
Computer science, Convolutional neural network, Speech recognition, Emotion recognition, Artificial intelligence, Pattern recognition (psychology)

Metrics

Cited By: 1
FWCI (Field Weighted Citation Impact): 0.42
Refs: 29
Citation Normalized Percentile: 0.64

Topics

Emotion and Mood Recognition
Social Sciences →  Psychology →  Experimental and Cognitive Psychology
Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Infant Health and Development
Health Sciences →  Health Professions →  Pharmacy
