Scalogram vs Spectrogram as Speech Representation Inputs for Speech Emotion Recognition Using CNN

Marc Dominic Enriquez; Crisron Rudolf Lucas; Angelina Aquino

doi:10.1109/issc59246.2023.10162085

JOURNAL ARTICLE

Year: 2023 Pages: 1-6

Abstract

Speech Emotion Recognition (SER) focuses on identifying the human emotion in a given speech utterance from its acoustic and/or linguistic features. This paper compares two speech representation inputs for SER: spectrograms and scalograms. Speech signals from four databases (Emo-DB, RAVDESS, SAVEE, and a mix of all three) were converted into each type of representation and used to train variations of a convolutional neural network (CNN), VGG16 Model-3. Results show that the scalogram-based models have a higher mean F1-score than the spectrogram-based models; however, further analysis indicates that the difference is not statistically significant at the 95% confidence level. In conclusion, spectrograms and scalograms show no statistically significant difference in performance on the systems presented.
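The two representations compared in the abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the abstract does not state the window length, hop size, wavelet family, or frequency range, so the Hann window, the Morlet wavelet with `w0 = 6`, the wavelet normalization, and every parameter value below are assumptions chosen for the example. The spectrogram is a log-magnitude short-time Fourier transform; the scalogram is the magnitude of a continuous wavelet transform evaluated in the frequency domain.

```python
import numpy as np


def spectrogram(x, win_len=400, hop=160):
    """Log-magnitude STFT spectrogram with shape (freq_bins, frames).

    win_len and hop are illustrative assumptions (25 ms and 10 ms at 16 kHz).
    """
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [x[i * hop : i * hop + win_len] * window for i in range(n_frames)]
    )
    mag = np.abs(np.fft.rfft(frames, axis=1)).T
    return 20.0 * np.log10(mag + 1e-10)


def scalogram(x, fs, freqs, w0=6.0):
    """Morlet CWT magnitude with shape (len(freqs), len(x)).

    Each row is obtained by multiplying the signal spectrum with the Fourier
    transform of a scaled analytic Morlet wavelet. The FFT makes the
    convolution circular, so the edges of short signals wrap around.
    """
    n = len(x)
    spectrum = np.fft.fft(x)
    omega = 2.0 * np.pi * np.fft.fftfreq(n, d=1.0 / fs)  # rad/s
    out = np.empty((len(freqs), n))
    for k, f in enumerate(freqs):
        scale = w0 / (2.0 * np.pi * f)  # scale (s) whose centre frequency is f
        # Analytic Morlet wavelet in the frequency domain (positive
        # frequencies only), using one common energy normalization.
        psi_hat = (
            np.pi ** -0.25
            * np.sqrt(2.0 * np.pi * scale * fs)
            * np.exp(-0.5 * (scale * omega - w0) ** 2)
            * (omega > 0)
        )
        out[k] = np.abs(np.fft.ifft(spectrum * psi_hat))
    return out


if __name__ == "__main__":
    fs = 16000  # assumed sampling rate in Hz
    t = np.arange(int(0.5 * fs)) / fs
    tone = np.sin(2.0 * np.pi * 440.0 * t)  # synthetic stand-in for speech

    spec = spectrogram(tone)
    freqs = np.geomspace(100.0, 4000.0, 32)  # log-spaced analysis frequencies
    scal = scalogram(tone, fs, freqs)
    print(spec.shape, scal.shape)
```

The spectrogram uses one fixed window for all frequencies, whereas the scalogram's effective window shrinks as frequency rises, which is the main structural difference between the two inputs. Either array would still need to be resized and scaled into an image before being fed to a CNN; that step depends on the network's input format and is not shown.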

Keywords: Spectrogram; Speech recognition; Computer science; Convolutional neural network; Utterance; Representation (politics); Artificial intelligence; Pattern recognition (psychology)

Topics

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Emotion and Mood Recognition

Social Sciences → Psychology → Experimental and Cognitive Psychology

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

Speech Emotion Recognition Using Scalogram Based Deep Structure

Detecting human emotion via speech recognition by using speech spectrogram

Speech Emotion Recognition Using Spectrogram & Phoneme Embedding

Speech Emotion Recognition Using Spectrogram Patterns as Features

Speech Emotion Recognition Using MELBP Variants of Spectrogram Image