Speech Emotion Recognition (SER) is a challenging task due to the complexity and variability of human emotions. In this paper, we propose an approach to improve SER performance on the EMODB dataset. Our approach employs data augmentation techniques, namely noise addition and spectrogram shifting, together with class balancing via random oversampling. We extract five features from each sample: MFCC, Chroma, Mel Spectrogram, ZCR, and RMS. We compare the performance of four classifiers (MLP, SVM, KNN, and CNN) with and without the proposed approach. Our results show that the proposed approach achieves higher accuracy and F1-score than a baseline without augmentation and balancing, with the MLP and CNN models reaching 100% accuracy. These findings highlight the effectiveness of data augmentation and balancing techniques in improving SER performance. Moreover, the approach holds potential for real-life applications such as mental health monitoring, human-robot interaction, and speech-based virtual assistants.
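The waveform-level augmentations named above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the helper names (`add_noise`, `time_shift`), the noise factor, and the shift amount are assumptions, and the circular time shift stands in for the spectrogram-shift augmentation described in the abstract.

```python
import numpy as np

def add_noise(signal, noise_factor=0.005, rng=None):
    # Additive Gaussian noise, scaled relative to the signal's
    # peak amplitude (noise_factor is a hypothetical default).
    rng = rng if rng is not None else np.random.default_rng(0)
    noise = rng.standard_normal(signal.shape)
    return signal + noise_factor * np.max(np.abs(signal)) * noise

def time_shift(signal, shift):
    # Circular shift of the waveform; a simple time-domain proxy
    # for the spectrogram-shift augmentation.
    return np.roll(signal, shift)

# Example: augment a 1-second synthetic 440 Hz tone at 16 kHz.
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
clean = np.sin(2 * np.pi * 440 * t)
noisy = add_noise(clean)
shifted = time_shift(clean, sr // 10)
```

Each augmented copy keeps the original emotion label, so minority emotion classes can also be grown this way before random oversampling balances the remaining class counts.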
Tanisha Kapoor, Arnaja Ganguly, D Rajeswari