Manoara Begum, Md Akash Rahman, Tanjim Mahmud, Mohammad Shahadat Hossain, Karl Andersson
Speech Emotion Recognition (SER) is a challenging task in human-computer interaction (HCI) that relies on artificial intelligence and deep learning to classify emotional states from speech audio signals. Although Bangla is the seventh most widely spoken language in the world, it remains a low-resource language for SER owing to the scarcity of labeled datasets. This study aims to improve emotion recognition in Bengali speech using the SUBESCO and BanglaSER corpora, two audio-only Bangla emotional speech datasets. During preprocessing, noise was removed with envelope masking, and Mel-Frequency Cepstral Coefficients (MFCCs) were extracted to capture the key spectral features. The system employs machine learning models such as K-Nearest Neighbors (KNN), Random Forest, and Multi-Layer Perceptron (MLP), together with ensemble techniques such as Voting and Stacking classifiers, to optimize performance. In addition, deep learning architectures, namely Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN), and Long Short-Term Memory (LSTM) networks, were implemented to process temporal and sequential speech patterns efficiently. The proposed models achieved high performance, reaching an accuracy of 95.92% on SUBESCO and 90.61% on BanglaSER. These findings confirm the effectiveness of the preprocessing techniques and the applied learning models, advancing Bangla SER and broadening research opportunities for low-resource languages.
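As a minimal sketch of the ensemble stage described in the abstract, the snippet below stacks KNN, Random Forest, and MLP base learners with scikit-learn. The feature matrix here is a synthetic stand-in for the per-utterance MFCC vectors extracted from SUBESCO/BanglaSER (the dimensions and hyperparameters are illustrative assumptions, not the paper's actual configuration).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in for MFCC feature vectors (e.g. 40 coefficients per
# utterance) and 5 emotion classes; the real pipeline would load these
# from the preprocessed SUBESCO/BanglaSER audio instead.
X, y = make_classification(n_samples=400, n_features=40, n_informative=20,
                           n_classes=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Base learners named in the abstract: KNN, Random Forest, MLP.
base_learners = [
    ("knn", KNeighborsClassifier()),
    ("rf", RandomForestClassifier(random_state=0)),
    ("mlp", MLPClassifier(max_iter=1000, random_state=0)),
]

# StackingClassifier trains a meta-learner (logistic regression by
# default) on the base learners' cross-validated predictions.
stack = StackingClassifier(estimators=base_learners)
stack.fit(X_tr, y_tr)
acc = accuracy_score(y_te, stack.predict(X_te))
print(f"stacking accuracy on synthetic data: {acc:.2f}")
```

A Voting classifier would follow the same pattern with `sklearn.ensemble.VotingClassifier`, simply averaging (soft voting) or majority-voting (hard voting) the base predictions instead of learning a meta-model.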