JOURNAL ARTICLE

Vocal Sentiments: Transformer Based Speech Emotion Recognition

Didar Ali, Muhammad Shahab, Yasir Saleem Afridi, Rehmat Ullah

Year: 2025 | Journal: VFAST Transactions on Software Engineering | Vol: 13 (3) | Pages: 187-197

Abstract

Speech Emotion Recognition (SER) plays a crucial role in Human–Computer Interaction (HCI) by enabling systems to interpret and respond to human emotions through speech analysis. This paper presents a Transformer-based SER framework that leverages the Wav2Vec2 model for self-supervised representation learning. Unlike conventional approaches that rely on handcrafted acoustic features or shallow classifiers, our approach uses transfer learning to extract high-level contextual embeddings directly from raw audio. We combine two benchmark datasets, RAVDESS and TESS, to improve generalization across diverse speakers and emotions, and we further analyze system robustness by introducing varying levels of environmental noise. The proposed model achieves an accuracy of 79.01%, with balanced precision, recall, and F1-scores, which is competitive with recent state-of-the-art SER models. The main contributions of this work are threefold: (i) a novel evaluation of Wav2Vec2 embeddings on combined RAVDESS–TESS data, (ii) a systematic assessment of noise robustness in Transformer-based SER, and (iii) a comprehensive benchmark that highlights the strengths and limitations of transfer learning in practical emotion recognition scenarios. These findings suggest broad applicability in voice assistants, call-center analytics, and mental health monitoring; future extensions may incorporate multimodal data and advanced fine-tuning strategies to further improve performance.
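The abstract reports robustness under "varying levels of environmental noise" but does not say how those levels are defined. The sketch below is an illustration only, not the authors' procedure: it assumes noise levels are expressed as a signal-to-noise ratio (SNR) in dB and uses white Gaussian noise as a stand-in for environmental recordings. A real noise clip could be scaled the same way. The helper names `add_noise_at_snr` and `measured_snr_db` are hypothetical, and the 16 kHz test tone is chosen only because Wav2Vec2 checkpoints expect 16 kHz input.

```python
import math
import random


def signal_power(samples):
    """Mean squared amplitude of a waveform given as a list of floats."""
    return sum(x * x for x in samples) / len(samples)


def add_noise_at_snr(clean, snr_db, seed=None):
    """Return `clean` plus white Gaussian noise scaled to a target SNR (dB).

    Assumption (not from the paper): noise level is defined as
    SNR_dB = 10 * log10(P_signal / P_noise), so the required noise power is
    P_noise = P_signal / 10 ** (SNR_dB / 10).
    """
    if not clean:
        raise ValueError("clean waveform must be non-empty")
    rng = random.Random(seed)
    target_noise_power = signal_power(clean) / (10 ** (snr_db / 10))
    # Draw unit-variance Gaussian noise, then rescale it so its empirical
    # power matches the target exactly rather than only in expectation.
    noise = [rng.gauss(0.0, 1.0) for _ in clean]
    scale = math.sqrt(target_noise_power / signal_power(noise))
    return [x + scale * n for x, n in zip(clean, noise)]


def measured_snr_db(clean, noisy):
    """SNR actually achieved, treating (noisy - clean) as the noise."""
    residual = [y - x for x, y in zip(clean, noisy)]
    return 10 * math.log10(signal_power(clean) / signal_power(residual))


if __name__ == "__main__":
    sr = 16000  # one second of audio at Wav2Vec2's expected sampling rate
    tone = [0.5 * math.sin(2 * math.pi * 440 * t / sr) for t in range(sr)]
    for snr in (20, 10, 5, 0):
        noisy = add_noise_at_snr(tone, snr, seed=0)
        print(f"target {snr:>3} dB -> measured {measured_snr_db(tone, noisy):.3f} dB")
```

Rescaling the drawn noise to hit the target power exactly makes each corrupted copy land on its nominal SNR. A robustness sweep would then pass these corrupted copies of the test set through the same trained model and compare accuracy and F1 across levels.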
