A. Poongodai, Y. V. Nandini, T Mounika, A J Karishma, N. K. Senthil Kumar
Abstract - Speech Emotion Recognition (SER) is a crucial component of human-computer interaction, enabling machines to recognize and respond to human emotions effectively. This study proposes a novel SER framework that augments Convolutional Neural Networks (CNNs) with attention mechanisms. The CNNs capture hierarchical and spatial features from spectrogram representations of speech signals, while the attention mechanisms focus on emotionally salient regions, improving interpretability and accuracy. The proposed model is evaluated on benchmark datasets, where it outperforms traditional methods. By prioritizing critical emotional features, the model improves its practical utility and reliability, which highlights the potential of combining CNNs with attention for real-world SER applications such as virtual assistants, customer support systems, and mental health monitoring. This work underlines the importance of deep learning techniques in developing SER technologies, paving the way for more intuitive and effective human-computer interactions.
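The pipeline described in the abstract (convolutional features from a spectrogram, followed by attention over emotionally salient regions) can be illustrated with a minimal sketch. This is not the authors' model: the function names, the single fixed kernel, the scoring vector `w`, and the toy spectrogram are all assumptions made for illustration, and a real system would learn the kernels and attention weights from data.

```python
import math

def conv2d_valid(spec, kernel):
    """Valid 2-D cross-correlation of a spectrogram (freq x time) with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(spec) - kh + 1
    out_w = len(spec[0]) - kw + 1
    return [
        [
            sum(spec[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(fmap):
    """Element-wise rectifier applied to a 2-D feature map."""
    return [[max(0.0, v) for v in row] for row in fmap]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(fmap, w):
    """Attention over time. fmap is (features x time); w holds one
    (hypothetical, normally learned) scoring weight per feature row."""
    n_feat, n_time = len(fmap), len(fmap[0])
    # Re-arrange into one feature vector per time frame.
    frames = [[fmap[f][t] for f in range(n_feat)] for t in range(n_time)]
    # Score each frame, then normalise the scores into attention weights.
    scores = [sum(wf * x for wf, x in zip(w, frame)) for frame in frames]
    alphas = softmax(scores)
    # Weighted sum of frames gives a fixed-size utterance-level vector.
    pooled = [sum(alphas[t] * frames[t][f] for t in range(n_time))
              for f in range(n_feat)]
    return pooled, alphas

# Toy spectrogram (3 frequency bins x 4 time frames) with energy in frame 2.
spec = [[0.0, 0.0, 5.0, 0.0],
        [0.0, 0.0, 5.0, 0.0],
        [0.0, 0.0, 5.0, 0.0]]
kernel = [[1.0, 1.0], [1.0, 1.0]]   # assumed fixed kernel, not a learned one
fmap = relu(conv2d_valid(spec, kernel))
pooled, alphas = attention_pool(fmap, w=[0.1, 0.1])
```

In this toy case the attention weights concentrate on the output frames that overlap the high-energy column, so the pooled vector is dominated by those frames. A classifier over emotion labels would then take `pooled` as input.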
Konstantinos C. Mountzouris, Isidoros Perikos, Ioannis Hatzilygeroudis
Pengxu JIANG, Xinzhou Xu, Huawei Tao, Li Zhao, Cairong Zou
Yawei Mu, Luis A. Hernández Gómez, Antonio Montes, CARLOS ALCARAZ MARTÍNEZ, Xuetian Wang, Hongmin Gao