JOURNAL ARTICLE

CochleaTion: Speech Emotion Recognition Through Cochleagram with CNN-GRU and Attention Mechanism

Abstract

A Speech Emotion Recognition (SER) system classifies a speaker's emotional state into different categories based on their utterances. This study proposes a novel SER system using the cochleagram, an acoustic feature associated with human auditory perception of emotions. The proposed model integrates a hybrid architecture comprising a Convolutional Neural Network (CNN) and a Gated Recurrent Unit (GRU) network, augmented with a self-attention mechanism. The model is evaluated on the BanglaSER and RAVDESS datasets: on BanglaSER it achieves a notable accuracy of 91.17% in categorizing five distinct emotions (angry, calm, happy, neutral, and sad), while on RAVDESS it exhibits a solid accuracy of 78.35% in classifying eight diverse emotions. The incorporation of the cochleagram and the hybrid neural network design demonstrates the efficacy of the proposed SER system, offering a promising approach for precise and efficient emotion categorization in speech signals.
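The abstract describes a hybrid pipeline: a CNN front-end over the cochleagram, a GRU for temporal modeling, and self-attention before classification. A minimal sketch of such an architecture, assuming illustrative layer sizes and a simple mean-pooled attention head (the paper's exact configuration is not given here), might look like:

```python
# Hypothetical sketch of a cochleagram-based CNN-GRU + self-attention SER
# model, following the architecture named in the abstract. All layer sizes
# and the 5-class output (BanglaSER) are illustrative assumptions.
import torch
import torch.nn as nn

class CochleaTionSketch(nn.Module):
    def __init__(self, n_classes=5, n_freq=64):
        super().__init__()
        # CNN front-end: extracts local time-frequency patterns from the
        # cochleagram, shaped (batch, 1, freq, time).
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # GRU models temporal dynamics over the pooled feature maps.
        self.gru = nn.GRU(32 * (n_freq // 4), 64, batch_first=True)
        # Single-head self-attention weighs the GRU outputs over time.
        self.attn = nn.MultiheadAttention(64, num_heads=1, batch_first=True)
        self.fc = nn.Linear(64, n_classes)

    def forward(self, x):                # x: (batch, 1, freq, time)
        f = self.cnn(x)                  # (batch, 32, freq/4, time/4)
        b, c, fr, t = f.shape
        f = f.permute(0, 3, 1, 2).reshape(b, t, c * fr)  # (batch, time, feat)
        h, _ = self.gru(f)               # (batch, time, 64)
        a, _ = self.attn(h, h, h)        # self-attention over time steps
        return self.fc(a.mean(dim=1))    # average-pool over time, classify

model = CochleaTionSketch(n_classes=5)
logits = model(torch.randn(2, 1, 64, 128))  # 2 cochleagrams, 64 bands
```

The attention layer here lets the classifier emphasize emotionally salient frames rather than treating all time steps equally, which is the usual motivation for adding self-attention on top of a recurrent encoder.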

Keywords: Computer science, Mechanism (biology), Speech recognition, Emotion recognition, Natural language processing, Artificial intelligence

Metrics: Cited by 3 · FWCI (Field Weighted Citation Impact) 2.33 · 13 references · Citation Normalized Percentile 0.79


Topics

Hand Gesture Recognition Systems
Physical Sciences →  Computer Science →  Human-Computer Interaction
Computer Science and Engineering
Physical Sciences →  Computer Science →  Artificial Intelligence
Emotion and Mood Recognition
Social Sciences →  Psychology →  Experimental and Cognitive Psychology