JOURNAL ARTICLE

End-to-End Speech Emotion Recognition Based on One-Dimensional Convolutional Neural Network

Abstract

Real-time speech emotion recognition remains a challenging problem. To address it, we propose an end-to-end speech emotion recognition model based on a one-dimensional convolutional neural network that contains only three convolutional layers, two pooling layers, and one fully connected layer. Trained with the Adam optimization algorithm and backpropagation, the model progressively extracts more discriminative features. Its simple structure allows the emotion classification task to be completed quickly. Unlike traditional methods, it requires no complex manual feature extraction: the model learns emotional features automatically from raw speech signals. In emotion recognition experiments on four speech databases (EMODB, CASIA, IEMOCAP, and CHEAVD), the model obtained relatively high recognition rates. The experiments indicate that the proposed algorithm is of considerable benefit to the implementation of real-time speech emotion recognition.
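
The layer stack named in the abstract (three convolutional layers, two pooling layers, one fully connected layer, applied to the raw waveform) can be illustrated with a minimal forward-pass sketch in plain Python. The abstract gives only the layer counts, so everything else here is an assumption made for illustration: the order in which pooling follows the convolutions, the channel counts (4, 8, 8), kernel sizes, stride, ReLU activations, the softmax output, the 64-sample input, and the four output classes. Training with Adam and backpropagation is omitted; the weights are random, so the output is not a meaningful prediction.

```python
import math
import random

def conv1d(x, weights, biases, stride=1):
    """Valid 1-D convolution followed by ReLU.
    x: list of channels, each a list of samples.
    weights[o][i]: kernel linking input channel i to output channel o."""
    k = len(weights[0][0])
    out_len = (len(x[0]) - k) // stride + 1
    out = []
    for w_o, b in zip(weights, biases):
        row = []
        for t in range(out_len):
            start = t * stride
            s = b
            for ch, kernel in zip(x, w_o):
                s += sum(kernel[j] * ch[start + j] for j in range(k))
            row.append(max(0.0, s))  # ReLU (assumed activation)
        out.append(row)
    return out

def max_pool1d(x, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [[max(ch[i:i + size]) for i in range(0, len(ch) - size + 1, size)]
            for ch in x]

def dense_softmax(feats, weights, biases):
    """Fully connected layer followed by a numerically stable softmax."""
    logits = [sum(w * f for w, f in zip(row, feats)) + b
              for row, b in zip(weights, biases)]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def init_conv(rng, out_ch, in_ch, k):
    """Random kernels and zero biases (hypothetical initialisation)."""
    scale = 1.0 / math.sqrt(in_ch * k)
    w = [[[rng.uniform(-scale, scale) for _ in range(k)]
          for _ in range(in_ch)] for _ in range(out_ch)]
    return w, [0.0] * out_ch

def features(signal, params):
    """Three conv layers with two pooling layers in between (assumed order)."""
    x = [signal]                                  # one channel: raw waveform
    x = conv1d(x, *params["conv1"], stride=2)
    x = max_pool1d(x, 2)
    x = conv1d(x, *params["conv2"])
    x = max_pool1d(x, 2)
    x = conv1d(x, *params["conv3"])
    return [v for ch in x for v in ch]            # flatten for the dense layer

def build_params(rng, signal_len, n_classes):
    params = {
        "conv1": init_conv(rng, 4, 1, 5),         # assumed sizes throughout
        "conv2": init_conv(rng, 8, 4, 3),
        "conv3": init_conv(rng, 8, 8, 3),
    }
    # Infer the flattened feature length by pushing a dummy signal through.
    flat_len = len(features([0.0] * signal_len, params))
    scale = 1.0 / math.sqrt(flat_len)
    fc_w = [[rng.uniform(-scale, scale) for _ in range(flat_len)]
            for _ in range(n_classes)]
    params["fc"] = (fc_w, [0.0] * n_classes)
    return params

def forward(signal, params):
    """Class probabilities for one raw signal."""
    return dense_softmax(features(signal, params), *params["fc"])

rng = random.Random(0)
params = build_params(rng, signal_len=64, n_classes=4)
signal = [math.sin(0.3 * t) for t in range(64)]   # toy stand-in for speech
probs = forward(signal, params)
```

Here `probs` holds one probability per hypothetical emotion class. In a trained model of this kind, the kernels and dense weights would be fitted to labelled speech rather than drawn at random, and the input would be a real waveform of far greater length.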

Keywords:
Computer science; Speech recognition; Discriminative model; Pooling; Convolutional neural network; Task (project management); Artificial intelligence; Convolution (computer science); Process (computing); Acoustic model; Feature extraction; Pattern recognition (psychology); End-to-end principle; Time delay neural network; Artificial neural network; Carry (investment); Speech processing

Metrics

Cited By: 20
FWCI (Field Weighted Citation Impact): 2.75
Refs: 20
Citation Normalized Percentile: 0.88

Topics

Emotion and Mood Recognition
Social Sciences →  Psychology →  Experimental and Cognitive Psychology
Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Speech Recognition and Synthesis
Physical Sciences →  Computer Science →  Artificial Intelligence