Abstract

Over the last decade, emotion recognition has attracted considerable attention in human-computer interaction. Current recognition accuracy can still be improved, and further research into the fundamental temporal relations within speech waveforms is needed. A method for speech emotion recognition is proposed that exploits differences in emotional saturation between time frames, combining speech features with attention-based Long Short-Term Memory (LSTM) recurrent neural networks (RNNs). In place of standard statistical features, frame-level speech features were derived from the waveform to retain the original speech's temporal relations across the sequence of frames. Two LSTM enhancement algorithms based on the attention mechanism are presented to distinguish the emotional saturation of distinct frames. An Emotion Recognition in Conversation system capable of recognizing facial emotion in real time was also proposed.
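The core idea above, weighting frame-level LSTM outputs by a per-frame attention score so that emotionally saturated frames contribute more to the utterance representation, can be sketched as follows. This is a minimal illustration only: the attention vector `w`, the plain softmax pooling, and the use of NumPy in place of a trained LSTM are all assumptions for demonstration, not the paper's exact architecture.

```python
import numpy as np

def attention_pool(frame_outputs, w):
    """Pool frame-level LSTM outputs into one utterance vector.

    frame_outputs: (T, D) array, one hidden state per time frame
    w:             (D,) hypothetical learned attention vector
    Returns the softmax attention weights (T,) and the weighted sum (D,).
    """
    scores = frame_outputs @ w                       # one score per frame
    scores = scores - scores.max()                   # numerical stability
    alphas = np.exp(scores) / np.exp(scores).sum()   # softmax over frames
    return alphas, frame_outputs.T @ alphas          # attention-weighted pool

# Stand-in data: 50 frames of 128-dim hidden states (no real LSTM here).
rng = np.random.default_rng(0)
T, D = 50, 128
H = rng.standard_normal((T, D))   # placeholder for LSTM frame outputs
w = rng.standard_normal(D)        # placeholder for a learned attention vector
alphas, utterance_vec = attention_pool(H, w)
```

The pooled `utterance_vec` would then feed a classifier; frames the attention scores rate as more emotionally salient dominate the sum, while low-scoring frames are effectively suppressed.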

Keywords:
Computer science, Speech recognition, Artificial intelligence

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 0
Citation Normalized Percentile: 0.22

Topics

Emotion and Mood Recognition
Social Sciences → Psychology → Experimental and Cognitive Psychology
© 2026 ScienceGate Book Chapters — All rights reserved.