JOURNAL ARTICLE

Utterance and Syllable Level Prosodic Features for Automatic Emotion Recognition

Abstract

This paper describes an automatic emotion recognition (AER) system that combines prosodic features extracted at the utterance level and the syllable level to recognize the emotional content of speech. Prosodic features are extracted after identifying speech/non-speech intervals, followed by syllable-level segmentation. The chosen features represent the dynamics of pitch and energy, along with duration information. Two separate classifiers are built using deep neural networks (DNNs). The decision scores from both levels are fused to identify the emotion of a test utterance from the German Emotion Database (Emo-DB), which contains seven emotions: anger, boredom, disgust, fear, happiness, sadness and neutral. The proposed system gives a Weighted Average Recall (WAR) of 58.88% for both utterance-level and syllable-level prosodic features taken individually. Fusing the scores by simple addition raises the overall WAR to 61.68%.
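The fusion step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the emotion list follows Emo-DB as cited in the abstract, but the score vectors and the `fuse_scores` helper are hypothetical, and the paper specifies only that the two classifiers' scores are added.

```python
# Sketch of score-level fusion by addition: the utterance-level and
# syllable-level DNN classifiers each produce a per-emotion score vector
# for a test utterance; the vectors are added element-wise and the
# emotion with the highest fused score is chosen.

EMOTIONS = ["anger", "boredom", "disgust", "fear",
            "happiness", "sadness", "neutral"]  # the seven Emo-DB classes

def fuse_scores(utterance_scores, syllable_scores):
    """Add the two classifiers' per-emotion scores and return the argmax label."""
    fused = [u + s for u, s in zip(utterance_scores, syllable_scores)]
    best = max(range(len(fused)), key=lambda i: fused[i])
    return EMOTIONS[best], fused

# Hypothetical softmax outputs for one test utterance (illustrative only).
utt = [0.30, 0.05, 0.05, 0.10, 0.35, 0.05, 0.10]
syl = [0.40, 0.05, 0.05, 0.10, 0.20, 0.10, 0.10]
label, fused = fuse_scores(utt, syl)
print(label)  # anger (fused score 0.70 exceeds happiness at 0.55)
```

Adding raw scores weights both classifiers equally; a weighted sum would be the natural extension if one level proved more reliable, but the abstract reports only unweighted addition.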

Keywords:
Utterance, speech recognition, sadness, syllable, computer science, artificial intelligence, anger, natural language processing, happiness, segmentation, disgust, psychology
