JOURNAL ARTICLE

Automatic recognition of speech emotion using long-term spectro-temporal features

Abstract

This paper proposes a novel feature type for recognizing emotion from speech. The features are derived from a long-term spectro-temporal representation of speech and are compared with short-term spectral features and with popular prosodic features. Experimental results on the Berlin emotional speech database show that the proposed features outperform both. Combining the proposed and prosodic features yields an average recognition accuracy of 88.6% when classifying 7 discrete emotions. The proposed features are also evaluated on the VAM corpus for recognizing continuous emotion primitives, where they achieve estimation performance comparable to human evaluations.
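
The abstract does not say how the long-term spectro-temporal representation is computed. As a rough illustration of the general idea only, the sketch below builds one common kind of such representation, a modulation spectrum: short-term band energies are tracked over many frames, and a second Fourier transform is taken along time within each band. The function names, the equal-width bands, the Hann window and all parameter values are assumptions made for this example and are not taken from the paper.

```python
import cmath
import math


def dft_magnitudes(x):
    """Naive DFT magnitudes for the non-negative frequency bins of a real sequence."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def modulation_spectrum(signal, frame_len, hop, n_bands):
    """Hypothetical long-term spectro-temporal representation.

    Returns one list per acoustic band; each list holds the magnitude
    spectrum of that band's energy trajectory across frames.
    """
    # Step 1: short-term magnitude spectrum of each Hann-windowed frame.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len) for i in range(frame_len)]
    starts = range(0, len(signal) - frame_len + 1, hop)
    spectra = [
        dft_magnitudes([w * signal[s + i] for i, w in enumerate(window)])
        for s in starts
    ]

    # Step 2: group DFT bins into equal-width bands (an assumption; an
    # auditory filterbank could be used instead) and sum the energy per band.
    n_bins = frame_len // 2 + 1
    edges = [round(b * n_bins / n_bands) for b in range(n_bands + 1)]
    trajectories = [
        [sum(m * m for m in spec[edges[b]:edges[b + 1]]) for spec in spectra]
        for b in range(n_bands)
    ]

    # Step 3: remove each trajectory's mean, then transform along time to
    # expose how quickly the energy in that band fluctuates.
    result = []
    for traj in trajectories:
        mean = sum(traj) / len(traj)
        result.append(dft_magnitudes([e - mean for e in traj]))
    return result


# Toy input: a 200 Hz tone whose amplitude fluctuates at 5 Hz, sampled at 1 kHz.
fs = 1000
toy = [
    (1 + 0.8 * math.sin(2 * math.pi * 5 * n / fs)) * math.sin(2 * math.pi * 200 * n / fs)
    for n in range(2020)
]
mod_spec = modulation_spectrum(toy, frame_len=40, hop=20, n_bands=4)
```

With these toy settings the frame rate is 50 Hz and there are 100 frames, so modulation bin k corresponds to k × 0.5 Hz. The 5 Hz amplitude fluctuation should therefore appear as a peak in the band that contains the 200 Hz carrier. A real system would use far longer signals, an FFT library and a perceptually motivated filterbank.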

Keywords:
Speech recognition; Computer science; Emotion recognition; Feature (linguistics); Term (time); Set (abstract data type); Representation (politics); Artificial intelligence; Pattern recognition (psychology); Feature extraction; Natural language processing

Metrics

Cited By: 47
FWCI (Field Weighted Citation Impact): 5.09
Refs: 26
Citation Normalized Percentile: 0.94

Topics

Emotion and Mood Recognition
Social Sciences → Psychology → Experimental and Cognitive Psychology
Speech and Audio Processing
Physical Sciences → Computer Science → Signal Processing
Infant Health and Development
Health Sciences → Health Professions → Pharmacy