JOURNAL ARTICLE

Emotion Neural Transducer for Fine-Grained Speech Emotion Recognition

Abstract

The mainstream paradigm of speech emotion recognition (SER) is identifying the single emotion label of the entire utterance. This line of work neglects emotion dynamics at fine temporal granularity and mostly fails to explicitly leverage the linguistic information in the speech signal. In this paper, we propose the Emotion Neural Transducer (ENT) for fine-grained speech emotion recognition with automatic speech recognition (ASR) joint training. We first extend the typical neural transducer with an emotion joint network to construct an emotion lattice for fine-grained SER. We then propose lattice max pooling on the alignment lattice to help distinguish emotional from non-emotional frames. To adapt fine-grained SER to the transducer inference manner, we further make blank, the special symbol of ASR, serve as an underlying emotion indicator as well, yielding the Factorized Emotion Neural Transducer. For typical utterance-level SER, our ENT models outperform state-of-the-art methods on IEMOCAP with a low word error rate. Experiments on IEMOCAP and the latest speech emotion diarization dataset, ZED, also demonstrate the superiority of fine-grained emotion modeling. Our code is available at https://github.com/ECNU-Cross-Innovation-Lab/ENT.
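The lattice idea in the abstract can be pictured with a small sketch. The code below is only an illustrative reading of the abstract, not the authors' implementation (see the linked repository for that). It assumes an additive, tanh-based joint over T frame vectors and U token-state vectors, and it reads "lattice max pooling" simply as picking the lattice node with the highest posterior for a target emotion. The function names `joint_lattice` and `lattice_max_pool`, the dimensions, and the random inputs are all hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def joint_lattice(enc, pred, w_out):
    """Assumed additive joint: T frame vectors and U token-state vectors
    give a T x U grid of emotion class distributions.
    w_out holds one weight vector of length D per emotion class."""
    lattice = []
    for frame in enc:
        row = []
        for state in pred:
            hidden = [math.tanh(a + b) for a, b in zip(frame, state)]
            logits = [sum(h * w for h, w in zip(hidden, col)) for col in w_out]
            row.append(softmax(logits))
        lattice.append(row)
    return lattice


def lattice_max_pool(lattice, target):
    """Pick the (t, u) node whose posterior for `target` is highest
    and return that node together with its class distribution."""
    nodes = ((t, u) for t in range(len(lattice)) for u in range(len(lattice[0])))
    best = max(nodes, key=lambda tu: lattice[tu[0]][tu[1]][target])
    return best, lattice[best[0]][best[1]]


# Toy sizes: 6 frames, 4 token states, hidden size 8, 4 emotion classes.
random.seed(0)
T, U, D, C = 6, 4, 8, 4
enc = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]
pred = [[random.gauss(0, 1) for _ in range(D)] for _ in range(U)]
w_emo = [[random.gauss(0, 1) for _ in range(D)] for _ in range(C)]

lattice = joint_lattice(enc, pred, w_emo)
node, dist = lattice_max_pool(lattice, target=2)
```

In this toy setting, `node` is the frame/token position that most strongly supports the target emotion, which hints at how a single utterance-level label could still supervise frame-level decisions. How the paper actually selects nodes and defines its loss is not stated in the abstract.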

Keywords:
Computer science; Speech recognition; Utterance; Emotion classification; Emotion recognition; Artificial neural network; Leverage (statistics); Artificial intelligence; Natural language processing

Metrics

Cited By: 21
FWCI (Field Weighted Citation Impact): 23.03
Refs: 30
Citation Normalized Percentile: 0.99
Is in top 1%
Is in top 10%

Topics

Emotion and Mood Recognition (Social Sciences → Psychology → Experimental and Cognitive Psychology)
Speech Recognition and Synthesis (Physical Sciences → Computer Science → Artificial Intelligence)
Sentiment Analysis and Opinion Mining (Physical Sciences → Computer Science → Artificial Intelligence)