Siyuan Shen, Yu Gao, Feng Liu, Hanyang Wang, Aimin Zhou
The mainstream paradigm of speech emotion recognition (SER) is to identify a single emotion label for an entire utterance. This line of work neglects emotion dynamics at fine temporal granularity and mostly fails to explicitly leverage the linguistic information in the speech signal. In this paper, we propose the Emotion Neural Transducer (ENT) for fine-grained speech emotion recognition with joint automatic speech recognition (ASR) training. We first extend the typical neural transducer with an emotion joint network to construct an emotion lattice for fine-grained SER. We then propose lattice max pooling on the alignment lattice to facilitate distinguishing emotional from non-emotional frames. To adapt fine-grained SER to the transducer inference manner, we further make blank, the special symbol of ASR, serve as an underlying emotion indicator as well, yielding the Factorized Emotion Neural Transducer. For typical utterance-level SER, our ENT models outperform state-of-the-art methods on IEMOCAP while maintaining a low word error rate. Experiments on IEMOCAP and the latest speech emotion diarization dataset ZED also demonstrate the superiority of fine-grained emotion modeling. Our code is available at https://github.com/ECNU-Cross-Innovation-Lab/ENT.