Speech Emotion Recognition (SER) is becoming necessary for interactive spoken dialogue systems as users increasingly expect empathy from computers. Recent work has shown the importance of approaching this problem from a multimodal perspective, with models that combine visual, acoustic, and lexical features outperforming models based on single modalities. However, current SER models are not robust to out-of-domain data, partly because emotion-labeled corpora are generally small. This paper outlines my PhD research plan, which aims to improve SER models by jointly training them with an Automatic Speech Recognition (ASR) model using a novel cross-task semi-supervised learning approach on unlabeled data. The ASR model would also benefit from this training approach and would serve as the provider of lexical features. This joint ASR-SER model is expected to alleviate the data-scarcity problem and to be applicable to real-life settings such as human-computer interaction and digital health.
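To make the proposed joint training concrete, the sketch below shows one possible reading of the idea, not the thesis's actual implementation: a shared acoustic encoder feeds an ASR head trained with a CTC loss on transcribed speech and an SER head trained with cross-entropy on emotion-labeled speech, while unlabeled speech contributes a semi-supervised term via confidence-filtered pseudo-labels. All module names, dimensions, and the pseudo-labeling scheme are illustrative assumptions.

```python
# Hypothetical sketch of joint ASR-SER training with a shared encoder.
# The cross-task semi-supervised term here uses simple pseudo-labeling;
# the actual method proposed in the thesis may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointAsrSer(nn.Module):
    def __init__(self, n_mels=80, hidden=256, vocab_size=32, n_emotions=4):
        super().__init__()
        self.encoder = nn.GRU(n_mels, hidden, num_layers=2,
                              batch_first=True, bidirectional=True)
        self.asr_head = nn.Linear(2 * hidden, vocab_size)   # per-frame token logits
        self.ser_head = nn.Linear(2 * hidden, n_emotions)   # utterance-level logits

    def forward(self, feats):
        enc, _ = self.encoder(feats)                 # (B, T, 2*hidden)
        asr_logits = self.asr_head(enc)              # (B, T, vocab)
        ser_logits = self.ser_head(enc.mean(dim=1))  # mean-pooled over time
        return asr_logits, ser_logits

ctc = nn.CTCLoss(blank=0, zero_infinity=True)

def training_step(model, asr_batch, ser_batch, unlabeled_feats, tau=0.9):
    # Supervised ASR loss on transcribed speech.
    asr_logits, _ = model(asr_batch["feats"])
    log_probs = asr_logits.log_softmax(-1).transpose(0, 1)   # (T, B, vocab)
    loss_asr = ctc(log_probs, asr_batch["tokens"],
                   asr_batch["feat_lens"], asr_batch["token_lens"])

    # Supervised SER loss on emotion-labeled speech.
    _, ser_logits = model(ser_batch["feats"])
    loss_ser = F.cross_entropy(ser_logits, ser_batch["emotion"])

    # Semi-supervised term on unlabeled speech: keep only confident
    # emotion predictions as pseudo-labels (assumed scheme).
    with torch.no_grad():
        _, pseudo_logits = model(unlabeled_feats)
        conf, pseudo = pseudo_logits.softmax(-1).max(-1)
        mask = conf > tau
    _, ser_logits_u = model(unlabeled_feats)
    per_utt = F.cross_entropy(ser_logits_u, pseudo, reduction="none")
    loss_unsup = (per_utt * mask).sum() / mask.sum().clamp(min=1)

    return loss_asr + loss_ser + loss_unsup
```

Sharing the encoder is what lets the ASR branch supply lexical information to the emotion branch, while the unlabeled term is where the large untranscribed, unlabeled corpora enter the objective.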