JOURNAL ARTICLE

Semi-Supervised Learning for Multimodal Speech and Emotion Recognition

Abstract

Speech Emotion Recognition (SER) is becoming necessary for interactive spoken dialogue systems as users increasingly expect empathy from computers. Recent work has shown the importance of approaching this problem from a multimodal perspective: models that combine visual, acoustic, and lexical features perform better than models based on a single modality. However, current SER models are not robust to out-of-domain data, partly because emotion-labeled corpora are generally small. This paper outlines my PhD research plan, which aims to improve SER by training the model jointly with an Automatic Speech Recognition (ASR) model using a novel cross-task semi-supervised learning approach on unlabeled data. The ASR model would also benefit from this training approach and would serve as the provider of lexical features. The joint ASR-SER model is expected to alleviate the data scarcity problem and to be applicable in real-life settings such as human-computer interaction and digital health.

Keywords:
Computer science; Speech recognition; Task (project management); Multimodal learning; Modalities; Artificial intelligence; Natural language processing; Perspective (graphical); Machine learning; Human–computer interaction

Metrics

Cited By: 2
FWCI (Field Weighted Citation Impact): 0.41
Refs: 47
Citation Normalized Percentile: 0.63

Topics

Emotion and Mood Recognition
Social Sciences →  Psychology →  Experimental and Cognitive Psychology
Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Music and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing

Related Documents

JOURNAL ARTICLE

Semi-supervised Learning Techniques for Speech Emotion Recognition

K Remya, B. S. Shajee Mohan, K V Ahammed Muneer

Journal: Journal of Physics Conference Series Year: 2021 Vol: 1921 (1) Pages: 012029-012029

JOURNAL ARTICLE

Privacy-preserving Speech Emotion Recognition through Semi-Supervised Federated Learning

Vasileios Tsouvalas, Tanır Özçelebi, Nirvana Meratnia

Journal: 2022 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops) Year: 2022 Pages: 359-364