Communicating one's inner state, one's emotions and feelings, is a core element of human social communication and behavior. Emotion is an important component of speech, and its inclusion in synthetic speech would enable breakthroughs in applications such as human-machine interfacing, e-book reading, and voice acting. However, modelling emotion in speech in an end-to-end manner has so far remained an under-explored research topic. To address this, we experiment with novel methods for global emotion modelling in unsupervised, semi-supervised, and adversarial settings using an end-to-end text-to-speech (TTS) architecture. We condition the latent space, duration prediction, and audio generation on novel hybrid labels derived from ground-truth data (14 emotion labels, 64 sentiment-analysis labels, and speaker labels), which can be inferred from the input text at inference time. We also experiment with conditional discriminators. The final proposed model produces high-quality expressive speech comparable to the state of the art.
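To make the hybrid conditioning concrete, the following is a minimal sketch, not the paper's actual implementation: each utterance carries an emotion label (14 classes), a sentiment-analysis label (64 classes), and a speaker label, and their embeddings are concatenated into one global conditioning vector that the latent space, duration predictor, and audio generator could consume. The speaker count, embedding dimension, and function names here are illustrative assumptions.

```python
import numpy as np

# Assumed label inventories: 14 emotions and 64 sentiment labels are
# from the abstract; the speaker count and embedding size are made up.
N_EMOTIONS, N_SENTIMENTS, N_SPEAKERS = 14, 64, 4
EMB_DIM = 16

rng = np.random.default_rng(0)

# One embedding (lookup) table per label type, randomly initialised here;
# in a real model these would be trained jointly with the TTS network.
emotion_table = rng.standard_normal((N_EMOTIONS, EMB_DIM))
sentiment_table = rng.standard_normal((N_SENTIMENTS, EMB_DIM))
speaker_table = rng.standard_normal((N_SPEAKERS, EMB_DIM))

def hybrid_condition(emotion_id, sentiment_id, speaker_id):
    """Concatenate the three label embeddings into one global
    conditioning vector (a hypothetical helper for illustration)."""
    return np.concatenate([
        emotion_table[emotion_id],
        sentiment_table[sentiment_id],
        speaker_table[speaker_id],
    ])

cond = hybrid_condition(emotion_id=3, sentiment_id=10, speaker_id=1)
print(cond.shape)  # (48,) = 3 * EMB_DIM
```

At inference time, the emotion and sentiment ids would come from classifiers run on the input text rather than from ground-truth annotations, which is what allows the model to be driven by text alone.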
Katsuki Inoue, Sunao Hara, Masanobu Abe, Tomoki Hayashi, Ryuichi Yamamoto, Shinji Watanabe
Nafis Sadeq, Nafis Tahmid Chowdhury, Farhan Tanvir Utshaw, Shafayat Ahmed, Muhammad Abdullah Adnan