Mongolian emotional speech synthesis based on transfer learning and emotional embedding

Aihong Huang; Feilong Bao; Guanglai Gao; Shan Yu; Rui Liu

doi:10.1109/ialp54817.2021.9675192

JOURNAL ARTICLE

Year: 2021 Vol: 80 Pages: 78-83

Abstract

In recent years, attention-based end-to-end speech synthesis has outperformed traditional speech synthesis models, and end-to-end Mongolian speech synthesis has reached a level suitable for practical application. However, because training corpora are sparse, research on Mongolian emotional speech synthesis remains limited. To address this, we built a Mongolian emotional corpus and, for the first time, constructed an emotionally controllable Mongolian speech synthesis system. By combining transfer learning with emotional embedding, the system supports 8 emotions (happiness, anger, sadness, surprise, fear, disgust, boredom and neutral). In the proposed method, emotional labels are fed to an emotional embedding layer to generate emotional vectors, which are spliced with the output vectors of the bidirectional LSTM layer so that the text representation vectors carry information about the emotional category, allowing the system to synthesize speech in a variety of emotions. Experiments show that our method can synthesize high-quality Mongolian emotional speech.
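
The splicing step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses plain Python lists in place of a deep learning framework, and the embedding size, the random initialisation, the function names (`make_embedding_table`, `splice_emotion`) and the dummy encoder output are all assumptions made for illustration. It only shows how an emotion label is looked up in an embedding table and appended to every time step of the bidirectional LSTM output.

```python
import random

# The eight emotion categories listed in the abstract.
EMOTIONS = ["happiness", "anger", "sadness", "surprise",
            "fear", "disgust", "boredom", "neutral"]

EMBED_DIM = 4  # hypothetical embedding size, chosen only for illustration


def make_embedding_table(num_labels, dim, seed=0):
    """Randomly initialised lookup table; in a trained model these rows are learned."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
            for _ in range(num_labels)]


def splice_emotion(encoder_outputs, emotion, table):
    """Append the emotion vector to every time step of the encoder output."""
    emotion_vec = table[EMOTIONS.index(emotion)]
    return [list(step) + emotion_vec for step in encoder_outputs]


table = make_embedding_table(len(EMOTIONS), EMBED_DIM)

# Stand-in for bidirectional LSTM outputs: 3 time steps, 6 features each.
encoder_outputs = [[0.0] * 6 for _ in range(3)]

conditioned = splice_emotion(encoder_outputs, "happiness", table)
print(len(conditioned), len(conditioned[0]))
```

Each time step grows from 6 to 10 features, and the appended 4 values are the same at every step, so a downstream decoder sees the emotion category alongside the text representation. Changing the label passed to `splice_emotion` is what would switch the target emotion in this sketch.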

Keywords:

Disgust; Surprise; Speech synthesis; Sadness; Computer science; Speech recognition; Embedding; Boredom; Anger; Artificial intelligence; Natural language processing; Psychology; Communication

Metrics

FWCI (Field Weighted Citation Impact): 0.99

Citation Normalized Percentile: 0.81

Topics

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Topic Modeling

Physical Sciences → Computer Science → Artificial Intelligence

Natural Language Processing Techniques

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

Mongolian Emotional Speech Synthesis Based on CGAN and Improved FastSpeech2

Emotional Vietnamese Speech Synthesis Using Style-Transfer Learning

SRC-IT2: Speech Rate-Controllable Mongolian Emotional Speech Synthesis Based on Improved Tacotron2