JOURNAL ARTICLE

Mongolian emotional speech synthesis based on transfer learning and emotional embedding

Abstract

In recent years, attention-based end-to-end speech synthesis has outperformed traditional speech synthesis models, and end-to-end Mongolian speech synthesis has reached a level suitable for practical application. However, because training corpora are sparse, research on Mongolian emotional speech synthesis remains immature. To address this problem, we established a Mongolian emotional corpus and constructed, for the first time, an emotionally controllable Mongolian speech synthesis system. By combining transfer learning and emotional embedding, we built a Mongolian emotional speech synthesis system covering 8 emotions (happiness, anger, sadness, surprise, fear, disgust, boredom and neutral). In the proposed method, emotional labels are fed to an emotional embedding layer to generate emotional vectors, which are spliced with the output vectors of the bidirectional LSTM layer so that the text representation vectors carry information about the emotional category, allowing the system to synthesize speech in a variety of emotions. Experiments show that our method can synthesize high-quality Mongolian emotional speech.
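
The splicing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the vector sizes, the random lookup table standing in for a trained embedding layer, the function names, and the choice to append the same emotion vector to every encoder time step are all assumptions made for illustration.

```python
import random

# The eight emotion categories named in the abstract.
EMOTIONS = ["happiness", "anger", "sadness", "surprise",
            "fear", "disgust", "boredom", "neutral"]


def make_embedding_table(num_labels, dim, seed=0):
    """Random lookup table standing in for a trainable embedding layer (assumption)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
            for _ in range(num_labels)]


def splice_emotion(encoder_outputs, emotion, table):
    """Concatenate the emotion vector onto each time step of the encoder output."""
    emotion_vec = table[EMOTIONS.index(emotion)]
    # List concatenation: each step grows from encoder_dim to encoder_dim + emotion_dim.
    return [step + emotion_vec for step in encoder_outputs]


# Hypothetical sizes: 5 input symbols, 6-dim BiLSTM output, 4-dim emotion vector.
table = make_embedding_table(len(EMOTIONS), dim=4)
encoder_outputs = [[0.0] * 6 for _ in range(5)]  # placeholder for real BiLSTM outputs
conditioned = splice_emotion(encoder_outputs, "sadness", table)
print(len(conditioned), len(conditioned[0]))  # 5 10
```

In this sketch, changing only the emotion label changes the trailing part of every text representation vector, which is the way a single set of encoder outputs could be steered toward different emotional categories.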

Keywords:
Disgust, Surprise, Speech synthesis, Sadness, Computer science, Speech recognition, Embedding, Boredom, Anger, Artificial intelligence, Natural language processing, Psychology, Communication

Metrics

Cited by: 7
FWCI (Field Weighted Citation Impact): 0.99
Refs: 28
Citation Normalized Percentile: 0.81

Topics

Speech Recognition and Synthesis
Physical Sciences →  Computer Science →  Artificial Intelligence
Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence
Natural Language Processing Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Mongolian Emotional Speech Synthesis Based on CGAN and Improved FastSpeech2

Qing-Dao-Er-Ji Ren, Yang Yang, Lele Wang

Journal: ACM Transactions on Asian and Low-Resource Language Information Processing, Year: 2025, Vol: 24 (9), Pages: 1-16
JOURNAL ARTICLE

Emotional Vietnamese Speech Synthesis Using Style-Transfer Learning

Thanh X. Le, An T. Le, Quang H. Nguyen

Journal: Greater South Information System, Year: 2023
JOURNAL ARTICLE

Emotional Vietnamese Speech Synthesis Using Style-Transfer Learning

Thanh X. Le, An T. Le, Quang H. Nguyen

Journal: Computer Systems Science and Engineering, Year: 2022, Vol: 44 (2), Pages: 1263-1278