JOURNAL ARTICLE

Emotional Vietnamese Speech Synthesis Using Style-Transfer Learning

Abstract

In recent years, speech synthesis systems have made it possible to produce very high-quality voices. Research in this domain is therefore turning to the problem of integrating emotions into speech. However, constructing a separate speech synthesizer for each emotion has some limitations. First, this approach often requires an emotional-speech data set with many sentences, and such data sets are very time-intensive and labor-intensive to complete. Second, training each of these models requires computers with large computational capabilities and a lot of effort and time for model tuning. In addition, a separate model for each emotion fails to take advantage of the data sets of other emotions. In this paper, we propose a new method to synthesize emotional speech in which the latent expressions of emotions are learned from a small data set of professional actors through a Flowtron model. We also provide a new method to build a speech corpus that is scalable and whose quality is easy to control. Next, to produce a high-quality speech synthesis model, we used this corpus to train a Tacotron 2 model, which then served as the pre-trained model for training the Flowtron model. We applied this method to synthesize Vietnamese speech with sadness and happiness. Mean opinion score (MOS) assessment results show that the MOS is 3.61 for sadness and 3.95 for happiness. In conclusion, the proposed method proves more effective, offering a high degree of automation and fast emotional sentence generation while using only a small emotional-speech data set.
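The abstract reports results as mean opinion scores. As a minimal sketch of how such a score is typically computed, the Python snippet below averages listener ratings on a 1-to-5 scale and attaches a normal-approximation 95% confidence interval. The function name `mean_opinion_score` and the ratings are hypothetical and chosen only for illustration; they are not the authors' evaluation code or data.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (MOS, 95% CI half-width) for listener ratings on a 1-5 scale."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie in [1, 5]")
    mos = statistics.mean(ratings)
    # Normal approximation: 1.96 times the standard error of the mean.
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings for illustration only (not taken from the paper).
example_ratings = [4, 3, 4, 5, 3, 4, 4, 3]
mos, ci = mean_opinion_score(example_ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

With only a handful of listeners the interval is wide, which is why MOS studies usually pool many ratings per condition.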

Keywords:
Sadness; Speech synthesis; Vietnamese; Set (abstract data type); Mean opinion score; Sentence; Speech processing; Fluency; Training set; Speech corpus

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 0
Citation Normalized Percentile: 0.37

Topics

Emotion and Mood Recognition
Social Sciences →  Psychology →  Experimental and Cognitive Psychology
Mental Health via Writing
Social Sciences →  Psychology →  Social Psychology
Sentiment Analysis and Opinion Mining
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Emotional Vietnamese Speech Synthesis Using Style-Transfer Learning

Thanh X. Le, An T. Le, Quang H. Nguyen

Journal: Computer Systems Science and Engineering Year: 2022 Vol: 44 (2) Pages: 1263-1278
JOURNAL ARTICLE

Emotional Vietnamese Speech Synthesis Using Style-Transfer Learning

Thanh X. Le, An T. Le, Quang H. Nguyen

Journal:   Greater South Information System Year: 2023
JOURNAL ARTICLE

Vietnamese Speech Synthesis Based on Transfer Learning

YANG Lin, YANG Jian, CAI Haoran, LIU Cong

Journal:   DOAJ (DOAJ: Directory of Open Access Journals) Year: 2023