Yajing Yan, Jiaolong Jiang, Hongwu Yang
Predicting the prosodic structure of sentences is key to improving the naturalness of Mandarin speech synthesis. In this paper, we propose a sequence-to-sequence (seq2seq) model-based method to improve the accuracy of prosodic boundary prediction for Chinese sentences. A large-scale text corpus of 100,000 Chinese sentences was collected and manually labelled with part-of-speech tags and with prosodic word and prosodic phrase boundaries under the guidance of a linguistic expert. Based on an analysis of this corpus, shallow features such as part-of-speech, word length and word embedding are selected as the input features of the seq2seq model. In addition, a new deep feature named the syntactic hierarchical number (SHN), which describes the relationship between syntactic structure and prosodic structure, is proposed to predict prosodic phrase boundaries. Finally, the seq2seq model is trained on the labelled text corpus to predict the boundaries of prosodic words and prosodic phrases. The experimental results show that the seq2seq model achieves F1-scores of 97.15% for prosodic word boundary prediction and 82.98% for prosodic phrase boundary prediction. Compared with other models, the proposed method is more effective at predicting prosodic structure and can be applied to the front end of speech synthesis.
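The abstract reports F1-scores for prosodic word and prosodic phrase boundary prediction. The sketch below shows how such a boundary-level score could be computed; it assumes each word is labelled 1 if a boundary follows it and 0 otherwise. The labelling scheme and the function name `boundary_f1` are illustrative assumptions, not the authors' evaluation code.

```python
from typing import Sequence, Tuple


def boundary_f1(gold: Sequence[int], pred: Sequence[int]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for binary boundary labels.

    Assumed scheme (hypothetical): label 1 means a prosodic boundary
    follows the word, label 0 means no boundary.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    # Count true positives, false positives and false negatives per word.
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    # Guard against division by zero when no boundaries are predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [0, 1, 0, 1, 1]
    pred = [0, 1, 1, 1, 0]
    print(boundary_f1(gold, pred))  # precision, recall and F1 are each 2/3 here
```

The same function would be applied separately to prosodic word labels and prosodic phrase labels, giving one F1-score per boundary level.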
Fajrian Yunus, Chloé Clavel, Catherine Pélachaud
Chao Yang, Zhongwen Guo, Lintao Xian
Hongwu Yang, Yajing Yan, Jiaolong Jiang
Kei Furukawa, Takeshi Kishiyama, Satoshi Nakamura, Sakriani Sakti