Mandarin Prosody Boundary Prediction based on Sequence-to-sequence Model

Yajing Yan; Jiaolong Jiang; Hongwu Yang

doi:10.1109/itnec48623.2020.9084900

Year: 2020
Conference: 2020 IEEE 4th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC)
Pages: 1013-1017

Abstract

Predicting the prosodic structure of sentences is key to improving the naturalness of Mandarin speech synthesis. In this paper, we propose a sequence-to-sequence (seq2seq) model-based method to improve the accuracy of prosodic boundary prediction for Chinese sentences. A large-scale text corpus of 100,000 Chinese sentences is collected and manually labelled with part-of-speech tags and the boundaries of prosodic words and prosodic phrases under the guidance of a linguistic expert. Based on an analysis of the text corpus, shallow features such as part-of-speech, word length and word embedding are selected as the input features of the seq2seq model. In addition, a new deep feature named syntactic hierarchical number (SHN), which describes the relationship between syntactic structure and prosodic structure, is proposed for predicting the boundaries of prosodic phrases. Finally, the seq2seq model is trained on the labelled text corpus to predict the boundaries of prosodic words and prosodic phrases. The experimental results show that the seq2seq model achieves an F1-score of 97.15% for prosodic word and 82.98% for prosodic phrase boundary prediction. Compared to the other models, our proposed method is more effective at predicting prosodic structure and can be applied to the front-end of speech synthesis.
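The abstract reports results as F1-scores for boundary prediction. As a reading aid, here is a minimal sketch of how such a score could be computed for one boundary type. It assumes each word carries a binary label (1 if a boundary follows the word, 0 otherwise); the function name `boundary_f1`, the label encoding, and the toy labels are illustrative assumptions, not the evaluation protocol of the paper.

```python
from typing import Sequence, Tuple


def boundary_f1(gold: Sequence[int], pred: Sequence[int]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for binary boundary labels.

    Assumption: label 1 means "a boundary follows this word", 0 means none.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    # Count true positives, false positives and false negatives per word.
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels for a six-word sentence (made up, not data from the paper).
gold = [0, 1, 0, 0, 1, 1]
pred = [0, 1, 1, 0, 1, 0]
precision, recall, f1 = boundary_f1(gold, pred)
```

Under this encoding, the score would be computed separately for prosodic word boundaries and prosodic phrase boundaries, which matches the two figures the abstract reports.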

Keywords: Prosody; Computer science; Natural language processing; Artificial intelligence; Phrase; Mandarin Chinese; Speech recognition; Sequence (biology); Naturalness; Sentence; Speech synthesis; Word (group theory); Feature (linguistics); Linguistics

Metrics

FWCI (Field Weighted Citation Impact): 0.66

Citation Normalized Percentile: 0.73

Topics

Natural Language Processing Techniques

Physical Sciences → Computer Science → Artificial Intelligence

Topic Modeling

Physical Sciences → Computer Science → Artificial Intelligence

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence
