Keikichi Hirose, Hui Hu, Xiaodong Wang, Nobuaki Minematsu
A method is developed for recognizing the lexical tone types of Standard Chinese syllables in continuous speech. A neural network (a four-layer perceptron) is adopted as the classifier. The method consists of two steps. First, tone types are recognized using the prosodic features of the voiced part of each syllable. Then they are re-recognized using only the tone nucleus, the portion of the syllable whose fundamental frequency (F0) contour stays relatively stable regardless of the tone types of the preceding and following syllables. The voiced part (or tone nucleus) is divided into 20 segments, and the F0, delta-F0, F0 slope and short-term energy of each segment serve as inputs to the neural network. To cope with tone coarticulation, the prosodic feature parameters of the last 5 segments of the preceding syllable and the first 5 segments of the following syllable are also included in the neural network inputs, as is information on syllable length. A tone recognition experiment was conducted on utterances by a female speaker from the HKU96 corpus. The average recognition rate, including neutral-tone syllables, was 86.5% without the tone nucleus model and increased to 86.9% with it. This rate is more than 3 percentage points higher than that obtained by the hidden-Markov-model-based tone recognizer previously developed by the authors.
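The input layout described above can be illustrated with a minimal sketch of how one syllable's feature vector might be assembled. The abstract does not define delta-F0 or F0 slope precisely, nor the input ordering, edge handling or length encoding, so the following are assumptions: delta-F0 is taken as the difference of mean F0 between consecutive segments, slope as a least-squares fit within each segment, neighbours at utterance edges are zero-padded, and syllable length is a single scalar. The helper names `segment_features` and `build_input` are hypothetical, and the perceptron itself is not shown.

```python
from typing import List, Optional, Sequence

N_SEG = 20  # segments per voiced part (or tone nucleus), as in the abstract
N_CTX = 5   # segments borrowed from each neighbouring syllable
N_FEAT = 4  # F0, delta-F0, F0 slope, short-term energy


def _slope(ys: Sequence[float]) -> float:
    """Least-squares slope of ys against frame index (0.0 for a single frame)."""
    n = len(ys)
    if n < 2:
        return 0.0
    mx = (n - 1) / 2
    my = sum(ys) / n
    num = sum((i - mx) * (y - my) for i, y in enumerate(ys))
    den = sum((i - mx) ** 2 for i in range(n))
    return num / den


def segment_features(f0: Sequence[float], energy: Sequence[float],
                     n_seg: int = N_SEG) -> List[List[float]]:
    """Split frame-level tracks into n_seg equal segments and return one
    [mean F0, delta-F0, F0 slope, mean energy] row per segment.
    Assumed definitions: delta-F0 = mean F0 minus previous segment's mean F0
    (0.0 for the first segment); slope = least-squares slope per frame."""
    if len(f0) != len(energy) or len(f0) < n_seg:
        raise ValueError("need equal-length tracks with at least n_seg frames")
    n = len(f0)
    rows: List[List[float]] = []
    prev_mean: Optional[float] = None
    for k in range(n_seg):
        lo, hi = k * n // n_seg, (k + 1) * n // n_seg
        seg_f0, seg_en = f0[lo:hi], energy[lo:hi]
        mean_f0 = sum(seg_f0) / len(seg_f0)
        delta = 0.0 if prev_mean is None else mean_f0 - prev_mean
        rows.append([mean_f0, delta, _slope(seg_f0), sum(seg_en) / len(seg_en)])
        prev_mean = mean_f0
    return rows


def build_input(cur: List[List[float]], prev: Optional[List[List[float]]],
                nxt: Optional[List[List[float]]], syllable_len: float) -> List[float]:
    """Concatenate the last N_CTX segments of the preceding syllable, all
    segments of the current one, the first N_CTX segments of the following
    one, and the syllable length. Missing neighbours are zero-padded (assumption)."""
    zeros = [[0.0] * N_FEAT for _ in range(N_CTX)]
    left = prev[-N_CTX:] if prev is not None else zeros
    right = nxt[:N_CTX] if nxt is not None else zeros
    vec = [x for row in left + cur + right for x in row]
    vec.append(float(syllable_len))
    return vec


# Usage: a synthetic 40-frame rising contour for an utterance-initial syllable.
f0 = [200.0 + 2 * i for i in range(40)]
en = [1.0] * 40
cur = segment_features(f0, en)
x = build_input(cur, None, cur, syllable_len=0.25)
print(len(x))  # 121 = (5 + 20 + 5) segments * 4 features + 1 length value
```

Under these assumptions the network sees 121 inputs per syllable; a different length encoding (for example, including the neighbours' lengths) would change that count.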
Xiangli Wang, Kenji Hirose, Jinkai Zhang, Nobuaki Minematsu