This paper describes our recent investigation into the use of both intra-syllable and cross-syllable acoustic units for Cantonese text-to-speech synthesis. In our previous work, isolated monosyllable units were used for concatenative synthesis of Cantonese speech. The resulting synthetic speech was judged unnatural, with an obvious lack of perceptual continuity. The proposed system adopts an acoustic inventory that covers all legitimate intra-syllable and cross-syllable acoustic units. Synthetic speech produced by concatenating such sub-syllable units better captures the transitory effects that are crucial to perceived naturalness. Different strategies are used to concatenate speech segments with different acoustic-phonetic properties. A subjective listening test shows a noticeable improvement, which is attributed mainly to smoother transitions between sonorant segments.
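The idea of choosing a concatenation strategy by acoustic-phonetic class can be illustrated with a minimal sketch. It is not the system described above: the class labels in `SONORANT_CLASSES`, the function `join_units`, the linear cross-fade, and the `overlap` length are all assumptions made for illustration. The sketch cross-fades joins where both sides are sonorant and simply abuts the segments otherwise.

```python
# Hypothetical broad-class labels; the paper's actual classes are not given here.
SONORANT_CLASSES = {"vowel", "nasal", "glide"}


def join_units(left, right, left_class, right_class, overlap=64):
    """Concatenate two waveform segments given as sequences of samples.

    Assumption: sonorant-to-sonorant joins are smoothed with a linear
    cross-fade over `overlap` samples; every other join is a plain abutment.
    """
    n = min(overlap, len(left), len(right))
    both_sonorant = (left_class in SONORANT_CLASSES
                     and right_class in SONORANT_CLASSES)
    if both_sonorant and n > 1:
        blended = []
        for i in range(n):
            w = i / (n - 1)  # weight ramps from 0.0 to 1.0 across the overlap
            blended.append(left[len(left) - n + i] * (1.0 - w) + right[i] * w)
        # The overlapped region replaces the tail of `left` and head of `right`.
        return list(left[:len(left) - n]) + blended + list(right[n:])
    # Non-sonorant boundary (for example a stop closure): join without blending.
    return list(left) + list(right)
```

A cross-faded join is shorter than the two inputs combined by `overlap` samples, so a real system would need to account for that when controlling duration.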
Zhihong Hu, Johan Schalkwyk, Etienne Barnard, Ronald A. Cole
Tan Lee, Helen Meng, Wai H. Lau, Wan-Yi Lo, P. C. Ching