Tone modeling is a critical component of Mandarin large-vocabulary continuous-speech recognition (LVCSR) systems. This paper presents an efficient real-time pitch tracker and a set of tone features that together reduce the character error rate (CER) by 30% compared with the non-tonal baseline. To our knowledge, this is the largest improvement from tone modeling ever reported for Mandarin. The paper first discusses adapting a known pitch-tracking algorithm for real-time operation. It then studies the derivation of tone features for Mandarin LVCSR: compared with the baseline feature vector $(F_0, \Delta F_0)$, our best tone features reduce tone errors by 28%. Results are reported on three LVCSR databases, including the Chinese 1998 National Performance Assessment (Project 863) and the Taiwan telephony database "MAT." The system reaches the performance level of Western-language systems, and on the "863 System Performance Test" it achieves a CER of 1.5%.
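The abstract names $(F_0, \Delta F_0)$ as the baseline tone feature vector but does not say how $\Delta F_0$ is computed. The sketch below shows one common choice, under stated assumptions: a regression-based delta over ±2 frames (the same formula often used for cepstral deltas), with edge frames replicated, applied to an F0 track that is assumed to be already continuous (for example, with unvoiced frames interpolated beforehand). The function names `delta` and `tone_features` are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def delta(values: List[float], window: int = 2) -> List[float]:
    """Regression-based delta over +/- `window` frames (an assumed definition).

    d[t] = sum_k k * (x[t+k] - x[t-k]) / (2 * sum_k k^2), for k = 1..window,
    with out-of-range frames replaced by the first/last frame.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    n = len(values)
    denom = 2 * sum(k * k for k in range(1, window + 1))
    out = []
    for t in range(n):
        acc = 0.0
        for k in range(1, window + 1):
            nxt = values[min(t + k, n - 1)]  # replicate last frame at the end
            prv = values[max(t - k, 0)]      # replicate first frame at the start
            acc += k * (nxt - prv)
        out.append(acc / denom)
    return out


def tone_features(f0: List[float], window: int = 2) -> List[Tuple[float, float]]:
    """Pair each frame's F0 (assumed continuous, in Hz) with its delta."""
    return list(zip(f0, delta(f0, window)))


if __name__ == "__main__":
    # A rising contour of 10 Hz per frame, loosely resembling Mandarin tone 2.
    contour = [100.0, 110.0, 120.0, 130.0, 140.0]
    for f, d in tone_features(contour):
        print(f"F0={f:6.1f}  dF0={d:5.1f}")
```

For the rising contour, the interior frame recovers the true slope of 10 Hz per frame, while the edge frames are pulled toward zero by the replicated boundary values. Real systems often also normalize F0 per speaker or use log F0; that choice is likewise an assumption here.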
Neville Ryant, Jiahong Yuan, Mark Liberman