JOURNAL ARTICLE

Pitch tracking and tone features for Mandarin speech recognition

Abstract

Tone modeling is a critical component of Mandarin large-vocabulary continuous-speech recognition (LVCSR) systems. This paper presents an efficient real-time pitch tracker and a set of tone features that together achieve a 30% reduction of the character error rate (CER) compared to the non-tonal baseline. To our knowledge, this is the largest improvement from tones reported for Mandarin to date. The paper first discusses adapting a known pitch-tracking algorithm for real-time operation. It then studies the derivation of tone features for Mandarin LVCSR. Compared to the baseline feature vector (F₀, ΔF₀), our best tone features lead to a 28% reduction of tone errors. Results are shown for three LVCSR databases, including the Chinese 1998 National Performance Assessment (Project 863) and the Taiwan telephony database "MAT." Performance comparable to that of Western-language systems is reached, and for the "863 System Performance Test," our system achieves 1.5% CER.
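
For readers unfamiliar with the metric, the following is a minimal sketch of how a character error rate and a reduction between two systems are commonly computed. It assumes CER is the character-level edit distance divided by the reference length, and that the reported 30% is a relative reduction; the abstract does not spell out either point. The function names and the example numbers are illustrative and are not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(ref, hyp):
    """CER = (substitutions + insertions + deletions) / reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline, improved):
    """Fraction by which `improved` lowers the `baseline` error rate."""
    return (baseline - improved) / baseline


# Illustrative only: one wrong character in a six-character reference.
cer = character_error_rate("今天天气很好", "今天天汽很好")

# Hypothetical error rates, not figures from the paper.
reduction = relative_reduction(0.10, 0.07)
```

Under these assumptions, a system that lowers a hypothetical 10% CER to 7% shows a 30% relative reduction, which is the kind of comparison the abstract appears to describe.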

Keywords:
Mandarin Chinese, Tone (literature), Speech recognition, Computer science, Word error rate, Vocabulary, Baseline (sea), Telephony, Character (mathematics), Artificial intelligence, Mathematics, Telecommunications

Metrics

Cited By: 59
FWCI (Field Weighted Citation Impact): 7.06
Refs: 19
Citation Normalized Percentile: 0.97

Topics

Speech Recognition and Synthesis
Physical Sciences →  Computer Science →  Artificial Intelligence
Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Advanced Data Compression Techniques
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition