Tone recognition for fluent Mandarin speech has long been a difficult problem, because pitch contours vary substantially with context and the resulting tone behavior is hard to analyze. A new set of four inter-syllabic features is identified to characterize this pitch contour variation quantitatively with respect to context. In addition, a robust pitch extraction method is proposed that integrates the adaptive Gabor representation (AGR) and the instantaneous frequency amplitude spectrum (IFAS). Experimental results indicate that accurate pitch values can be extracted under various noise conditions and that tone recognition accuracy improves significantly.
Kai‐Cheng Chang, Chien‐Chiao Yang