JOURNAL ARTICLE

Incorporating tone features to convolutional neural network to improve Mandarin/Thai speech recognition

Abstract

Tone plays an important role in distinguishing lexical meaning in tonal languages, such as Mandarin and Thai. It has been revealed that tone information is helpful to improve automatic speech recognition (ASR) for these languages. In this study, we incorporate tone features from the fundamental frequency (Fo) and fundamental frequency variation (FFV) to the convolutional neural network (CNN), a state-of-the-art acoustic modeling approach, for acoustic modeling of the ASR systems. Due to its abilities of reducing spectral variations and modeling spectral correlations existing in speech signals, the CNN is expected to model well tone patterns which mainly behave in the frequency domain, by Fo contur. We conduct speech ASR experiments on Mandarin and Thai to evaluate the effectivenesses of the proposed approaches. With the help of tone features, the character error rates (CERs) of Mandarin achieve 4.3-7.1% relative reductions, and the word error rates (WERs) of Thai achieve 0.41-6.26% relative reductions. The CNN shows its clear superiority to the deep neural network (DNN), with relative CER reductions of 5.4-13.1% for Mandarin, and relative WER reductions of 0.5-5.6% for Thai.

Keywords:
Mandarin Chinese Speech recognition Tone (literature) Computer science Convolutional neural network Artificial intelligence Natural language processing Linguistics

Metrics

24
Cited By
2.45
FWCI (Field Weighted Citation Impact)
35
Refs
0.90
Citation Normalized Percentile
Is in top 1%
Is in top 10%

Citation History

Topics

Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Speech Recognition and Synthesis
Physical Sciences →  Computer Science →  Artificial Intelligence
Music and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
© 2026 ScienceGate Book Chapters — All rights reserved.