Most computer-assisted language learning (CALL) systems evaluate pronunciation proficiency with acoustic models trained by maximum likelihood estimation (MLE). However, MLE training ignores information about other phones and therefore cannot distinguish confusable phones well. This paper introduces the discriminative criteria of minimum phone error and minimum word error to refine the acoustic models and address this problem. A detailed analysis of the discriminatively trained acoustic models on the Putonghua proficiency test shows that: 1) they are much more discriminative than the MLE-trained models; 2) even though the training and test conditions are mismatched, they still perform significantly better than the MLE-trained models under the same phone boundaries. The final system achieves a relative improvement of approximately 4.5%.
Oh Pyo Kweon, Motoyuki Suzuki, Akinori Ito, Shozo Makino