Chieko Furuichi, Katsura Aizawa, Kazuhiko Inoue
This paper discusses speech recognition based on a new statistical phoneme segment model, trained on phoneme parameters derived from automatically extracted phoneme segments. The proposed system operates as follows. In preprocessing before recognition, the phoneme boundaries are detected by segmentation. The phonemes are then discriminated using a stochastic phoneme segment model, and a phoneme segment lattice with scores is constructed. Next, speech recognition is performed by matching symbol sequences to dictionary items. The segmentation system that is employed can infer phoneme boundaries with high accuracy. This helps to eliminate unnecessary parameters, leaving only the feature parameters that are effective in separating phonemes. In other words, the phoneme recognition problem in continuous speech can be reduced to a discrimination problem, so that a speaker-independent model can be constructed from a relatively small amount of training data. The stochastic phoneme segment model is trained with samples extracted from a phoneme-balanced word set of 4920 words uttered by 10 speakers. In a recognition experiment with 6709 words uttered by 63 nontraining speakers, a recognition rate of 92.6% was obtained as the average over all speakers, using a word dictionary of 212 words. © 2000 Scripta Technica, Syst Comp Jpn, 31(10): 89–98, 2000
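The pipeline described in the abstract (segment the input, score phoneme candidates per segment into a lattice, then match candidate symbol sequences against a word dictionary) can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's implementation: the lattice representation, scores, and the `best_word` helper are all assumptions introduced here for illustration.

```python
def best_word(lattice, dictionary):
    """Pick the dictionary word whose phoneme sequence has the
    highest total score over the segment lattice.

    lattice: list with one {phoneme: score} dict per detected segment
    dictionary: {word: [phoneme, ...]} mapping
    """
    best, best_score = None, float("-inf")
    for word, phonemes in dictionary.items():
        if len(phonemes) != len(lattice):
            continue  # require one phoneme per detected segment
        score, ok = 0.0, True
        for seg, ph in zip(lattice, phonemes):
            if ph not in seg:
                ok = False  # this word's phoneme was not hypothesized here
                break
            score += seg[ph]  # accumulate log-likelihood-style scores
        if ok and score > best_score:
            best, best_score = word, score
    return best

# Toy lattice: one candidate-score dict per detected segment.
lattice = [{"k": -0.2, "g": -1.1}, {"a": -0.1}, {"t": -0.3, "d": -0.9}]
dictionary = {"kat": ["k", "a", "t"], "gad": ["g", "a", "d"]}
print(best_word(lattice, dictionary))  # -> kat
```

In the paper's full system the per-segment scores would come from the stochastic phoneme segment model rather than being fixed as above, and matching would tolerate segmentation errors rather than requiring an exact one-phoneme-per-segment alignment.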