In this paper, we describe a pronunciation lexicon model that is especially useful for constructing a morpheme-based pronunciation lexicon to improve the performance of a Korean LVCSR system. Many pronunciation variations occur at morpheme boundaries in continuous speech. To model cross-morpheme pronunciation variations, we typically use a context-dependent multiple-pronunciation lexicon that allows multiple phonetic transcriptions for each word. Since phonemic context, together with morphological category and morpheme boundary information, affects Korean pronunciation variation, we distinguish the phonological rules that apply to phonemes within a morpheme from those that apply across morpheme boundaries. Because pronunciation variations at morpheme boundaries increase the lexicon size, however, we have designed an optimized pronunciation lexicon that decreases confusability while increasing pronunciation coverage. The results of Korean broadcast news transcription experiments show that the proposed pronunciation lexicon achieves an 18% reduction in pronunciation lexicon size and an absolute reduction of 0.27% in WER for the same lexical entries.