Building a stochastic language model (LM) for speech recognition requires a large corpus for the target task. For some tasks no sufficiently large corpus is available, which is an obstacle to achieving high recognition accuracy. In this paper, we propose a method for building an LM with higher predictive power using large corpora from different tasks, rather than an LM estimated from a small corpus for a specific target task. In our experiment, we used transcriptions of Air University lectures and articles from the Nikkei newspaper, and compared an existing interpolation-based method with our new method. The results show that our new method reduces perplexity by 9.71%.
Min Xiao, Feipeng Zhao, Yuhong Guo
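The interpolation-based baseline mentioned in the abstract combines an in-domain LM with an out-of-domain LM via a weighted mixture, with the weight tuned to minimize perplexity on held-out target text. The sketch below illustrates the idea with maximum-likelihood unigram models and toy data; the corpora, function names, and the floor probability for unseen words are all hypothetical stand-ins, not the paper's actual setup.

```python
import math
from collections import Counter

def unigram_probs(tokens):
    """Maximum-likelihood unigram probabilities from a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def interp_prob(word, models, weights, floor=1e-8):
    """Linear interpolation: P(w) = sum_i lambda_i * P_i(w).
    Unseen words fall back to a tiny floor so log() stays finite."""
    return sum(lam * m.get(word, floor) for m, lam in zip(models, weights))

def perplexity(tokens, models, weights):
    """Perplexity = exp(-(1/N) * sum_w log P(w)) over the token stream."""
    logp = sum(math.log(interp_prob(w, models, weights)) for w in tokens)
    return math.exp(-logp / len(tokens))

# Toy stand-ins for the two source corpora (hypothetical data).
lectures = "the model predicts the next word in the lecture".split()
news = "the market report in the newspaper covers the economy".split()
target = "the model in the newspaper".split()  # held-out target text

m_lec = unigram_probs(lectures)
m_news = unigram_probs(news)

# Sweep the interpolation weight and keep the lowest-perplexity setting.
best = min(
    (perplexity(target, [m_lec, m_news], [lam, 1 - lam]), lam)
    for lam in [i / 10 for i in range(11)]
)
print(f"best lambda = {best[1]:.1f}, perplexity = {best[0]:.2f}")
```

Because the held-out text here draws words from both toy corpora, the sweep settles on an intermediate weight rather than either pure model, which is the behavior that motivates interpolation in the first place.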