Kanishka Rao, Fuchun Peng, Françoise Beaufays
Pronunciations for words are a critical component of an automatic speech recognition (ASR) system, as misrecognitions may be caused by missing or inaccurate pronunciations. The need for high-quality pronunciations has recently motivated data-driven techniques to generate them [1]. We propose a data-driven, language-independent framework for verifying such pronunciations to further improve lexicon quality in ASR. New candidate pronunciations are verified by re-recognizing historical audio logs and examining the associated recognition costs. We build an additional pronunciation quality feature from word and pronunciation frequencies in the logs. A machine-learned classifier trained on these features achieves nearly 90% accuracy in labeling good vs. bad pronunciations across all languages we tested. New pronunciations verified as good may be added to a dictionary, while bad pronunciations may be discarded or sent to experts for further evaluation. We simultaneously verify 5,000 to 30,000 new pronunciations within a few hours and show improvements in ASR performance as a result of including pronunciations verified by this system.
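The abstract does not give the exact feature definitions or the classifier type, so the following is only a minimal sketch of the verification step under stated assumptions. It assumes two hypothetical per-candidate features: `cost_delta`, the change in mean recognition cost when logged audio is re-recognized with the candidate pronunciation, and `pron_freq_ratio`, how often the candidate was selected relative to the word's frequency in the logs. It uses plain logistic regression trained by gradient descent as a stand-in classifier. The names `Candidate`, `train_logistic`, `is_good`, and all data values are illustrative, not the authors' method or data.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate pronunciation with two HYPOTHETICAL verification features."""
    word: str
    pron: str
    # Hypothetical: mean recognition cost on logged utterances of `word` when
    # re-recognized WITH the candidate, minus the cost WITHOUT it (negative = helps).
    cost_delta: float
    # Hypothetical: how often the candidate was chosen during re-recognition,
    # relative to the word's frequency in the logs (0.0 to 1.0).
    pron_freq_ratio: float


def features(c: Candidate) -> list[float]:
    # Leading 1.0 is the bias term.
    return [1.0, c.cost_delta, c.pron_freq_ratio]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w: list[float], c: Candidate) -> float:
    # Estimated probability that the candidate pronunciation is good.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features(c))))


def train_logistic(data: list[Candidate], labels: list[int],
                   lr: float = 0.5, epochs: int = 2000) -> list[float]:
    # Batch gradient descent on the mean logistic (cross-entropy) loss.
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for c, y in zip(data, labels):
            err = score(w, c) - y
            for i, xi in enumerate(features(c)):
                grad[i] += err * xi
        w = [wi - lr * gi / len(data) for wi, gi in zip(w, grad)]
    return w


def is_good(w: list[float], c: Candidate, threshold: float = 0.5) -> bool:
    # Good candidates could be added to the dictionary; bad ones discarded
    # or sent to experts for review.
    return score(w, c) >= threshold


# Toy, made-up training data (placeholder words/pronunciations): 1 = good, 0 = bad.
TRAIN = [
    Candidate("w1", "p1", -2.0, 0.8),
    Candidate("w2", "p2", -1.5, 0.6),
    Candidate("w3", "p3", -1.0, 0.7),
    Candidate("w4", "p4", 1.5, 0.05),
    Candidate("w5", "p5", 2.0, 0.1),
    Candidate("w6", "p6", 1.0, 0.0),
]
LABELS = [1, 1, 1, 0, 0, 0]

weights = train_logistic(TRAIN, LABELS)
new = Candidate("w7", "p7", -1.5, 0.7)
print(is_good(weights, new))
```

Logistic regression is chosen here only because it is small and easy to inspect; the abstract says just "a machine-learned classifier", so any binary classifier over the same kind of cost- and frequency-based features would fit that description.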