Yuhong Guo, Li Ta, Yujing Si, Jielin Pan, Yonghong Yan
The speech recognition decoder is an important part of large-vocabulary speech recognition applications, where speed and accuracy are the main concerns. Recently, weighted finite-state transducers (WFSTs) have become the dominant representation of the decoding network. However, the large memory and time cost of constructing the final WFST decoding network is the bottleneck of this technique. The goal of this article is to construct a compact, flexible WFST decoding network as well as a fast, scalable decoder. A compact representation of silence in speech is proposed, along with a decoding algorithm that uses improved pruning strategies. The experimental results show that the proposed network representation reduces the memory cost of constructing the final decoding network by 37% and the time cost by 19%. With decoding strategies based on WFST feature-specific beams, the proposed decoder's efficiency and accuracy are also significantly improved.
Paul R. Dixon, Diamantino Caseiro, Tasuku Oonishi, Sadaoki Furui
Diamantino Caseiro, Isabel Trancoso
Askars Salimbajevs, Jurgita Kapočiūtė-Dzikienė
Shinji Watanabe, Takaaki Hori, Atsushi Nakamura
Yotaro Kubo, Takaaki Hori, Atsushi Nakamura