A Low-latency Convolutional Recurrent Neural Network (L-CRNN) is proposed to reduce the complexity of a Keyword Spotting (KWS) system while maintaining high accuracy. The L-CRNN reduces the number of parameters between the RNN layer and the Fully Connected (FC) layer, which saves at least half of the memory on handheld devices compared with a Convolutional Recurrent Neural Network (CRNN), with the exact saving depending on the number of FC units. Furthermore, it learns effective deep audio features to classify keywords and garbage words with high accuracy. Experimental results on the Google Speech Commands dataset show that the L-CRNN achieves 96.17% accuracy with less than 1/4 of the parameters and fewer floating-point operations than a Convolutional Neural Network (CNN) and a CRNN.
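To make the parameter claim concrete, the following is a minimal sketch of how the size of the connection between an RNN layer and an FC layer scales. The abstract does not state how the L-CRNN achieves its reduction, so the two variants compared here (an FC layer fed by the flattened RNN outputs of all frames versus one fed by a single frame-sized vector) are assumptions for illustration only, as are all layer sizes and the helper name `dense_params`.

```python
def dense_params(in_features: int, units: int) -> int:
    """Number of weights plus biases in a fully connected layer."""
    return in_features * units + units


# Hypothetical sizes, not taken from the paper.
time_steps = 25   # number of RNN output frames
hidden = 64       # RNN hidden units per frame
fc_units = 128    # width of the FC layer

# Assumed baseline: the FC layer consumes the flattened outputs of all frames.
flattened = dense_params(time_steps * hidden, fc_units)

# Assumed reduced variant: the FC layer consumes one frame-sized vector.
reduced = dense_params(hidden, fc_units)

print(f"flattened RNN-to-FC parameters: {flattened}")
print(f"reduced RNN-to-FC parameters:   {reduced}")
print(f"ratio: {reduced / flattened:.3f}")
```

In both variants the count grows linearly with `fc_units`, which is consistent with the statement that the memory saving depends on the number of FC units.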
Zhou Jianlai, Jian Liu, Song Yantao, Tiecheng Yu
Amir Mohammad Rostami, Ali Karimi, Mohammad Ali Akhaee
Kai Li, Jason Naylor, Michael L. Rossen