With the growth of biometric security applications, mobile and telephonic communication monitoring, and digital assistants, the practical applications of Keyword Spotting (KWS) have increased manyfold. The use of Artificial Intelligence in KWS has greatly improved its accuracy. In this work, after an analysis of various feature extraction and Deep Learning techniques, KWS is performed in both non-streaming and streaming modes. Speech features are extracted using Mel-spectrograms and Mel-frequency Cepstral Coefficients (MFCCs). Of the three broad categories of Deep Neural Networks, a Convolutional Neural Network (CNN) model has been implemented for KWS, as it outperforms Recurrent Neural Network (RNN) and Feedforward Neural Network (FFNN) models owing to its lower complexity and computational cost. These techniques were applied, both online and offline, to the Google Speech Commands Dataset.
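As a rough illustration of the feature extraction step named above, the following pure-Python sketch computes log Mel filterbank energies and MFCCs for a single audio frame. It is not the implementation used in this work: the HTK-style mel formula, the naive DFT, the unnormalized DCT-II, and all function names and parameters (`mel_filterbank`, `log_mel`, `mfcc`, `n_filters`, `n_fft`) are assumptions chosen only to show the idea.

```python
import math


def hz_to_mel(f_hz):
    """Convert Hz to mel using the HTK-style formula (an assumed convention)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale.

    Returns n_filters rows, each holding n_fft // 2 + 1 weights.
    """
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge frequencies, evenly spaced in mel, converted back to Hz.
    edges = [mel_to_hz(mel_max * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def power_spectrum(frame):
    """Naive O(N^2) DFT power spectrum of one real frame (illustration only)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        spec.append((re * re + im * im) / n)
    return spec


def log_mel(frame, bank, eps=1e-10):
    """Log of the mel filterbank energies for one frame."""
    spec = power_spectrum(frame)
    return [math.log(sum(w * p for w, p in zip(row, spec)) + eps) for row in bank]


def mfcc(log_mel_energies, n_coeffs):
    """Unnormalized DCT-II of the log-mel energies, keeping n_coeffs terms."""
    n = len(log_mel_energies)
    return [
        sum(e * math.cos(math.pi * c * (i + 0.5) / n)
            for i, e in enumerate(log_mel_energies))
        for c in range(n_coeffs)
    ]


# Usage: one 64-sample frame of a 2 kHz tone sampled at 16 kHz.
sample_rate, n_fft = 16000, 64
frame = [math.sin(2 * math.pi * 2000 * i / sample_rate) for i in range(n_fft)]
bank = mel_filterbank(n_filters=10, n_fft=n_fft, sample_rate=sample_rate)
features = log_mel(frame, bank)
coeffs = mfcc(features, n_coeffs=5)
```

Stacking the `features` vectors of successive frames gives a Mel-spectrogram, while stacking the `coeffs` vectors gives an MFCC matrix; either can serve as the two-dimensional input to a CNN. A practical system would use an FFT and a windowing function rather than the naive DFT shown here.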