As technology advances rapidly, keyword spotting has attracted attention as a means of human-computer interaction (HCI). In this paper, we propose a keyword spotting technique based on a convolutional neural network (CNN). The model is a modified densely connected convolutional network (DenseNet) that uses grouped convolution and depthwise separable convolution to perform the keyword spotting task. In addition, we vary the width and depth of the network to construct a compact variant. We trained and evaluated the network on the Google Speech Commands Dataset V2. Compared with other networks, our proposed network sacrifices a small amount of precision in exchange for a low number of parameters and floating-point operations (FLOPs).
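To illustrate why grouped and depthwise separable convolutions reduce the parameter count, the following minimal sketch compares the weight counts of the three layer types. The channel counts (64 in, 64 out), the 3x3 kernel, the group count of 4, and the helper names `conv_params` and `depthwise_separable_params` are illustrative assumptions and are not taken from the proposed network.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Number of weights in a k x k 2-D convolution (bias omitted).

    With `groups` > 1, each output channel only sees c_in / groups
    input channels, which divides the weight count by `groups`.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return (c_in // groups) * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = conv_params(c_in, c_in, k, groups=c_in)  # one filter per input channel
    pointwise = conv_params(c_in, c_out, 1)              # mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
standard = conv_params(64, 64, 3)
grouped = conv_params(64, 64, 3, groups=4)
separable = depthwise_separable_params(64, 64, 3)

print(f"standard:  {standard}")
print(f"grouped:   {grouped}")
print(f"separable: {separable}")
```

Under these assumed sizes, the grouped layer needs a quarter of the weights of the standard layer, and the depthwise separable layer needs fewer still. FLOPs scale the same way, since each weight is applied once per output position.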
Guoqing Li, Meng Zhang, Jiaojie Li, Feng Lv, Guodong Tong
Wenhan Li, Wenqing Xie, Zhifang Wang
Miguel Angrick, Christian Herff, Emily M. Mugler, Matthew C. Tate, Marc W. Slutzky, Dean J. Krusienski, Tanja Schultz
Amir Mohammad Rostami, Ali Karimi, Mohammad Ali Akhaee