This paper discussed the application of Densely Connected Convolutional Networks (DenseNet), group convolution, and Squeeze-and-Excitation Networks (SENet) to keyword spotting. We validated the network on the Google Speech Commands Dataset. The proposed network achieves higher accuracy than the networks we compared against while using fewer parameters and fewer floating-point operations (FLOPs). In addition, we varied the depth and width of the network to build a compact variant, which also outperforms other compact variants.
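To make the parameter savings concrete, the following minimal sketch counts weights for a standard convolution, a group convolution, and a squeeze-and-excitation block, and tracks how channel counts grow inside a dense block. The function names and all numeric settings (64 channels, 4 groups, reduction ratio 16, growth rate 12) are illustrative assumptions, not the configuration used in this paper.

```python
def conv_params(c_in, c_out, kernel_size, groups=1):
    """Weight count of a 2-D convolution with square kernels and no bias."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each output channel only sees c_in / groups input channels.
    return (c_in // groups) * c_out * kernel_size * kernel_size


def se_params(channels, reduction=16):
    """Weight count of an SE block: two fully connected layers with biases."""
    hidden = max(channels // reduction, 1)
    squeeze = channels * hidden + hidden      # channels -> hidden
    excite = hidden * channels + channels     # hidden -> channels
    return squeeze + excite


def dense_block_channels(c_in, growth_rate, num_layers):
    """Channel count at the input of each dense layer, then at the block output.

    Every layer concatenates growth_rate new feature maps onto its input.
    """
    return [c_in + i * growth_rate for i in range(num_layers + 1)]


# Hypothetical settings, chosen only for illustration.
standard = conv_params(64, 64, 3)             # 36864 weights
grouped = conv_params(64, 64, 3, groups=4)    # 9216 weights, 4x fewer
se_overhead = se_params(64, reduction=16)     # 580 weights
channels = dense_block_channels(16, 12, 4)    # [16, 28, 40, 52, 64]

print(standard, grouped, se_overhead, channels)
```

Under these assumed settings, splitting the convolution into 4 groups cuts its weights by a factor of 4, while the SE block adds only a few hundred weights, which is consistent with combining the two in a compact model.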