In this paper we introduce a simplified architecture for gated recurrent neural networks suited to single-pass applications, where keyword spotting must be performed in real time and phoneme-level information is not available for training. The network operates as a self-contained block in a strictly forward-pass configuration to directly generate keyword labels. We call these simple networks causal networks, because the current output is weighted only by past inputs and outputs. Since the basic network has a simpler architecture than the traditional memory networks used in keyword spotting, it also requires less data to train. Experiments on a standard speech database highlight the behavior and efficacy of such networks. Comparisons with a standard HMM-based keyword spotter show that these networks, while simple, are still more accurate.
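To make the causality constraint concrete, the following is a minimal sketch, not the paper's actual model: a single recurrent step in which the output at time t is a function only of the current input and the previous output, so it can run strictly forward in real time. All names (`causal_step`, `W_x`, `W_y`, `b`) and dimensions are illustrative assumptions.

```python
import numpy as np

def causal_step(x_t, y_prev, W_x, W_y, b):
    # Strictly forward pass: the new output depends only on the
    # current input x_t and the previous output y_prev; no future
    # frames are ever consulted, so the unit is causal.
    return np.tanh(W_x @ x_t + W_y @ y_prev + b)

# Illustrative sizes: 4-dimensional input frames, 3 output units.
rng = np.random.default_rng(0)
d_in, d_out = 4, 3
W_x = rng.standard_normal((d_out, d_in)) * 0.1   # input weights (assumed)
W_y = rng.standard_normal((d_out, d_out)) * 0.1  # recurrent weights (assumed)
b = np.zeros(d_out)

# Process a short sequence of input frames one at a time,
# as a real-time, single-pass system would.
y = np.zeros(d_out)
for x_t in rng.standard_normal((5, d_in)):
    y = causal_step(x_t, y, W_x, W_y, b)
```

Because each step consumes one frame and emits one output, the unit can label an audio stream as it arrives, with no need to buffer future context.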