Virtual Adversarial Training for DS-CNN Based Small-Footprint Keyword Spotting

Xiong Wang; Sining Sun; Lei Xie

doi:10.1109/asru46091.2019.9003745

Published in: 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp. 607-612, 2019

Abstract

Serving as the trigger of a voice-enabled user interface, an on-device keyword spotting (KWS) model must be extremely compact, efficient and accurate. In this paper, we adopt a depth-wise separable convolutional neural network (DS-CNN) as our small-footprint KWS model, which is highly competitive on all three counts. However, a recent study has shown that a compact KWS system is very vulnerable to small adversarial perturbations, and that augmenting the training data with specifically generated adversarial examples can improve performance. We further improve KWS performance through a virtual adversarial training (VAT) solution. Instead of using adversarial examples for data augmentation, we propose to train a DS-CNN KWS model with adversarial regularization: a distribution smoothness measure is explicitly introduced into the loss function to smooth the model's output distribution and thus improve robustness. Experiments on a far-field KWS corpus collected with a circular microphone array show that the VAT approach achieves a 31.9% relative reduction in false rejection rate (FRR) compared with normal training using the cross-entropy loss, and that it also surpasses the adversarial-example-based data augmentation approach with a 10.3% relative FRR reduction.
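The adversarial regularization described in the abstract can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: it follows the generic VAT recipe of Miyato et al. (one power-iteration step to find the perturbation direction r that most increases KL(p(y|x) || p(y|x+r)), then adding that KL as a smoothness penalty to the cross-entropy loss), and it replaces the DS-CNN with a hypothetical linear softmax classifier so that the gradient with respect to the input can be written analytically. The function names and the hyperparameters `alpha`, `epsilon`, `xi` and `n_power` are illustrative; the abstract does not report the values used in the paper.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def kl(p, q, eps=1e-12):
    # KL(p || q) per example, summed over classes.
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)


def l2_normalize(v, eps=1e-12):
    # Scale each row to unit L2 norm.
    return v / (np.linalg.norm(v, axis=-1, keepdims=True) + eps)


def vat_loss(W, b, x, y, alpha=1.0, epsilon=1.0, xi=1e-6, n_power=1, rng=None):
    """Cross entropy plus a VAT smoothness penalty (hypothetical example).

    The model is an assumed linear softmax stand-in for the DS-CNN:
    logits = x @ W + b, with W of shape (feature_dim, num_classes).
    Returns (mean total loss, mean smoothness term).
    """
    rng = np.random.default_rng(0) if rng is None else rng
    p = softmax(x @ W + b)  # clean prediction, treated as a fixed target

    # Power iteration: start from a random unit direction and move it toward
    # the direction in which the output distribution changes fastest.
    d = l2_normalize(rng.standard_normal(x.shape))
    for _ in range(n_power):
        q = softmax((x + xi * d) @ W + b)
        # Gradient of KL(p || q) w.r.t. the input perturbation; for a linear
        # softmax model it is (q - p) W^T (autodiff would be used for a DS-CNN).
        g = (q - p) @ W.T
        d = l2_normalize(g)

    r_adv = epsilon * d  # virtual adversarial perturbation (no labels needed)
    lds = kl(p, softmax((x + r_adv) @ W + b))  # distribution smoothness term
    ce = -np.log(p[np.arange(len(y)), y] + 1e-12)  # standard cross entropy
    return float(np.mean(ce + alpha * lds)), float(np.mean(lds))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    W = rng.standard_normal((40, 3))  # e.g. 40-dim features, 3 output classes
    b = np.zeros(3)
    x = rng.standard_normal((8, 40))
    y = rng.integers(0, 3, size=8)
    total, lds = vat_loss(W, b, x, y, alpha=1.0, epsilon=0.5)
    print(f"total loss={total:.4f}, smoothness term={lds:.4f}")
```

Setting `alpha=0` recovers plain cross-entropy training, which is the baseline the abstract compares against. Unlike adversarial-example data augmentation, the smoothness term never uses the label `y`, because it only compares the model's prediction on the clean input with its prediction on the perturbed one.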

Keywords:

Computer science; Adversarial system; Keyword spotting; Convolutional neural network; Artificial intelligence; Robustness (evolution); Regularization (linguistics); Speech recognition; Pattern recognition (psychology); Machine learning

Metrics

FWCI (Field Weighted Citation Impact): 0.92

Citation Normalized Percentile: 0.81

Topics

Speech Recognition and Synthesis (Physical Sciences → Computer Science → Artificial Intelligence)

Speech and Audio Processing (Physical Sciences → Computer Science → Signal Processing)

Music and Audio Processing (Physical Sciences → Computer Science → Signal Processing)

Related Documents

Disentangled Training with Adversarial Examples for Robust Small-Footprint Keyword Spotting

Adversarial Examples for Improving End-to-end Attention-based Small-footprint Keyword Spotting

Neural Network-based Small-Footprint Flexible Keyword Spotting

Region Proposal Network Based Small-Footprint Keyword Spotting

Small Footprint Multi-channel Keyword Spotting