TE-KWS: Text-Informed Speech Enhancement for Noise-Robust Keyword Spotting

Dong Liu; Qirong Mao; Lijian Gao; Qinghua Ren; Zhenghan Chen; Ming Dong

doi:10.1145/3581783.3612173

JOURNAL ARTICLE

Year: 2023 Pages: 601-610

Abstract

Keyword spotting (KWS) is challenging, particularly in high-noise environments. Traditional denoising algorithms that rely solely on speech have difficulty recovering speech that has been severely corrupted by noise. In this work, we develop an adaptive text-informed denoising model to support reliable keyword identification under considerable noise degradation. The proposed TE-KWS has a three-branch structure: the speech branch (SB) takes noisy speech as input and provides the raw speech information; the alignment branch (AB) takes aligned text as input, which facilitates accurate restoration of the corresponding speech when the text alignment is preserved; and the text branch (TB) handles unaligned text, which prompts the model to learn the alignment between speech and text autonomously. To make the denoising model more beneficial for KWS, after the whole model is trained, the alignment branch (AB) is frozen and the model is fine-tuned by leveraging its speech restoration and forced alignment capabilities. The input to the text branch (TB) is then replaced with the designated keywords, and a heavier denoising penalty is applied to the keyword period, explicitly strengthening the model's speech restoration ability for keywords. Finally, Combined Adversarial Domain Adaptation (CADA) is applied to make KWS robust to data both before and after speech enhancement (SE). Experimental results indicate that our approach not only markedly improves highly corrupted speech and achieves state-of-the-art (SOTA) performance on marginally corrupted speech, but also improves the effectiveness and generalizability of prevailing mainstream KWS models.
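The "heavier denoising penalty" on the keyword period can be pictured as a frame-weighted reconstruction loss. The sketch below is only an illustration under stated assumptions, not the paper's loss: the function name `keyword_weighted_mse`, the per-frame scalar representation, the binary `keyword_mask`, and the default `keyword_weight` of 2.0 are all hypothetical choices made for this example.

```python
def keyword_weighted_mse(enhanced, clean, keyword_mask, keyword_weight=2.0):
    """Weighted mean squared error over frames.

    Hypothetical illustration: frames flagged in keyword_mask contribute
    keyword_weight times as much as other frames, so reconstruction errors
    inside the keyword period are penalized more heavily.
    """
    if not (len(enhanced) == len(clean) == len(keyword_mask)):
        raise ValueError("all inputs must have the same number of frames")
    if len(enhanced) == 0:
        raise ValueError("inputs must contain at least one frame")

    weighted_error = 0.0
    weight_sum = 0.0
    for e, c, is_keyword in zip(enhanced, clean, keyword_mask):
        # Assumed weighting scheme: 1.0 outside the keyword, keyword_weight inside.
        w = keyword_weight if is_keyword else 1.0
        weighted_error += w * (e - c) ** 2
        weight_sum += w
    # Normalize by the total weight so the loss stays on the scale of a plain MSE.
    return weighted_error / weight_sum


# Toy frames: the error sits either inside or outside the keyword period.
clean = [0.0, 0.0, 0.0, 0.0]
mask = [0, 1, 1, 0]  # frames 1 and 2 belong to the keyword (assumed alignment)
loss_inside = keyword_weighted_mse([0.0, 1.0, 1.0, 0.0], clean, mask, keyword_weight=3.0)
loss_outside = keyword_weighted_mse([1.0, 0.0, 0.0, 1.0], clean, mask, keyword_weight=3.0)
```

With the same total squared error, `loss_inside` is larger than `loss_outside`, which is the intended effect of weighting the keyword period. In the setting the abstract describes, the mask would presumably come from the model's forced alignment rather than being given by hand.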

Keywords:

Keyword spotting Computer science Speech recognition Robustness (evolution) Speech enhancement Noise reduction Noise (video) PESQ Artificial intelligence Speech processing

Metrics

FWCI (Field Weighted Citation Impact): 0.27

Citation Normalized Percentile: 0.47

Topics

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Related Documents

NTC-KWS: Noise-aware CTC for Robust Keyword Spotting

DCCRN-KWS: An Audio Bias Based Model for Noise Robust Small-Footprint Keyword Spotting

Hybrid context dependent CD-DNN-HMM Keyword Spotting (KWS) in speech conversations

Robust Keyword Spotting for Noisy Environments by Leveraging Speech Enhancement and Speech Presence Probability

ICFHR2016 Handwritten Keyword Spotting Competition (H-KWS 2016)