JOURNAL ARTICLE

TE-KWS: Text-Informed Speech Enhancement for Noise-Robust Keyword Spotting

Abstract

Keyword spotting (KWS) is challenging, particularly in high-noise environments. Conventional denoising algorithms that rely solely on speech have difficulty recovering speech that has been severely corrupted by noise. In this work, we develop an adaptive text-informed denoising model to support reliable keyword identification under heavy noise degradation. The proposed TE-KWS uses a three-branch structure: the speech branch (SB) takes noisy speech as input and provides the raw speech information; the alignment branch (AB) takes aligned text as input, which facilitates accurate restoration of the corresponding speech when text with alignment is available; and the text branch (TB) handles unaligned text, which prompts the model to learn the alignment between speech and text on its own. To make the denoising model more useful for KWS, after the whole model is trained, the alignment branch (AB) is frozen and the model is fine-tuned by leveraging its speech restoration and forced alignment capabilities. The input to the text branch (TB) is then replaced with designated keywords, and a heavier denoising penalty is applied to the keyword periods, explicitly strengthening the model's speech restoration ability for keywords. Finally, Combined Adversarial Domain Adaptation (CADA) is applied to make KWS robust to data both before and after speech enhancement (SE). Experimental results indicate that our approach not only markedly improves highly corrupted speech while achieving state-of-the-art (SOTA) performance on marginally corrupted speech, but also improves the effectiveness and generalizability of prevailing mainstream KWS models.
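
The heavier denoising penalty on keyword periods can be pictured as a frame-weighted reconstruction loss. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `keyword_weighted_mse`, the boolean `keyword_mask` (assumed to come from the model's forced alignment), the squared-error form, and the `keyword_weight` value are all hypothetical, since the abstract does not specify the loss or the size of the penalty.

```python
from typing import Sequence


def keyword_weighted_mse(
    enhanced: Sequence[float],
    clean: Sequence[float],
    keyword_mask: Sequence[bool],
    keyword_weight: float = 2.0,
) -> float:
    """Weighted mean squared error over frames.

    Frames flagged in `keyword_mask` contribute `keyword_weight` times as
    much as other frames. The loss form and the default weight are
    illustrative assumptions, not values taken from the paper.
    """
    if not (len(enhanced) == len(clean) == len(keyword_mask)):
        raise ValueError("inputs must have the same length")
    if len(enhanced) == 0:
        raise ValueError("inputs must be non-empty")

    weighted_error = 0.0
    total_weight = 0.0
    for est, ref, in_keyword in zip(enhanced, clean, keyword_mask):
        weight = keyword_weight if in_keyword else 1.0
        weighted_error += weight * (est - ref) ** 2
        total_weight += weight
    # Normalise by the total weight so the loss stays on the scale of an MSE.
    return weighted_error / total_weight


# Toy example: the only reconstruction error falls inside the keyword period.
clean = [0.0, 1.0, 1.0, 0.0]
enhanced = [0.0, 0.5, 1.0, 0.0]
mask = [False, True, True, False]

uniform_loss = keyword_weighted_mse(enhanced, clean, mask, keyword_weight=1.0)
keyword_loss = keyword_weighted_mse(enhanced, clean, mask, keyword_weight=3.0)
```

In this toy case the same error yields a larger loss once keyword frames are up-weighted, which is the effect the abstract describes: mistakes inside the keyword period cost more than mistakes elsewhere.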

Keywords:
Keyword spotting; Computer science; Speech recognition; Robustness (evolution); Speech enhancement; Noise reduction; Noise (video); PESQ; Artificial intelligence; Speech processing

Metrics

Cited By: 1
FWCI (Field Weighted Citation Impact): 0.27
Refs: 57
Citation Normalized Percentile: 0.47

Topics

Speech and Audio Processing
Physical Sciences → Computer Science → Signal Processing
Speech Recognition and Synthesis
Physical Sciences → Computer Science → Artificial Intelligence
Music and Audio Processing
Physical Sciences → Computer Science → Signal Processing