Abstract

We propose a novel application of an attention mechanism to neural speech enhancement: a U-Net architecture with attention that processes the raw waveform directly and is trained end-to-end. We find that including the attention mechanism significantly improves the model's performance on objective speech quality metrics, and that the resulting model outperforms all other published speech enhancement approaches on the Voice Bank Corpus (VCTK) dataset. We observe that the final-layer attention mask can be interpreted as a soft Voice Activity Detector (VAD). We also present initial results showing the efficacy of the proposed system as a pre-processing step for speech recognition systems.
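The abstract does not specify the form of the attention mechanism, so the following is only a minimal sketch under stated assumptions: it uses a generic additive attention gate of the kind commonly placed on U-Net skip connections, applied to 1-D (time-domain) features. The function name `attention_gate`, the weight shapes, and the toy values are all hypothetical and chosen for illustration. It shows how a per-frame mask in (0, 1) can scale skip-connection features, which is the sense in which such a mask could be read as a soft VAD.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def attention_gate(skip, gate, w_skip, w_gate, psi, bias=0.0):
    """Generic additive attention gate over time frames (illustrative only).

    skip, gate : lists of T frames, each a list of C channel values
                 (encoder skip features and decoder gating features).
    w_skip, w_gate : K x C weight matrices, acting like 1x1 convolutions.
    psi : length-K vector that collapses the hidden units to one score.
    Returns (mask, gated): one mask value per frame, and the skip
    features scaled by that mask.
    """
    mask, gated = [], []
    for x_t, g_t in zip(skip, gate):
        hidden = []
        for ws_row, wg_row in zip(w_skip, w_gate):
            pre = sum(w * x for w, x in zip(ws_row, x_t))
            pre += sum(w * g for w, g in zip(wg_row, g_t))
            hidden.append(max(0.0, pre))  # ReLU
        a_t = sigmoid(sum(p * h for p, h in zip(psi, hidden)) + bias)
        mask.append(a_t)
        gated.append([a_t * x for x in x_t])
    return mask, gated


# Toy input (assumed values): two near-silent frames, then two active frames.
skip = [[0.0, 0.0], [0.1, 0.0], [1.0, 0.8], [0.9, 1.2]]
gate = [[0.0, 0.1], [0.0, 0.0], [0.7, 0.9], [1.1, 0.6]]

# Hand-picked weights; a trained model would learn these.
mask, gated = attention_gate(
    skip, gate,
    w_skip=[[1.0, 1.0]], w_gate=[[1.0, 1.0]],
    psi=[2.0], bias=-2.0,
)
print([round(a, 3) for a in mask])
```

With these hand-picked weights the mask is small on the near-silent frames and close to 1 on the active frames. In a trained network the weights are learned, so whether the mask behaves like a VAD is an empirical observation, as the abstract reports, rather than something the structure guarantees.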

Keywords:
Computer science; Speech enhancement; Speech recognition; Waveform; Voice activity detection; Speech processing; Mechanism (biology); Artificial neural network; Layer (electronics); Artificial intelligence; Telecommunications

Metrics

Cited By: 111
FWCI (Field Weighted Citation Impact): 10.51
Refs: 34
Citation Normalized Percentile: 0.99

Topics

Speech and Audio Processing (Physical Sciences → Computer Science → Signal Processing)
Speech Recognition and Synthesis (Physical Sciences → Computer Science → Artificial Intelligence)
Music and Audio Processing (Physical Sciences → Computer Science → Signal Processing)