We propose a novel application of attention to neural speech enhancement: a U-Net architecture with an attention mechanism that processes the raw waveform directly and is trained end-to-end. We find that including the attention mechanism significantly improves performance on objective speech-quality metrics, outperforming all other published speech enhancement approaches on the Voice Bank Corpus (VCTK) dataset. We observe that the final-layer attention mask can be interpreted as a soft Voice Activity Detector (VAD). We also present initial results showing the efficacy of the proposed system as a pre-processing step for speech recognition systems.
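As a rough illustration of the kind of mechanism described above, the following is a minimal NumPy sketch of an additive attention gate applied to a 1-D U-Net skip connection, where decoder features gate the encoder features and the resulting mask lies in [0, 1] (so it can act like a soft VAD). The function name, weight shapes, and random weights are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_gate(skip, gate, W_s, W_g, psi):
    """Additive attention gate on a 1-D U-Net skip connection (illustrative).

    skip: (T, C) encoder features carried by the skip connection
    gate: (T, C) decoder features used as the gating signal
    W_s, W_g: (C, C) learned projections (random here for illustration)
    psi: (C, 1) projection producing one scalar mask value per time step
    """
    # Project both inputs, combine additively, then squash to a [0, 1] mask.
    scores = np.tanh(skip @ W_s + gate @ W_g)   # (T, C)
    mask = sigmoid(scores @ psi)                # (T, 1), soft per-step mask
    return skip * mask, mask                    # attention-gated skip features

# Toy usage with random features and weights
rng = np.random.default_rng(0)
T, C = 16, 8
skip = rng.standard_normal((T, C))
gate = rng.standard_normal((T, C))
gated, mask = attention_gate(skip, gate,
                             rng.standard_normal((C, C)),
                             rng.standard_normal((C, C)),
                             rng.standard_normal((C, 1)))
```

Because the mask is a per-time-step scalar in [0, 1], time steps with mask values near zero are effectively suppressed, which is what gives the final-layer mask its VAD-like reading.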