Traditional speech enhancement algorithms suffer from overfitting, poor generalization, and degraded performance in non-stationary noise environments. This paper investigates a speech enhancement algorithm based on the Wave-U-Net architecture. The algorithm operates directly in the time domain and is trained end to end, which allows phase information to be modeled as part of the waveform, and it repeatedly resamples feature maps to compute and combine features at different time scales. Experiments show that speech enhanced by the Wave-U-Net model outperforms, overall, both a deep neural network (DNN) model and the traditional Wiener filtering algorithm on the perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI) metrics, and that the algorithm is more robust.
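The multi-scale resampling idea described above can be illustrated with a minimal NumPy sketch. It is not the model evaluated in this paper. The function names, the depth, the constant channel count, the kernel size, the leaky-ReLU and tanh activations, and the randomly initialized (untrained) filters are all assumptions chosen for brevity. The sketch only shows the data flow: convolve, decimate by two on the way down, linearly interpolate on the way up, and concatenate each upsampled feature map with the skip connection from the same time scale.

```python
import numpy as np

def conv1d_same(x, w):
    """Cross-correlate x (c_in, T) with filters w (c_out, c_in, K); output is (c_out, T)."""
    c_out, c_in, k = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, k - 1 - pad)))  # zero-pad so the length stays T
    out = np.zeros((c_out, x.shape[1]))
    for o in range(c_out):
        for i in range(c_in):
            out[o] += np.correlate(xp[i], w[o, i], mode="valid")
    return out

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def upsample_linear(x, target_len):
    """Linearly interpolate every channel of x (C, T) to target_len samples."""
    src = np.linspace(0.0, 1.0, x.shape[1])
    dst = np.linspace(0.0, 1.0, target_len)
    return np.stack([np.interp(dst, src, ch) for ch in x])

def wave_u_net_sketch(x, depth=3, channels=4, kernel=5, seed=0):
    """Forward pass of a toy, untrained U-shaped network on a 1-D waveform x (T,)."""
    rng = np.random.default_rng(seed)
    h = x[None, :]                      # (1, T): one input channel
    skips = []
    # Downsampling path: features at progressively coarser time scales.
    for _ in range(depth):
        w = rng.normal(scale=0.1, size=(channels, h.shape[0], kernel))
        h = leaky_relu(conv1d_same(h, w))
        skips.append(h)                 # keep the full-rate features for the skip
        h = h[:, ::2]                   # decimate: drop every other time step
    # Bottleneck at the coarsest time scale.
    w = rng.normal(scale=0.1, size=(channels, h.shape[0], kernel))
    h = leaky_relu(conv1d_same(h, w))
    # Upsampling path: interpolate, then combine with same-scale features.
    for skip in reversed(skips):
        h = upsample_linear(h, skip.shape[1])
        h = np.concatenate([h, skip], axis=0)
        w = rng.normal(scale=0.1, size=(channels, h.shape[0], kernel))
        h = leaky_relu(conv1d_same(h, w))
    # Output layer: concatenate the raw input and apply a 1-tap convolution.
    h = np.concatenate([h, x[None, :]], axis=0)
    w_out = rng.normal(scale=0.1, size=(1, h.shape[0], 1))
    return np.tanh(conv1d_same(h, w_out))[0]

noisy = np.random.default_rng(1).normal(size=1000)  # stand-in for a noisy waveform
enhanced = wave_u_net_sketch(noisy)
```

Because the filters are random, `enhanced` is not a denoised signal. The point is only that the output has the same length as the input and that every decoder stage sees features from its own time scale alongside the coarser ones.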
Ritwik Giri, Umut Isik, Arvindh Krishnaswamy
Heitor R. Guimarães, Hitoshi Nagano, Diego W. Silva
Mohamed Nabih Ali, Alessio Brutti, Daniele Falavigna