The goal of speech enhancement is to improve the quality and intelligibility of noisy speech. One solution for this task is the Fully Convolutional Network (FCN), which can effectively model temporal structure with fewer parameters. However, deep FCNs are hard to train and may lose detailed information due to consecutive pooling operations. To address this problem, we introduce two different shortcut mechanisms to better preserve information from shallow layers. In addition, we discard the pooling layers and the corresponding upsampling layers of the FCN to avoid information compression. Experimental results show that the proposed model achieves higher performance than other baselines under both seen and unseen noise conditions.
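The two architectural ideas above, additive shortcuts from shallow layers and a pooling-free convolutional stack that preserves time resolution, can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual model: the layer count, kernel size, and single-channel setup are assumptions made for brevity.

```python
# Minimal sketch (assumption: illustrative only, not the paper's exact model)
# of a pooling-free fully convolutional enhancer with additive shortcuts.
import numpy as np

def conv1d_same(x, kernel):
    # "same"-padded 1-D convolution, one channel in / one channel out
    return np.convolve(x, kernel, mode="same")

def relu(x):
    return np.maximum(x, 0.0)

class ShortcutFCN:
    """Stack of convolutional layers. Each layer's input is added to its
    output (identity shortcut), preserving shallow-layer information, and
    no pooling/upsampling is used, so the temporal resolution of the
    signal is kept end to end."""

    def __init__(self, num_layers=3, kernel_size=5, seed=0):
        rng = np.random.default_rng(seed)
        self.kernels = [rng.standard_normal(kernel_size) * 0.1
                        for _ in range(num_layers)]

    def forward(self, x):
        h = x
        for k in self.kernels:
            h = relu(conv1d_same(h, k)) + h  # shortcut: add shallow features
        return h

# Toy noisy input: a sinusoid plus Gaussian noise
noisy = (np.sin(np.linspace(0, 8 * np.pi, 160))
         + 0.3 * np.random.default_rng(1).standard_normal(160))
enhanced = ShortcutFCN().forward(noisy)
assert enhanced.shape == noisy.shape  # no pooling, so length is preserved
```

Because there is no pooling, the output has exactly the input's length, and the additive shortcuts give gradients a direct path to shallow layers, which is what makes deeper stacks easier to train.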
Zezheng Xu, Ting Jiang, Chao Li, Jiacheng Yu