Kelan Kuang, Feiran Yang, Junfeng Li, Jun Yang
This paper proposes TriU-Net, a hybrid neural beamformer for multi-channel speech enhancement that comprises three stages: beamforming, post-filtering, and distortion compensation. TriU-Net first estimates a set of masks that are used within a minimum variance distortionless response (MVDR) beamformer. A deep neural network (DNN)-based post-filter then suppresses the residual noise. Finally, a DNN-based distortion compensator further improves speech quality. To characterize long-range temporal dependencies more efficiently, a network topology, the gated convolutional attention network, is proposed and used in TriU-Net. The advantage of the proposed model is that speech distortion compensation is considered explicitly, which yields higher speech quality and intelligibility. The proposed model achieves an average wb-PESQ score of 2.854 and an ESTOI of 92.57% on the CHiME-3 dataset. In addition, extensive experiments on synthetic data and real recordings confirm the effectiveness of the proposed method in noisy reverberant environments.
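The first stage, in which estimated masks feed a minimum variance distortionless response (MVDR) beamformer, can be illustrated with a minimal NumPy sketch. The abstract does not say how the masks enter the beamformer, so everything below is an assumption for illustration: the masks are taken to be real-valued time-frequency masks for speech and noise, they are used to weight spatial covariance estimates, and the weights follow the common reference-channel formulation w = (Φn⁻¹ Φs) u / tr(Φn⁻¹ Φs). The function names, the diagonal loading, and the random placeholder masks are hypothetical and are not taken from TriU-Net.

```python
import numpy as np


def masked_covariance(stft, mask, eps=1e-8):
    """Mask-weighted spatial covariance (assumed estimator, not from the paper).

    stft: complex array of shape (C, F, T) = channels, frequencies, frames.
    mask: real array of shape (F, T) with values in [0, 1].
    Returns an array of shape (F, C, C).
    """
    weighted = stft * mask[None]  # apply the mask to every channel
    cov = np.einsum("cft,dft->fcd", weighted, stft.conj())
    return cov / (mask.sum(axis=-1)[:, None, None] + eps)


def mvdr_weights(phi_s, phi_n, ref=0, diag_load=1e-6, eps=1e-12):
    """Reference-channel MVDR weights, one weight vector per frequency bin.

    phi_s, phi_n: speech and noise covariances of shape (F, C, C).
    Returns weights of shape (F, C).
    """
    num_ch = phi_n.shape[-1]
    # Diagonal loading keeps the noise covariance well conditioned.
    tr = np.trace(phi_n, axis1=-2, axis2=-1).real
    load = (diag_load * tr / num_ch + eps)[:, None, None] * np.eye(num_ch)
    numerator = np.linalg.solve(phi_n + load, phi_s)  # Phi_n^{-1} Phi_s
    denom = np.trace(numerator, axis1=-2, axis2=-1)[:, None]
    return numerator[..., ref] / (denom + eps)


def apply_beamformer(weights, stft):
    """Compute y(f, t) = w(f)^H x(f, t) for every time-frequency bin."""
    return np.einsum("fc,cft->ft", weights.conj(), stft)


# Usage with random placeholder data standing in for a real multi-channel
# STFT and for the masks that a network would estimate.
rng = np.random.default_rng(0)
C, F, T = 6, 257, 100
x = rng.standard_normal((C, F, T)) + 1j * rng.standard_normal((C, F, T))
speech_mask = rng.uniform(size=(F, T))
noise_mask = 1.0 - speech_mask

phi_s = masked_covariance(x, speech_mask)
phi_n = masked_covariance(x, noise_mask)
w = mvdr_weights(phi_s, phi_n, ref=0)
y = apply_beamformer(w, x)  # single-channel output of shape (F, T)
```

In a three-stage pipeline of the kind described, the single-channel output `y` would then be passed to the post-filter and the distortion compensator. Those stages are neural networks whose architecture the abstract does not detail, so they are not sketched here.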
Jing Bai, Hao Li, Xueliang Zhang, Fei Chen
Ruqiao Liu, Yi Zhou, Hongqing Liu, Xinmeng Xu, Jie Jia, Binbin Chen
Andong Li, Guochen Yu, Chengshi Zheng, Xiaodong Li
Nidal Abuhajar, Zhewei Wang, Marc Baltes, Ye Yue, Li Xu, Avinash Karanth, Charles D. Smith, Jundong Liu