Joint Time-Frequency and Time Domain Learning for Speech Enhancement

Chuanxin Tang; Chong Luo; Zhiyuan Zhao; Wenxuan Xie; Wenjun Zeng

doi:10.24963/ijcai.2020/528

Year: 2020 Pages: 3816-3822

Abstract

For single-channel speech enhancement, time-domain and time-frequency (T-F) domain methods each have their own strengths and weaknesses. In this paper, we present a cross-domain framework named TFT-Net, which takes a T-F spectrogram as input and produces a time-domain waveform as output. This design leverages existing knowledge about spectrograms while avoiding some of the drawbacks from which T-F-domain methods have suffered. In TFT-Net, we design a novel dual-path attention block (DAB) to fully exploit correlations along both the time and frequency axes. We further find that a sample-independent DAB (SDAB) achieves a good tradeoff between enhanced speech quality and complexity. Ablation studies show that both the cross-domain design and the SDAB bring large performance gains. When logarithmic MSE is used as the training criterion, TFT-Net achieves the highest SDR and SSNR among state-of-the-art methods on two major speech enhancement benchmarks.
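The abstract names logarithmic MSE as the training criterion and SSNR (segmental SNR) as one of the reported metrics. The following plain-Python sketch shows one common way to compute each on time-domain waveforms; it is not taken from the TFT-Net code. Assumptions: `log_mse` is taken here to be log10 of the waveform MSE with a hypothetical stabilizer `eps`, which may differ from the paper's exact loss. `segmental_snr` uses non-overlapping frames of a hypothetical `frame_len` and the customary [-10, 35] dB per-frame clipping, whereas standard evaluation tools typically use overlapping, windowed frames.

```python
import math


def log_mse(clean, estimate, eps=1e-8):
    """Logarithmic MSE (assumed form): log10 of the mean squared waveform error.

    `eps` is a hypothetical stabilizer so identical signals do not hit log10(0).
    """
    if not clean or len(clean) != len(estimate):
        raise ValueError("signals must be non-empty and of equal length")
    mse = sum((c - e) ** 2 for c, e in zip(clean, estimate)) / len(clean)
    return math.log10(mse + eps)


def segmental_snr(clean, estimate, frame_len=256, min_db=-10.0, max_db=35.0, eps=1e-10):
    """Segmental SNR in dB: the mean of per-frame SNRs, each clipped to [min_db, max_db].

    Simplification: frames do not overlap, and trailing samples that do not
    fill a whole frame are ignored.
    """
    if len(clean) != len(estimate):
        raise ValueError("signals must be of equal length")
    frame_snrs = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        s = clean[start:start + frame_len]
        e = estimate[start:start + frame_len]
        signal_energy = sum(x * x for x in s)
        error_energy = sum((x - y) ** 2 for x, y in zip(s, e))
        snr_db = 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))
        frame_snrs.append(min(max(snr_db, min_db), max_db))
    if not frame_snrs:
        raise ValueError("signal is shorter than one frame")
    return sum(frame_snrs) / len(frame_snrs)


# Usage: a 440 Hz tone sampled at 16 kHz plus a deterministic low-level disturbance.
clean = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(1600)]
noisy = [x + 0.1 * math.cos(0.37 * n) for n, x in enumerate(clean)]
print(f"log-MSE: {log_mse(clean, noisy):.3f}")
print(f"SSNR:    {segmental_snr(clean, noisy):.2f} dB")
```

Because the loss is taken on a log scale, a fixed relative reduction in error yields the same change in loss whether the error is large or small. The per-frame clipping in SSNR keeps silent or perfectly reconstructed frames from dominating the average.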

Keywords:

Spectrogram, Computer science, Frequency domain, Exploit, Time domain, Speech recognition, Domain (mathematical analysis), Artificial intelligence, Mathematics, Computer vision

Metrics

FWCI (Field Weighted Citation Impact): 5.76

Citation Normalized Percentile: 0.97

Topics

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Advanced Adaptive Filtering Techniques

Physical Sciences → Engineering → Computational Mechanics

Indoor and Outdoor Localization Technologies

Physical Sciences → Engineering → Electrical and Electronic Engineering

Related Documents

Speech preprocessing and enhancement based on joint time domain and time-frequency domain analysis

Joint Time-Domain and Frequency-Domain Progressive Learning for Single-Channel Speech Enhancement and Recognition

Speech enhancement based on joint time-frequency segmentation

Neural speech enhancement in the time-frequency domain

Speech Enhancement Based on Time-Frequency Domain GAN