TFDense-GAN: a generative adversarial network for single-channel speech enhancement

Haoxiang Chen; Jinxiu Zhang; Yanjun Fu; Xintong Zhou; Ruilong Wang; Yanyan Xu; Dengfeng Ke

doi:10.1186/s13634-025-01210-1

JOURNAL ARTICLE

Year: 2025 Journal: EURASIP Journal on Advances in Signal Processing Vol: 2025 (1) Publisher: Springer Science+Business Media

Abstract

Research indicates that operating on the time–frequency spectrum plays a crucial role in speech enhancement, as it allows audio features to be extracted more effectively and reduces computational cost. Among time–frequency domain methods, attention mechanisms and DenseBlock have both yielded promising results. In particular, the Unet architecture, which comprises an encoder, a decoder, and a bottleneck, employs DenseBlock in both the encoder and the decoder to achieve powerful feature fusion with fewer parameters. To build on these advantages, we propose a Unet-based time–frequency domain denoising model called TFDense-Net. It uses our improved DenseBlock for feature extraction in both the encoder and the decoder and employs an attention mechanism in the bottleneck for feature fusion and denoising. The model performs well on speech enhancement tasks, achieving significant improvements in the Si-SDR metric over other state-of-the-art models. To further improve denoising performance and increase the model's receptive field, we introduce a multi-spectrogram discriminator based on multiple STFTs. Because the discriminator loss can capture correlations between spectra that traditional loss functions cannot detect, we train TFDense-Net as a generator against the multi-spectrogram discriminator, which significantly improves denoising performance; we name the resulting model TFDense-GAN. We evaluate TFDense-Net and TFDense-GAN on two public datasets: the VCTK + DEMAND dataset and the Interspeech Deep Noise Suppression Challenge dataset. Experimental results show that TFDense-GAN outperforms most existing models in terms of STOI, PESQ, and Si-SDR, achieving state-of-the-art results.
Comparison samples for TFDense-GAN and other models are available at https://github.com/yhsjoker/TFDense-GAN
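
The abstract reports its main gains in Si-SDR (scale-invariant signal-to-distortion ratio). As a rough illustration of what that metric measures, the sketch below follows the commonly used definition: project the enhanced signal onto the clean reference, treat the remainder as distortion, and report the energy ratio in dB. It is not taken from the paper; the function name `si_sdr`, the `eps` guard, and the synthetic test tones are assumptions made for this example, and the authors' evaluation code may differ in details such as mean removal.

```python
import math


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant SDR in dB for two equal-length sample sequences.

    Illustrative helper (hypothetical name); eps only guards against
    division by zero and log of zero.
    """
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")
    n = len(reference)
    # Remove the mean so a DC offset does not affect the score.
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]
    # Project the estimate onto the reference to obtain the target component.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref)
    scale = dot / (ref_energy + eps)
    target = [scale * r for r in ref]
    # Whatever the scaled reference does not explain counts as distortion.
    residual = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(v * v for v in residual)
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))


# Synthetic check: a 440 Hz tone at 16 kHz plus an orthogonal 1 kHz tone
# at one tenth of the amplitude (an energy ratio of 100, i.e. about 20 dB).
sample_rate = 16000
clean = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(1600)]
noisy = [c + 0.1 * math.cos(2 * math.pi * 1000 * t / sample_rate)
         for t, c in enumerate(clean)]
quieter = [0.5 * x for x in noisy]  # same signal at half the gain

print(round(si_sdr(noisy, clean), 3))    # about 20 dB
print(round(si_sdr(quieter, clean), 3))  # unchanged by the gain change
```

Because the reference is rescaled before the residual is computed, halving the gain of the enhanced signal leaves the score unchanged, which is why the metric is convenient for comparing models whose outputs differ in overall level.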

Keywords:

Adversarial system; Computer science; Channel (broadcasting); Generative adversarial network; Speech recognition; Generative grammar; Speech enhancement; Telecommunications; Artificial intelligence; Deep learning; Background noise

Metrics

FWCI (Field Weighted Citation Impact): 0.00

Citation Normalized Percentile: 0.08

Topics

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence
