JOURNAL ARTICLE

TFDense-GAN: a generative adversarial network for single-channel speech enhancement

Haoxiang Chen, Jinxiu Zhang, Yanjun Fu, Xintong Zhou, Ruilong Wang, Yanyan Xu, Dengfeng Ke

Year: 2025   Journal: EURASIP Journal on Advances in Signal Processing   Vol: 2025 (1)   Publisher: Springer Science+Business Media

Abstract

Research indicates that operating on the spectrum in the time–frequency domain plays a crucial role in speech enhancement, as it enables better extraction of audio features and reduces computational cost. Among time–frequency domain methods, attention mechanisms and DenseBlock have yielded promising results. In particular, the Unet architecture, which comprises an encoder, a decoder, and a bottleneck, employs DenseBlock in both the encoder and the decoder to achieve powerful feature fusion with fewer parameters. To build on the advantages of these methods, we propose a Unet-based time–frequency domain denoising model called TFDense-Net. It uses our improved DenseBlock for feature extraction in both the encoder and the decoder, and an attention mechanism in the bottleneck for feature fusion and denoising. The model performs well on speech enhancement tasks, achieving significant improvements in the Si-SDR metric over other state-of-the-art models. To further improve denoising performance and enlarge the model's receptive field, we introduce a multi-spectrogram discriminator based on multiple STFTs. Because the discriminator loss can capture correlations between spectra that traditional loss functions cannot detect, we train TFDense-Net as a generator against the multi-spectrogram discriminator, which significantly improves denoising performance; we name this enhanced model TFDense-GAN. We evaluate TFDense-Net and TFDense-GAN on two public datasets: the VCTK + DEMAND dataset and the Interspeech Deep Noise Suppression Challenge dataset. Experimental results show that TFDense-GAN outperforms most existing models in terms of STOI, PESQ, and Si-SDR, achieving state-of-the-art results.
Comparison samples of TFDense-GAN and other models are available at https://github.com/yhsjoker/TFDense-GAN.
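
For readers unfamiliar with the Si-SDR metric named in the abstract, the following is a minimal sketch of its commonly used definition: the estimate is projected onto the reference, and the energy of that projection is compared with the energy of the residual. This is not the authors' evaluation code. The function name `si_sdr`, the `eps` stabilizer, and the mean-removal step are assumptions made for illustration, and the sketch uses only the Python standard library rather than an array library.

```python
import math


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant signal-to-distortion ratio in dB (illustrative sketch).

    `estimate` and `reference` are equal-length sequences of samples.
    `eps` is a hypothetical stabilizer that avoids division by zero.
    """
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")
    n = len(reference)

    # Remove the mean so a DC offset does not affect the score (assumed step).
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]

    # Optimal scaling of the reference: alpha = <est, ref> / <ref, ref>.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref)
    alpha = dot / (ref_energy + eps)

    # Split the estimate into a scaled-reference part and a residual.
    target = [alpha * r for r in ref]
    residual = [e - t for e, t in zip(est, target)]

    target_energy = sum(t * t for t in target)
    residual_energy = sum(v * v for v in residual)
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))
```

Because the reference is rescaled before the comparison, multiplying the estimate by a constant gain leaves the score unchanged, which is the property that distinguishes Si-SDR from a plain SDR or SNR.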

Keywords:
Adversarial system; Computer science; Channel (broadcasting); Generative adversarial network; Speech recognition; Generative grammar; Speech enhancement; Telecommunications; Artificial intelligence; Deep learning; Background noise

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 36
Citation Normalized Percentile: 0.08

Topics

Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Music and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Speech Recognition and Synthesis
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

BOOK-CHAPTER

Speech Enhancement Using Generative Adversarial Network (GAN)

Mahmudul Huq, Rytis Maskeliūnas

Lecture notes in networks and systems Year: 2022 Pages: 273-282
JOURNAL ARTICLE

Phase sensitive masking-based single channel speech enhancement using conditional generative adversarial network

Sidheswar Routray, Qirong Mao

Journal:   Computer Speech & Language Year: 2021 Vol: 71 Pages: 101270-101270
JOURNAL ARTICLE

Noise Classification Speech Enhancement Generative Adversarial Network

Tao Feng, Ye Li, Peng Zhang, Shu Li, Fuqiang Wang

Journal:   2022 IEEE 6th Information Technology and Mechatronics Engineering Conference (ITOEC) Year: 2022 Pages: 11-16