JOURNAL ARTICLE

DCT based densely connected convolutional GRU for real-time speech enhancement

Chaitanya Jannu, Sunny Dayal Vanambathina

Year: 2023   Journal: Journal of Intelligent & Fuzzy Systems   Vol: 45 (1)   Pages: 1195-1208   Publisher: IOS Press

Abstract

Over the past ten years, deep learning has enabled significant advances in the enhancement of noisy speech. Owing to the short-time stationarity of speech signals, earlier speech enhancement (SE) methods concentrated on magnitude estimation alone and reused the phase of the noisy mixture when reconstructing the speech. The performance of these approaches is limited because the phase also carries some of the speech information. Later SE approaches were developed to jointly estimate both magnitude and phase. Recently, complex-valued models such as the deep complex convolution recurrent network (DCCRN) have been proposed, but their computational cost is very high. In this work, we propose a Discrete Cosine Transform-based Densely Connected Convolutional Gated Recurrent Unit (DCTDCCGRU) model that uses a dilated dense block and a stacked GRU. Dense connectivity strengthens gradient propagation by concatenating the features from previous layers at the input. Within the dense block, the dilated convolutions aid context aggregation at various resolutions, and the dense connectivity yields a feature map with more precise target information by passing it through multiple layers. To model the correlation between neighboring noisy speech frames, a two-layer GRU is added at the bottleneck of the U-Net. The experimental results show that the proposed model outperforms other existing models in terms of short-time objective intelligibility (STOI), perceptual evaluation of speech quality (PESQ), and output signal-to-noise ratio (SNR).
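As a rough illustration of the transform named in the model's title, the sketch below computes an orthonormal DCT-II of one short frame and inverts it. It is not the authors' implementation: the function names `dct2` and `idct2`, the 16-sample frame length, and the sine test signal are all assumptions made for this example. It shows only a general property of the DCT, namely that its coefficients are real-valued and the transform is exactly invertible, so a model working in this domain has no separate phase spectrum to estimate.

```python
import math

def dct2(frame):
    """Orthonormal DCT-II of a real-valued frame (hypothetical helper)."""
    n = len(frame)
    coeffs = []
    for k in range(n):
        # Project the frame onto the k-th cosine basis function.
        s = sum(frame[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * s)
    return coeffs

def idct2(coeffs):
    """Inverse of dct2 (orthonormal DCT-III, hypothetical helper)."""
    n = len(coeffs)
    frame = []
    for t in range(n):
        # The k = 0 term uses a different scale factor than the others.
        s = math.sqrt(1.0 / n) * coeffs[0]
        s += sum(
            math.sqrt(2.0 / n) * coeffs[k] * math.cos(math.pi * (t + 0.5) * k / n)
            for k in range(1, n)
        )
        frame.append(s)
    return frame

# Assumed toy input: one 16-sample frame of a sine wave.
frame = [math.sin(2.0 * math.pi * 3.0 * t / 16.0) for t in range(16)]
coeffs = dct2(frame)      # real-valued coefficients, no phase term
rebuilt = idct2(coeffs)   # reconstruction from the coefficients alone

max_error = max(abs(a - b) for a, b in zip(frame, rebuilt))
print("max reconstruction error:", max_error)
```

In practice a library routine such as a fast DCT would replace these O(n²) loops; the explicit form is used here only to keep the example dependency-free.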

Keywords:
PESQ, Computer science, Speech enhancement, Speech recognition, Intelligibility (philosophy), Discrete cosine transform, Computation, Convolution (computer science), Deep learning, Artificial intelligence, Algorithm, Noise reduction, Pattern recognition (psychology)

Metrics

Cited By: 9
FWCI (Field Weighted Citation Impact): 2.42
Refs: 15
Citation Normalized Percentile: 0.87

Topics

Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Speech Recognition and Synthesis
Physical Sciences →  Computer Science →  Artificial Intelligence
Advanced Adaptive Filtering Techniques
Physical Sciences →  Engineering →  Computational Mechanics