DCT based densely connected convolutional GRU for real-time speech enhancement

Chaitanya Jannu; Sunny Dayal Vanambathina

doi:10.3233/jifs-223951

JOURNAL ARTICLE

Year: 2023

Journal: Journal of Intelligent & Fuzzy Systems

Vol: 45 (1)

Pages: 1195-1208

Publisher: IOS Press

Abstract

Over the past ten years, deep learning has enabled significant advances in the enhancement of noisy speech. Because speech signals are stationary only over short intervals, earlier speech enhancement (SE) methods estimated only the magnitude and reused the phase of the noisy mixture when reconstructing the speech. The performance of these approaches is limited because the phase also carries some of the speech information. Later approaches were developed to jointly estimate both magnitude and phase. More recently, complex-valued models such as the deep complex convolution recurrent network (DCCRN) have been proposed, but their computational cost is very high. In this work, we propose a Discrete Cosine Transform-based Densely Connected Convolutional Gated Recurrent Unit (DCTDCCGRU) model that uses a dilated dense block and a stacked GRU. Dense connectivity strengthens gradient propagation by concatenating the features from previous layers at the input of each layer. Within the dense block, the dilated convolutions aid context aggregation at various resolutions, and the dense connectivity yields a feature map with more precise target information by passing features through multiple layers. To model the correlation between neighboring noisy speech frames, a two-layer GRU is added in the bottleneck of the U-Net. The experimental results show that the proposed model outperforms existing models in terms of STOI (short-time objective intelligibility), PESQ (perceptual evaluation of speech quality), and output SNR (signal-to-noise ratio).
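The abstract contrasts magnitude-only and complex-valued models with a DCT-based one. One property of the DCT that is relevant here, and that holds in general, is that the transform of a real-valued frame is itself real-valued, so there is no separate phase term to estimate or to borrow from the mixture. The following is a minimal sketch of that property only, not the authors' implementation: the function names `dct_ii` and `idct_ii`, the orthonormal DCT-II scaling, and the 8-sample frame are all assumptions made for illustration.

```python
import math


def dct_ii(frame):
    """Orthonormal DCT-II of a real-valued frame (a list of floats)."""
    n = len(frame)
    coeffs = []
    for k in range(n):
        # The k = 0 basis vector needs a different scale to stay unit-norm.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(
            x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, x in enumerate(frame)
        )
        coeffs.append(scale * total)
    return coeffs


def idct_ii(coeffs):
    """Inverse of dct_ii (the transpose of the orthonormal DCT-II matrix)."""
    n = len(coeffs)
    frame = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            total += scale * c * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        frame.append(total)
    return frame


# Hypothetical 8-sample frame standing in for one windowed speech frame.
frame = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]

coeffs = dct_ii(frame)       # real-valued coefficients, no phase component
restored = idct_ii(coeffs)   # reconstruction needs only these coefficients

# Largest sample-wise reconstruction error; should be at rounding-error level.
max_error = max(abs(a - b) for a, b in zip(frame, restored))
```

In this sketch every entry of `coeffs` is a plain float and `restored` matches `frame` up to floating-point rounding, so a model that predicts enhanced DCT coefficients can resynthesize the waveform without a phase estimate. In practice a library routine such as SciPy's `scipy.fft.dct` would normally replace the explicit loops.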

Keywords:

PESQ; Computer science; Speech enhancement; Speech recognition; Intelligibility (philosophy); Discrete cosine transform; Computation; Convolution (computer science); Deep learning; Artificial intelligence; Algorithm; Noise reduction; Pattern recognition (psychology)

Metrics

FWCI (Field Weighted Citation Impact): 2.42

Citation Normalized Percentile: 0.87

Topics

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Advanced Adaptive Filtering Techniques

Physical Sciences → Engineering → Computational Mechanics

Related Documents

Real time speech enhancement using densely connected neural networks and Squeezed temporal convolutional modules

An attention based densely connected U-NET with convolutional GRU for speech enhancement

Densely Connected Progressive Learning for LSTM-Based Speech Enhancement

Densely Connected Neural Network with Dilated Convolutions for Real-Time Speech Enhancement in The Time Domain

An NMF-based MMSE Approach for Single Channel Speech Enhancement Using Densely Connected Convolutional Network