Single channel speech enhancement using time-frequency attention mechanism based nested U-net model

Anil Kumar Prathipati; A.S.N. Chakravarthy

doi:10.1088/2631-8695/ad5e36

JOURNAL ARTICLE

Year: 2024
Journal: Engineering Research Express
Vol: 6 (3)
Pages: 035206-035206
Publisher: IOP Publishing

Abstract

Deep-learning models have used attention mechanisms to improve the quality and intelligibility of noisy speech, and the effectiveness of these mechanisms is well demonstrated. However, existing models rely on either spatial or temporal attention alone, which results in severe information loss. In this paper, a time-frequency attention (TFA) mechanism with a nested U-network (TFANUNet) is proposed for single-channel speech enhancement. Using TFA, the model learns which channel, frequency, and time information is most significant for speech enhancement. The proposed model is an encoder-decoder model in which each layer of the encoder and decoder is followed by a nested dense residual dilated DenseNet (NDRD) based multi-scale context aggregation block. NDRD uses multiple dilated convolutions with different dilation factors to explore large receptive fields at different scales simultaneously, and it avoids the aliasing problem in DenseNet. The TFA and NDRD blocks are integrated into the proposed model to enable refined feature extraction without information loss and utterance-level context aggregation, respectively. Under seen and unseen noise conditions, the proposed model achieves average STOI values of 87.02% and 85.04% and average PESQ values of 3.19 and 3.01, respectively. The proposed model has 2.09 million trainable parameters, far fewer than the baselines. The TFANUNet model outperforms the baselines in terms of STOI and PESQ.
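
The multi-scale effect of dilated convolutions mentioned in the abstract can be illustrated with a small sketch. It is not the authors' NDRD block: the 3-tap kernel, the dilation factors 1, 2 and 4, and all function names are assumptions chosen for illustration. It relies only on the standard fact that a stride-1 convolution with kernel size k and dilation d spans (k - 1) * d + 1 input samples.

```python
from typing import List


def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Number of input samples spanned by one dilated kernel."""
    return (kernel_size - 1) * dilation + 1


def stacked_receptive_field(kernel_size: int, dilations: List[int]) -> int:
    """Receptive field of stride-1 dilated convolutions applied one after another."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def dilated_conv1d(signal: List[float], kernel: List[float], dilation: int) -> List[float]:
    """'Valid' 1-D dilated convolution (cross-correlation form, no padding)."""
    span = effective_kernel_size(len(kernel), dilation)
    return [
        sum(w * signal[start + i * dilation] for i, w in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]


# Hypothetical settings, not taken from the paper.
KERNEL_SIZE = 3
DILATIONS = [1, 2, 4]

# Parallel branches: each dilation looks at a different context width.
branch_spans = [effective_kernel_size(KERNEL_SIZE, d) for d in DILATIONS]

# Sequential stack: the context widths accumulate.
stacked_span = stacked_receptive_field(KERNEL_SIZE, DILATIONS)

# A difference kernel with dilation 2 compares samples four positions apart.
example_output = dilated_conv1d([1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 0.0, -1.0], 2)
```

With these assumed settings, the same small kernel covers progressively wider contexts as the dilation grows, without adding parameters, which is the general motivation for combining several dilation factors in one block.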

Keywords: Channel (broadcasting); Mechanism (biology); Computer science; Net (polyhedron); Speech recognition; Telecommunications; Physics; Mathematics

Metrics

FWCI (Field Weighted Citation Impact): 0.00

Citation Normalized Percentile: 0.11

Topics

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Advanced Adaptive Filtering Techniques

Physical Sciences → Engineering → Computational Mechanics

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

Speech enhancement using nested U-net with time frequency attention and D3 net

A Multi-scale Subconvolutional U-Net with Time-Frequency Attention Mechanism for Single Channel Speech Enhancement

Stacked U-Net with Time–Frequency Attention and Deep Connection Net for Single Channel Speech Enhancement

A Nested U-Net with Efficient Channel Attention and D3Net for Speech Enhancement

Real Time Speech Enhancement Using Triple Attention U-Net