JOURNAL ARTICLE

Shuffle Attention U-Net for Speech Enhancement in Time Domain

Chaitanya Jannu, Sunny Dayal Vanambathina

Year: 2023 · Journal: International Journal of Image and Graphics · Vol: 24 (04) · Publisher: World Scientific

Abstract

Over the past decade, deep learning has enabled significant advances in the enhancement of noisy speech. In end-to-end speech enhancement, a deep neural network maps a noisy speech signal directly to a clean speech signal in the time domain, without any domain conversion or mask estimation. Recently, U-Net-based models have achieved good enhancement performance. However, some of them neglect contextual information and fine-grained features of the input speech when relying on ordinary convolutions. To address these issues, recent studies have improved model performance by adding network modules such as attention mechanisms and long short-term memory (LSTM) units. In this work, we propose a new U-Net-based speech enhancement model that combines a novel lightweight and efficient Shuffle Attention (SA) mechanism, Gated Recurrent Units (GRUs), and residual blocks with dilated convolutions. Each residual block is followed by a multi-scale convolution block (MSCB). The proposed hybrid structure enables temporal context aggregation in the time domain. The advantage of the shuffle attention mechanism is that channel and spatial attention are carried out simultaneously for each sub-feature, suppressing potential noise while highlighting relevant semantic feature regions by aggregating similar features across all locations. The MSCB is employed to extract rich temporal features. To model the correlation between neighboring noisy speech frames, a two-layer GRU is added at the bottleneck of the U-Net. Experimental results demonstrate that the proposed model outperforms existing models in terms of short-time objective intelligibility (STOI) and perceptual evaluation of speech quality (PESQ).
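The shuffle attention operation described in the abstract (splitting channels into groups, applying channel attention to one half of each group and spatial attention to the other, then recombining via a channel shuffle) can be sketched roughly as follows. This is an illustrative NumPy approximation for 1D time-domain features, not the authors' implementation: the learnable scale/shift parameters and group normalization of the original SA module are replaced with fixed operations.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def shuffle_attention(x, groups=4):
    """Illustrative sketch of Shuffle Attention on a (channels, time) array.

    Channels are split into `groups` sub-features; each group is halved into
    a channel-attention branch and a spatial-attention branch, and the groups
    are finally recombined with a channel shuffle. Learnable affine
    parameters of the real SA module are omitted here (assumption).
    """
    c, t = x.shape
    assert c % (2 * groups) == 0, "channels must split evenly into halves per group"
    sub = c // groups
    out = np.empty_like(x)
    for g in range(groups):
        blk = x[g * sub:(g + 1) * sub]
        xc, xs = blk[: sub // 2], blk[sub // 2:]
        # channel attention: global average pool over time, sigmoid gate per channel
        w_c = sigmoid(xc.mean(axis=1, keepdims=True))
        # spatial attention: normalize across channels, sigmoid gate per time step
        norm = (xs - xs.mean(axis=0)) / (xs.std(axis=0) + 1e-5)
        w_s = sigmoid(norm)
        out[g * sub: g * sub + sub // 2] = xc * w_c
        out[g * sub + sub // 2:(g + 1) * sub] = xs * w_s
    # channel shuffle: interleave groups so information flows between them
    return out.reshape(groups, sub, t).transpose(1, 0, 2).reshape(c, t)
```

Because every output element is an input element scaled by a gate in (0, 1) and then permuted, the output preserves the input shape and never exceeds its magnitude, which makes the module a cheap, stable re-weighting of features rather than a transformation of them.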

Keywords:
Speech enhancement, Speech recognition, Deep learning, Artificial neural networks, Convolution, Attention mechanisms, Noise reduction, Pattern recognition

Metrics

Cited by: 12
FWCI (Field-Weighted Citation Impact): 3.22
References: 33
Citation Normalized Percentile: 0.90 (top 10%)

Topics

Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Speech Recognition and Synthesis
Physical Sciences →  Computer Science →  Artificial Intelligence
Hearing Loss and Rehabilitation
Life Sciences →  Neuroscience →  Cognitive Neuroscience