JOURNAL ARTICLE

Shuffle Attention U-Net for Speech Enhancement in Time Domain

Chaitanya Jannu, Sunny Dayal Vanambathina

Year: 2023 · Journal: International Journal of Image and Graphics · Vol: 24 (04) · Publisher: World Scientific

Abstract

Over the past decade, deep learning has enabled significant advances in the enhancement of noisy speech. In end-to-end speech enhancement, a deep neural network maps a noisy speech signal directly to a clean speech signal in the time domain, without any domain conversion or mask estimation. Recently, U-Net-based models have achieved good enhancement performance. However, some of them neglect contextual information and fine-grained features of the input speech when relying on ordinary convolutions. To address these issues, recent studies have improved model performance by adding network modules such as attention mechanisms and long short-term memory (LSTM) units. In this work, we propose a new U-Net-based speech enhancement model that combines a novel lightweight and efficient Shuffle Attention (SA) mechanism, Gated Recurrent Units (GRUs), and residual blocks with dilated convolutions. Each residual block is followed by a multi-scale convolution block (MSCB). The proposed hybrid structure enables temporal context aggregation in the time domain. The advantage of the shuffle attention mechanism is that channel and spatial attention are carried out simultaneously for each sub-feature, suppressing potential noise while highlighting relevant semantic feature regions by aggregating similar features across all locations. The MSCB is employed to extract rich temporal features. To model the correlation between neighboring noisy speech frames, a two-layer GRU is added at the bottleneck of the U-Net. Experimental results demonstrate that the proposed model outperforms existing models in terms of short-time objective intelligibility (STOI) and perceptual evaluation of speech quality (PESQ).
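The shuffle attention operation described in the abstract (splitting channels into groups, applying channel attention to one half of each group and spatial attention to the other, then recombining via a channel shuffle) can be sketched roughly as follows. This is an illustrative NumPy approximation for 1D time-domain features, not the authors' implementation: the learnable scale/shift parameters and group normalization of the original SA module are replaced with fixed operations.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def shuffle_attention(x, groups=4):
    """Illustrative sketch of Shuffle Attention on a (channels, time) array.

    Channels are split into `groups` sub-features; each group is halved into
    a channel-attention branch and a spatial-attention branch, and the groups
    are finally recombined with a channel shuffle. Learnable affine
    parameters of the real SA module are omitted here (assumption).
    """
    c, t = x.shape
    assert c % (2 * groups) == 0, "channels must split evenly into halves per group"
    sub = c // groups
    out = np.empty_like(x)
    for g in range(groups):
        blk = x[g * sub:(g + 1) * sub]
        xc, xs = blk[: sub // 2], blk[sub // 2:]
        # channel attention: global average pool over time, sigmoid gate per channel
        w_c = sigmoid(xc.mean(axis=1, keepdims=True))
        # spatial attention: normalize across channels, sigmoid gate per time step
        norm = (xs - xs.mean(axis=0)) / (xs.std(axis=0) + 1e-5)
        w_s = sigmoid(norm)
        out[g * sub: g * sub + sub // 2] = xc * w_c
        out[g * sub + sub // 2:(g + 1) * sub] = xs * w_s
    # channel shuffle: interleave groups so information flows between them
    return out.reshape(groups, sub, t).transpose(1, 0, 2).reshape(c, t)
```

Because every output element is an input element scaled by a gate in (0, 1) and then permuted, the output preserves the input shape and never exceeds its magnitude, which makes the module a cheap, stable re-weighting of features rather than a transformation of them.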

Keywords:
Speech enhancement, Speech recognition, Deep learning, Artificial neural networks, Convolution, Attention mechanisms, Noise reduction, Pattern recognition

Metrics

Cited by: 12
FWCI (Field-Weighted Citation Impact): 3.22
References: 33
Citation Normalized Percentile: 0.90 (top 10%)

Topics

Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Speech Recognition and Synthesis
Physical Sciences →  Computer Science →  Artificial Intelligence
Hearing Loss and Rehabilitation
Life Sciences →  Neuroscience →  Cognitive Neuroscience