JOURNAL ARTICLE

Low Complexity Speech Enhancement Network Based on Frame-Level Swin Transformer

Weiqi Jiang, Chengli Sun, Feilong Chen, Yan Leng, Guo Q, Jiayi Sun, Jiankun Peng

Year: 2023   Journal: Electronics   Vol: 12 (6)   Pages: 1330-1330   Publisher: Multidisciplinary Digital Publishing Institute

Abstract

In recent years, the Transformer has shown strong performance in speech enhancement, using multi-head self-attention to capture long-term dependencies effectively. However, the computational cost of self-attention grows quadratically with the size of the input speech spectrogram, which makes it expensive for practical use. In this paper, we propose a low-complexity hierarchical frame-level Swin Transformer network (FLSTN) for speech enhancement. FLSTN takes several consecutive frames as a local window and restricts self-attention to that window, reducing the complexity to linear in the spectrogram size. A shifted window mechanism enhances information exchange between adjacent windows, so that window-based local attention effectively acts as global attention. The hierarchical structure allows FLSTN to learn speech features at different scales. Moreover, we design a band merging layer and a band expanding layer to decrease and increase the spatial resolution of the feature maps, respectively. We tested FLSTN on both 16 kHz wide-band speech and 48 kHz full-band speech. Experimental results demonstrate that FLSTN handles speech of different bandwidths well. With very few multiply–accumulate operations (MACs), FLSTN not only has a significant advantage in computational complexity but also achieves objective speech quality metrics comparable to those of current state-of-the-art (SOTA) models.
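The complexity argument in the abstract can be illustrated with a small counting sketch. This is not the authors' implementation: it assumes each spectrogram frame is one attention token, counts only the two matrix products of self-attention (ignoring projections, softmax, and heads), and uses hypothetical names (attention_macs, windowed_attention_macs, partition_frames) and sizes chosen for illustration. The shift is modelled as a plain cyclic roll; Swin-style implementations typically also mask the wrapped-around window, which is omitted here.

```python
from typing import List, Sequence


def attention_macs(num_tokens: int, dim: int) -> int:
    """MACs for the two matrix products in self-attention (Q K^T and A V).

    Assumption: projections, softmax, and multi-head splitting are ignored.
    """
    return 2 * num_tokens * num_tokens * dim


def windowed_attention_macs(num_frames: int, window: int, dim: int) -> int:
    """MACs when attention is restricted to non-overlapping windows of frames."""
    if num_frames % window != 0:
        raise ValueError("num_frames must be a multiple of window")
    # (num_frames / window) windows, each costing 2 * window^2 * dim,
    # which simplifies to 2 * num_frames * window * dim: linear in num_frames.
    return (num_frames // window) * attention_macs(window, dim)


def partition_frames(frames: Sequence[int], window: int, shift: int = 0) -> List[List[int]]:
    """Cyclically shift the frame order by `shift`, then split into windows."""
    if len(frames) % window != 0:
        raise ValueError("number of frames must be a multiple of window")
    rolled = list(frames[shift:]) + list(frames[:shift])
    return [rolled[i:i + window] for i in range(0, len(rolled), window)]


if __name__ == "__main__":
    dim, window = 64, 8  # hypothetical feature size and window length
    for num_frames in (256, 512, 1024):
        full = attention_macs(num_frames, dim)
        local = windowed_attention_macs(num_frames, window, dim)
        print(num_frames, full, local, full // local)

    # Regular windows, then windows shifted by half a window length.
    print(partition_frames(range(8), 4))
    print(partition_frames(range(8), 4, shift=2))
```

Under these assumptions, doubling the number of frames quadruples the global attention cost but only doubles the windowed cost. In the shifted partition, frames 3 and 4, which sit in different windows in the regular partition, fall into the same window; this is the sense in which alternating regular and shifted layers lets information pass between adjacent windows.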

Keywords:
Computer science; Spectrogram; Transformer; Speech enhancement; Computational complexity theory; Speech recognition; Computation; Artificial intelligence; Algorithm; Voltage; Engineering

Metrics

Cited By: 10
FWCI (Field Weighted Citation Impact): 2.68
Refs: 36
Citation Normalized Percentile: 0.88

Topics

Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Advanced Adaptive Filtering Techniques
Physical Sciences →  Engineering →  Computational Mechanics
Speech Recognition and Synthesis
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Speech Semantic Communication Based on Swin Transformer

Ziliang Zhou, Shilian Zheng, Jie Chen, Zhijin Zhao, Xiaoniu Yang

Journal: IEEE Transactions on Cognitive Communications and Networking   Year: 2023   Vol: 10 (3)   Pages: 756-768
JOURNAL ARTICLE

Speech Emotion Recognition Based on Swin-Transformer

Zirou Liao, Shaoping Shen

Journal: Journal of Physics Conference Series   Year: 2023   Vol: 2508 (1)   Pages: 012056-012056
BOOK-CHAPTER

Swin Transformer Based on Image Enhancement Algorithm

Liwei Chen, Gulinazi Ailimujiang, Zhichuang Zhao

Advances in transdisciplinary engineering Year: 2024