Lightweight End-to-End Speech Enhancement Generative Adversarial Network Using Sinc Convolutions

Lujun Li; Wudamu; Ludwig Kürzinger; Tobias Watzel; Gerhard Rigoll

doi:10.3390/app11167564

JOURNAL ARTICLE

Year: 2021; Journal: Applied Sciences; Vol: 11 (16); Pages: 7564-7564; Publisher: Multidisciplinary Digital Publishing Institute

Abstract

Generative adversarial networks (GANs) have recently garnered significant attention for speech enhancement, where they generally process and reconstruct speech waveforms directly. Existing GANs for speech enhancement rely solely on the standard convolution operation, which may not accurately characterize the local information of speech signals, particularly high-frequency components. Sinc convolution has been proposed to allow the network to learn more meaningful filters in the input layer, and it has achieved remarkable success in several speech signal processing tasks. Nevertheless, Sinc convolution for speech enhancement remains an under-explored research direction. This paper proposes Sinc–SEGAN, a novel generative adversarial architecture for speech enhancement that merges two powerful paradigms: Sinc convolution and the speech enhancement GAN (SEGAN). The proposed system has two highlights. First, it works in an end-to-end manner, overcoming the distortion caused by imperfect phase estimation. Second, it derives a customized filter bank, tuned compactly and efficiently for the desired application. We empirically study the influence of different Sinc convolution configurations, including the placement of the Sinc convolution layer, the length of the input signals, the number of Sinc filters, and the kernel size of the Sinc convolution. Moreover, we employ a set of time-domain data augmentation techniques, which further improve system performance and generalization across classic objective evaluation criteria for speech enhancement. Compared with competitive baseline systems, Sinc–SEGAN outperforms all of them with drastically fewer parameters, demonstrating its effectiveness for practical usage, e.g., hearing aid design and cochlear implants.
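To make the idea of a Sinc filter bank concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the common formulation in which each band-pass kernel is the difference of two ideal low-pass sinc filters, fully determined by a low and a high cutoff frequency. The function names (`sinc_bandpass`, `sinc_filter_bank`, `apply_filter_bank`), the Hamming window, and the fixed cutoffs are illustrative assumptions; in a trainable Sinc layer the cutoffs would be learned parameters.

```python
import numpy as np


def sinc_bandpass(f_low, f_high, kernel_size, sample_rate):
    """Windowed band-pass FIR kernel defined only by two cutoffs in Hz.

    Assumption: Hamming window and odd kernel size, chosen for illustration.
    """
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd so the kernel is symmetric")
    # Time axis in samples, centred on zero.
    n = np.arange(kernel_size) - (kernel_size - 1) / 2
    # Cutoffs normalised to cycles per sample.
    f1 = f_low / sample_rate
    f2 = f_high / sample_rate
    # np.sinc(x) = sin(pi*x) / (pi*x), so 2*f*sinc(2*f*n) is an ideal
    # low-pass filter with cutoff f; the difference of two is a band-pass.
    kernel = 2 * f2 * np.sinc(2 * f2 * n) - 2 * f1 * np.sinc(2 * f1 * n)
    # The window smooths the truncation of the infinite sinc response.
    return kernel * np.hamming(kernel_size)


def sinc_filter_bank(cutoffs_hz, kernel_size, sample_rate):
    """Stack one kernel per (low, high) pair: shape (num_filters, kernel_size)."""
    return np.stack(
        [sinc_bandpass(lo, hi, kernel_size, sample_rate) for lo, hi in cutoffs_hz]
    )


def apply_filter_bank(signal, bank):
    """Filter a 1-D waveform with every kernel: shape (num_filters, len(signal))."""
    return np.stack([np.convolve(signal, k, mode="same") for k in bank])


if __name__ == "__main__":
    sample_rate = 16000  # hypothetical sampling rate
    t = np.arange(sample_rate) / sample_rate
    tone = np.sin(2 * np.pi * 1000 * t)  # 1 kHz test tone
    bank = sinc_filter_bank([(500, 1500), (3000, 4000)], 251, sample_rate)
    energy = (apply_filter_bank(tone, bank) ** 2).sum(axis=1)
    print("energy per band:", energy)
```

Each filter here needs only two numbers regardless of kernel size, whereas an unconstrained convolution kernel needs one weight per tap. That is one way to read the abstract's claim of a compact, application-tuned filter bank.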

Keywords: Sinc function; Computer science; Convolution (computer science); Kernel (algebra); Speech recognition; Algorithm; Artificial intelligence; Mathematics; Artificial neural network; Computer vision

Topics

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Advanced Adaptive Filtering Techniques

Physical Sciences → Engineering → Computational Mechanics

Hearing Loss and Rehabilitation

Life Sciences → Neuroscience → Cognitive Neuroscience
