Time-Frequency Mask-based Speech Enhancement using Convolutional Generative Adversarial Network

Neil Shah; Hemant A. Patil; Meet Soni

doi:10.23919/apsipa.2018.8659692

JOURNAL ARTICLE

Year: 2018, Pages: 1246-1251

Abstract

A Speech Enhancement (SE) system aims to improve the perceptual quality of a noisy mixture while preserving speech intelligibility. Time-Frequency (T-F) masking-based SE using a supervised learning algorithm, such as a Deep Neural Network (DNN), has outperformed traditional SE techniques. However, the notable difference observed between the oracle mask and the predicted mask motivates us to explore different deep learning architectures. In this paper, we propose to use a Convolutional Neural Network (CNN)-based Generative Adversarial Network (GAN) for inherent mask estimation. A GAN takes advantage of adversarial optimization, an alternative to other Maximum Likelihood (ML) optimization-based architectures. We also show the need for supervised T-F mask estimation for effective noise suppression. Experimental results demonstrate that the proposed T-F mask-based SE significantly outperforms the recently proposed end-to-end SEGAN and a GAN-based Pix2Pix architecture. The performance evaluation, in terms of both the predicted mask and the objective measures, indicates an improvement in speech quality together with a reduction in the speech distortion observed in the noisy mixture.
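To make the terms "oracle mask" and "predicted mask" concrete, the following is a minimal sketch of T-F masking on magnitude spectrograms. It is not the paper's implementation: the abstract does not say which mask definition is used, so the ideal ratio mask (IRM) here is an assumption, the function names (`ideal_ratio_mask`, `apply_mask`, `mask_mse`) are hypothetical, the spectrograms are random synthetic arrays, and the "predicted" mask is simulated by perturbing the oracle mask rather than produced by a CNN-GAN.

```python
import numpy as np


def ideal_ratio_mask(clean_mag, noise_mag, eps=1e-8):
    """Oracle mask in [0, 1], computed per T-F bin from the (normally
    unavailable) clean-speech and noise magnitudes. Assumed IRM form."""
    clean_pow = clean_mag ** 2
    noise_pow = noise_mag ** 2
    return np.sqrt(clean_pow / (clean_pow + noise_pow + eps))


def apply_mask(noisy_mag, mask):
    """Enhance a noisy magnitude spectrogram by element-wise masking."""
    return mask * noisy_mag


def mask_mse(oracle, predicted):
    """Mean squared difference between the oracle and predicted masks."""
    return float(np.mean((oracle - predicted) ** 2))


# Synthetic stand-ins for magnitude spectrograms: (frequency bins, frames).
rng = np.random.default_rng(0)
clean_mag = rng.random((64, 100))
noise_mag = 0.5 * rng.random((64, 100))
# Adding magnitudes ignores phase; it is only a rough illustrative mixture.
noisy_mag = clean_mag + noise_mag

oracle = ideal_ratio_mask(clean_mag, noise_mag)
# Stand-in for a network output: the oracle mask plus small random error.
predicted = np.clip(oracle + 0.05 * rng.standard_normal(oracle.shape), 0.0, 1.0)

enhanced = apply_mask(noisy_mag, predicted)
print("oracle vs. predicted mask MSE:", mask_mse(oracle, predicted))
```

In a supervised mask-based system, a model would be trained so that its predicted mask approaches the oracle mask; the gap measured by something like `mask_mse` is the kind of difference the abstract refers to.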

Keywords:

Computer science; Speech enhancement; Convolutional neural network; Speech recognition; Intelligibility (philosophy); Artificial intelligence; Oracle; Deep learning; Deep neural networks; Pattern recognition (psychology); Noise reduction

Metrics

FWCI (Field Weighted Citation Impact): 3.61

Citation Normalized Percentile: 0.93

Topics

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Hearing Loss and Rehabilitation

Life Sciences → Neuroscience → Cognitive Neuroscience

Image and Signal Denoising Methods

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Related Documents

Time-Frequency Masking-Based Speech Enhancement Using Generative Adversarial Network

Time-Frequency Masking-based Speech Enhancement using Generative Adversarial Network

Speech Enhancement Using Generative Adversarial Network (GAN)

GSC Based Speech Enhancement with Generative Adversarial Network

Speech Enhancement Method Based on Generative Adversarial Network and Convolutional Block Attention Module