Time-Frequency Masking-Based Speech Enhancement Using Generative Adversarial Network

Meet Soni; Neil Shah; Hemant A. Patil

doi:10.1109/icassp.2018.8462068

JOURNAL ARTICLE

Year: 2018 Pages: 5039-5043

Abstract

The success of time-frequency (T-F) mask-based approaches is dependent on the accuracy of predicted mask given the noisy spectral features. The state-of-the-art methods in T-F masking-based enhancement employ Deep Neural Network (DNN) to predict mask. Recently, Generative Adversarial Networks (GAN) are gaining popularity instead of maximum likelihood (ML)-based optimization of deep learning architectures. In this paper, we propose to exploit GAN in T-F masking-based enhancement framework. We present the viable strategy to use GAN in such application by modifying the existing approach. To achieve this, we use a method that learns the mask implicitly while predicting the clean T-F representation. Moreover, we show the failure of vanilla GAN in predicting the accurate mask and propose a regularized objective function with the use of Mean Square Error (MSE) between predicted and target spectrum to overcome it. The objective evaluation of the proposed method shows the improvement in the accurate mask prediction, as against the state-of-the-art ML-based optimization techniques. The proposed system significantly improves over a recent GAN-based speech enhancement system in improving speech quality, while maintaining a better trade-off between less speech distortion and more effective removal of background interferences present in the noisy mixture.
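
The regularized objective mentioned in the abstract can be illustrated with a toy sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the function names, the `-log D(.)` form of the adversarial term, the weight `lam`, and the flattened toy spectra are hypothetical and are not taken from the paper.

```python
import math

def apply_mask(mask, noisy):
    """Element-wise product of a T-F mask and the noisy magnitude spectrum."""
    return [m * y for m, y in zip(mask, noisy)]

def mse(predicted, target):
    """Mean square error between predicted and target spectra."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)

def regularized_generator_loss(d_scores, predicted, target, lam=1.0):
    """Adversarial term plus lam * MSE (assumed form, not the paper's exact loss).

    d_scores: hypothetical discriminator outputs in (0, 1) for the predicted
    spectra. The -log D(.) term and the weight `lam` are assumptions.
    """
    adversarial = -sum(math.log(d) for d in d_scores) / len(d_scores)
    return adversarial + lam * mse(predicted, target)

# Toy flattened spectra (one magnitude value per T-F bin).
clean = [0.2, 0.5, 0.1, 0.8]
noisy = [0.4, 0.6, 0.5, 1.0]
ratio_mask = [c / y for c, y in zip(clean, noisy)]  # mask that recovers clean
unit_mask = [1.0] * len(noisy)                      # mask that removes nothing

d_scores = [0.5, 0.5]  # identical discriminator scores for both cases
loss_good = regularized_generator_loss(d_scores, apply_mask(ratio_mask, noisy), clean)
loss_bad = regularized_generator_loss(d_scores, apply_mask(unit_mask, noisy), clean)
print(loss_good < loss_bad)  # True
```

In this sketch the mask is never compared with a target mask directly. The loss is computed only on the masked output, which is one way to read "learns the mask implicitly while predicting the clean T-F representation". With the adversarial term held fixed, the MSE term alone separates an accurate mask from an inaccurate one.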

Keywords:

Speech enhancement; Computer science; Masking (illustration); Mean squared error; Distortion (music); Artificial intelligence; Speech recognition; Artificial neural network; Pattern recognition (psychology); Noise reduction; Mathematics

Metrics

Cited By: 216

FWCI (Field Weighted Citation Impact): 21.49

Refs

Citation Normalized Percentile: 1.00

Is in top 1%

Is in top 10%

Topics

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Hearing Loss and Rehabilitation

Life Sciences → Neuroscience → Cognitive Neuroscience

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Related Documents

Time-Frequency Masking-based Speech Enhancement using Generative Adversarial Network

Time-Frequency Mask-based Speech Enhancement using Convolutional Generative Adversarial Network

Phase sensitive masking-based single channel speech enhancement using conditional generative adversarial network

CycleGAN based Speech Enhancement Using Time Frequency Masking

Speech Enhancement Using Generative Adversarial Network (GAN)