Biophysically-inspired single-channel speech enhancement in the time domain

Wen, Chuan; Verhulst, Sarah

doi:10.5281/zenodo.10496194

JOURNAL ARTICLE

Year: 2023

Journal: Zenodo (CERN European Organization for Nuclear Research)

Publisher: European Organization for Nuclear Research

Abstract

Deep neural network (DNN)-based speech enhancement approaches have recently achieved strong performance. Numerous applications benefit from speech enhancement models, including automatic speech recognition (ASR) and hearing aids. Most previous methods were developed in the time-frequency (T-F) domain. However, the T-F domain approach has several limitations, including a high minimum delay when reconstructing the signal from its T-F representation, poor generalizability to unseen noise, and poor performance at negative signal-to-noise ratios (SNRs). To address these problems, we propose a biophysically inspired end-to-end time-domain neural network that adopts bio-inspired features from CoNNear, a neural network that accurately simulates biophysical properties of the human auditory system such as sharp, level-dependent filter tuning. We first generated biophysical speech features using CoNNear, which were subsequently fed into a U-Net-based speech enhancement module. This module consisted of the generator network of SERGAN (Baby and Verhulst, 2019) without the discriminator, and was trained on the INTERSPEECH 2021 DNS Challenge dataset. An objective evaluation was performed using perceptual evaluation of speech quality (PESQ), segmental SNR (segSNR), cepstral distance (CD) and log-likelihood ratio (LLR) on unseen samples from the DNS Challenge, whose noise scenarios differ from those used in training. The results show that bio-inspired features perform comparably to T-F features at positive SNRs, with improved generalizability at negative SNRs and for mismatched noises. Additionally, our time-domain CoNNear features dramatically reduced the minimum latency of the whole system towards 4 ms, making it suitable for real-time applications with tight constraints on signal delay.
The good generalizability in adverse noise conditions and to unseen noise, as well as the low latency of our DNN-based model, make it a promising candidate for use in hearing aids.
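
Of the objective metrics named in the abstract, segmental SNR is the simplest to illustrate. The following is a minimal sketch of one common formulation, not the authors' evaluation code: the function name `segmental_snr`, the frame length, the hop size and the clamping range of -10 dB to 35 dB are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import math


def segmental_snr(clean, enhanced, frame_len=512, hop=256,
                  lo_db=-10.0, hi_db=35.0):
    """Mean per-frame SNR in dB between a clean reference and an enhanced signal.

    frame_len, hop, lo_db and hi_db are illustrative assumptions,
    not values taken from the paper.
    """
    if len(clean) != len(enhanced):
        raise ValueError("signals must have equal length")
    eps = 1e-10  # avoids division by zero and log of zero
    frame_snrs = []
    for start in range(0, len(clean) - frame_len + 1, hop):
        ref = clean[start:start + frame_len]
        est = enhanced[start:start + frame_len]
        signal_energy = sum(x * x for x in ref)
        noise_energy = sum((x - y) ** 2 for x, y in zip(ref, est))
        snr_db = 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))
        # Clamp each frame so silent or perfectly matched frames do not dominate.
        frame_snrs.append(min(max(snr_db, lo_db), hi_db))
    if not frame_snrs:
        raise ValueError("signal shorter than one frame")
    return sum(frame_snrs) / len(frame_snrs)


# Usage: a 440 Hz tone sampled at 16 kHz, compared with an attenuated copy.
clean = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(2048)]
attenuated = [0.9 * x for x in clean]
score = segmental_snr(clean, attenuated)
```

Averaging in the dB domain frame by frame, rather than computing one global SNR, weights quiet and loud speech segments equally, which is why the measure is usually reported alongside PESQ rather than in place of it.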

Keywords:

Speech enhancement; Generalizability theory; Discriminator; PESQ; Cepstrum; Artificial neural network; Time domain; Feature (linguistics); Filter (signal processing); SIGNAL (programming language)

Metrics

FWCI (Field Weighted Citation Impact): 0.00

Citation Normalized Percentile: 0.35

Topics

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Hearing Loss and Rehabilitation

Life Sciences → Neuroscience → Cognitive Neuroscience

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

Single-Channel Speech Enhancement in the Time Domain

A Decoupled Biophysically-Inspired Architecture for Speech Enhancement

Single-Channel Speech Enhancement in Spherical-Mapped Short-Time Spectral Domain