Densely-connected Convolutional Recurrent Network for Fundamental Frequency Estimation in Noisy Speech

Yixuan Zhang; Heming Wang; DeLiang Wang

doi:10.21437/interspeech.2022-11156

JOURNAL ARTICLE

Year: 2022; Journal: Interspeech 2022; Vol: 2022; Pages: 401-405

Abstract

Estimating fundamental frequency (F0) from an audio signal is a necessary step in many tasks such as speech synthesis and speech analysis. Although high estimation accuracy has been achieved for clean speech, F0 estimation in noisy speech remains challenging, mainly because noise corrupts the harmonic structure. In this paper, we view F0 estimation as a multi-class classification problem and train a frequency-domain densely-connected convolutional recurrent network (DC-CRN) to estimate F0 from noisy speech. The proposed model significantly outperforms baseline methods in terms of detection rate. We find that using the complex short-time Fourier transform (STFT) as input performs better than using the magnitude STFT. Furthermore, we explore improving F0 estimation with speech enhancement. Although the F0 estimation model trained on clean speech performs well on enhanced speech, the distortion introduced by the speech enhancement model limits the estimation performance. We propose a cascade model consisting of two modules that optimize enhanced speech and estimated F0 in turn. Experimental results show that the cascade model brings further improvements over the DC-CRN model, especially in low signal-to-noise ratio (SNR) conditions.
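The abstract names three ingredients without giving their settings: F0 estimation cast as multi-class classification, a choice between complex and magnitude STFT input, and evaluation by detection rate. The sketch below illustrates one plausible reading of each using NumPy only. Everything numeric here is an assumption for illustration, not taken from the paper: the 60-400 Hz range, the 24 bins per octave, the extra unvoiced class, the 5% tolerance in the detection rate, and the frame and hop sizes. The function names are hypothetical as well, and the network itself is not modelled.

```python
import numpy as np

# Assumed settings for illustration only; the abstract does not specify them.
F_MIN, F_MAX = 60.0, 400.0   # hypothetical F0 search range in Hz
BINS_PER_OCTAVE = 24         # hypothetical log-frequency resolution
N_BINS = int(np.ceil(BINS_PER_OCTAVE * np.log2(F_MAX / F_MIN)))
UNVOICED = 0                 # class 0 is reserved for unvoiced frames


def f0_to_class(f0_hz):
    """Map F0 values in Hz to class labels; values <= 0 mean unvoiced."""
    f0 = np.asarray(f0_hz, dtype=float)
    voiced = f0 > 0
    # Clip voiced values into the search range; use F_MIN as a dummy elsewhere.
    safe = np.where(voiced, np.clip(f0, F_MIN, F_MAX), F_MIN)
    idx = np.floor(BINS_PER_OCTAVE * np.log2(safe / F_MIN)).astype(int)
    idx = np.minimum(idx, N_BINS - 1)
    return np.where(voiced, idx + 1, UNVOICED)


def class_to_f0(cls):
    """Map class labels back to bin-centre frequencies (0 Hz for unvoiced)."""
    cls = np.asarray(cls)
    centre = F_MIN * 2.0 ** ((cls - 1 + 0.5) / BINS_PER_OCTAVE)
    return np.where(cls == UNVOICED, 0.0, centre)


def detection_rate(f0_est, f0_ref, tol=0.05):
    """Fraction of reference-voiced frames whose estimate is within tol of the reference."""
    est = np.asarray(f0_est, dtype=float)
    ref = np.asarray(f0_ref, dtype=float)
    voiced = ref > 0
    if not voiced.any():
        return float("nan")
    hit = np.abs(est[voiced] - ref[voiced]) <= tol * ref[voiced]
    return float(hit.mean())


def stft_features(x, n_fft=512, hop=256, use_complex=True):
    """Return STFT input features shaped (channels, frames, frequency bins).

    use_complex=True stacks real and imaginary parts as two channels;
    use_complex=False returns the magnitude as a single channel.
    """
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    spec = np.fft.rfft(frames, axis=1)
    if use_complex:
        return np.stack([spec.real, spec.imag])
    return np.abs(spec)[None]
```

With these assumed settings, each bin spans about 2.9% in frequency, so decoding a class to its bin centre stays well inside a 5% tolerance. The complex variant keeps phase information that the magnitude variant discards, which is the distinction the abstract draws between the two inputs.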

Keywords: Computer science, Speech recognition, Fundamental frequency, Estimation, Artificial intelligence, Acoustics, Engineering, Physics

Metrics

FWCI (Field Weighted Citation Impact): 0.42

Citation Normalized Percentile: 0.52

Topics

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Phonetics and Phonology Research

Social Sciences → Psychology → Experimental and Cognitive Psychology

Related Documents

DRC-NET: Densely Connected Recurrent Convolutional Neural Network for Speech Dereverberation

Lightweight residual densely connected convolutional neural network

Noisy Speech Based Temporal Decomposition to Improve Fundamental Frequency Estimation

Densely Connected Network with Time-frequency Dilated Convolution for Speech Enhancement

DPCCN: Densely-Connected Pyramid Complex Convolutional Network for Robust Speech Separation and Extraction