JOURNAL ARTICLE

Densely-connected Convolutional Recurrent Network for Fundamental Frequency Estimation in Noisy Speech

Abstract

Estimating fundamental frequency (F0) from an audio signal is a necessary step in many tasks such as speech synthesis and speech analysis. Although high estimation accuracy has been achieved for clean speech, it is still challenging for F0 estimation to handle noisy speech, mainly because of the corruption of harmonic structure caused by noise. In this paper, we view F0 estimation as a multi-class classification problem and train a frequency-domain densely-connected convolutional neural network (DC-CRN) to estimate F0 from noisy speech. The proposed model significantly outperforms baseline methods in terms of detection rate. We find that using complex short-time Fourier transform (STFT) as input produces better performance compared to using magnitude STFT as input. Furthermore, we explore improving F0 estimation with speech enhancement. Although the F0 estimation model trained on clean speech performs well on enhanced speech, the distortion introduced by the speech enhancement model limits the estimation performance. We propose a cascade model which consists of two modules that optimize enhanced speech and estimated F0 in turn. Experimental results show that the cascade model brings further improvements to the DC-CRN model, especially in low signal-to-noise ratio (SNR) conditions.
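
As a rough illustration of the two ideas the abstract names, the sketch below shows one way F0 could be turned into class labels for a multi-class classifier, and how complex and magnitude STFT inputs differ in shape. It is not the authors' implementation: the 60-400 Hz range, the 24 bins per octave, the frame length and hop, and every function name are assumptions chosen only for the example.

```python
import numpy as np

# All constants below are illustrative assumptions, not values from the paper.
F_MIN, F_MAX, BINS_PER_OCTAVE = 60.0, 400.0, 24


def make_f0_grid(f_min=F_MIN, f_max=F_MAX, bins_per_octave=BINS_PER_OCTAVE):
    """Log-spaced centre frequencies, one per voiced class."""
    n_bins = int(np.floor(np.log2(f_max / f_min) * bins_per_octave)) + 1
    return f_min * 2.0 ** (np.arange(n_bins) / bins_per_octave)


def f0_to_class(f0_hz, grid):
    """Map a 1-D array of F0 values (Hz) to class indices.

    Class 0 means unvoiced (F0 <= 0); classes 1..K are the nearest grid bin
    measured in log-frequency.
    """
    f0_hz = np.asarray(f0_hz, dtype=float)
    labels = np.zeros(f0_hz.shape, dtype=int)
    voiced = f0_hz > 0
    dist = np.abs(np.log2(f0_hz[voiced][:, None]) - np.log2(grid[None, :]))
    labels[voiced] = np.argmin(dist, axis=1) + 1
    return labels


def class_to_f0(labels, grid):
    """Decode class indices back to Hz (0.0 for the unvoiced class)."""
    labels = np.asarray(labels)
    f0 = np.zeros(labels.shape, dtype=float)
    voiced = labels > 0
    f0[voiced] = grid[labels[voiced] - 1]
    return f0


def stft(signal, frame_len=512, hop=256):
    """Plain Hann-windowed STFT; returns an array of shape (frames, bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)


def complex_input(spec):
    """Complex STFT as two real channels (real, imaginary): shape (2, T, F)."""
    return np.stack([spec.real, spec.imag], axis=0)


def magnitude_input(spec):
    """Magnitude STFT as a single channel: shape (1, T, F)."""
    return np.abs(spec)[None]
```

With these assumed settings, each voiced class spans 50 cents, so decoding a predicted class recovers F0 to within 25 cents of the quantized target. The complex input keeps phase information in its second channel, which the magnitude input discards.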

Keywords:
Computer science, Speech recognition, Fundamental frequency, Estimation, Artificial intelligence, Acoustics, Engineering, Physics

Metrics

Cited By: 3
FWCI (Field Weighted Citation Impact): 0.42
Refs: 0
Citation Normalized Percentile: 0.52

Topics

Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Speech Recognition and Synthesis
Physical Sciences →  Computer Science →  Artificial Intelligence
Phonetics and Phonology Research
Social Sciences →  Psychology →  Experimental and Cognitive Psychology

Related Documents

JOURNAL ARTICLE

DRC-NET: Densely Connected Recurrent Convolutional Neural Network for Speech Dereverberation

Jinjiang Liu, Xueliang Zhang

Journal: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Year: 2022 Pages: 166-170
JOURNAL ARTICLE

Lightweight residual densely connected convolutional neural network

Fahimeh Fooladgar, Shohreh Kasaei

Journal: Multimedia Tools and Applications Year: 2020 Vol: 79 (35-36) Pages: 25571-25588
JOURNAL ARTICLE

Noisy Speech Based Temporal Decomposition to Improve Fundamental Frequency Estimation

Anderson Queiroz, R. Coelho

Journal: IEEE/ACM Transactions on Audio Speech and Language Processing Year: 2022 Vol: 30 Pages: 2504-2513
JOURNAL ARTICLE

DPCCN: Densely-Connected Pyramid Complex Convolutional Network for Robust Speech Separation and Extraction

Jiangyu Han, Yanhua Long, Lukáš Burget, Jaň Černocký

Journal: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Year: 2022 Pages: 7292-7296