Predicting F0 and voicing from NAM-captured whispered speech

Viet-Anh Tran; Gérard Bailly; Hélène Lœvenbruck; Tomoki Toda

doi:10.21437/speechprosody.2008-25

JOURNAL ARTICLE

Year: 2008 Pages: 107-110

Abstract

The NAM-to-speech conversion proposed by Toda and colleagues, which converts Non-Audible Murmur (NAM) to audible speech by statistical mapping trained on aligned corpora, is a very promising technique, but its performance is still insufficient, mainly because of the difficulty of estimating the F0 of the transformed voice from unvoiced speech. In this paper, we propose a method to improve F0 estimation and voicing decision in a NAM-to-speech conversion system based on Gaussian Mixture Models (GMM) applied to whispered speech. Instead of combining voicing decision and F0 estimation in a single GMM, a simple feed-forward neural network is used to detect voiced segments in the whisper, while a GMM estimates a continuous melodic contour based on training voiced segments. The error rate for the voiced/unvoiced decision of the network is 6.8%, compared to 9.2% with the original system. Our proposal also improves F0 estimation.
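
The decoupling described in the abstract can be illustrated with a minimal sketch: one model decides voicing per frame, and a separate GMM supplies a continuous F0 value that is kept only for frames judged voiced. This is not the authors' implementation. The network topology (one tanh hidden layer, sigmoid output), the diagonal input covariance, the log-F0 domain, the 0.5 threshold, and all names (`voicing_probability`, `gmm_log_f0`, `predict_f0`) are assumptions made for illustration, and the parameters are taken as already trained.

```python
import math


def voicing_probability(frame, w_hidden, b_hidden, w_out, b_out):
    """Assumed voicing detector: one tanh hidden layer, sigmoid output."""
    hidden = [
        math.tanh(sum(w * v for w, v in zip(row, frame)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    activation = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return 1.0 / (1.0 + math.exp(-activation))


def gmm_log_f0(frame, components):
    """MMSE estimate E[log F0 | frame] under a joint GMM.

    Each component is a dict with keys: weight, mean_x, var_x (diagonal
    input covariance, an assumption), mean_y, and cov_yx.
    """
    log_liks, cond_means = [], []
    for c in components:
        log_lik = math.log(c["weight"])
        cond_mean = c["mean_y"]
        for v, m, s2, cyx in zip(frame, c["mean_x"], c["var_x"], c["cov_yx"]):
            # Gaussian log-likelihood of the input, one dimension at a time.
            log_lik -= 0.5 * (math.log(2.0 * math.pi * s2) + (v - m) ** 2 / s2)
            # Per-component conditional mean of log F0 given the input.
            cond_mean += cyx / s2 * (v - m)
        log_liks.append(log_lik)
        cond_means.append(cond_mean)
    # Component posteriors, computed stably by subtracting the maximum.
    top = max(log_liks)
    posts = [math.exp(ll - top) for ll in log_liks]
    total = sum(posts)
    return sum(p / total * m for p, m in zip(posts, cond_means))


def predict_f0(frames, net, components, threshold=0.5):
    """Return an F0 contour in Hz, with 0.0 marking unvoiced frames."""
    contour = []
    for frame in frames:
        voiced = voicing_probability(frame, *net) > threshold
        contour.append(math.exp(gmm_log_f0(frame, components)) if voiced else 0.0)
    return contour


# Toy, hand-picked parameters (hypothetical, not trained on any corpus).
net = ([[1.0]], [0.0], [10.0], 0.0)
components = [{
    "weight": 1.0, "mean_x": [0.0], "var_x": [1.0],
    "mean_y": math.log(100.0), "cov_yx": [0.5],
}]
contour = predict_f0([[-1.0], [2.0]], net, components)
```

Because the GMM is evaluated independently of the voicing decision, it can be trained on voiced frames only and still produce a continuous contour, which the voicing mask then gates frame by frame.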

Keywords:

Voice; Speech recognition; Computer science

Metrics

FWCI (Field Weighted Citation Impact): 2.00

Citation Normalized Percentile: 0.91

Topics

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Related Documents

Artificial voicing of whispered speech

Perception of final-consonant “voicing” in whispered speech.

Context Analysis for Voicing Decision in Whispered Speech

Attributes Associated with Consonantal Place and Voicing in Whispered Speech

Final consonant voicing and vowel height contrasts in whispered speech.