Analysis of duration prediction accuracy in HMM-based speech synthesis

Hanna Silén; Elina Helander; Jani Nurminen; Moncef Gabbouj

doi:10.21437/speechprosody.2010-79

JOURNAL ARTICLE

Year: 2010 Pages: paper 510-0

Abstract

Appropriate phoneme durations are essential for high-quality speech synthesis. In hidden Markov model-based text-to-speech (HMM-TTS), durations are typically modeled statistically using state duration probability distributions and duration prediction for unseen contexts. Use of rich context features enables synthesis without high-level linguistic knowledge. In this paper, we analyze the accuracy of state duration modeling against phone duration modeling using simple prediction techniques. In addition to the decision tree-based techniques, regression techniques for rich context features with high collinearity are discussed and evaluated.
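
To make the collinearity issue mentioned in the abstract concrete, the following minimal sketch fits a duration predictor on two perfectly collinear context features. The abstract does not name the regression techniques that were evaluated, so ridge regression is used here only as an assumed, well-known stand-in; the feature values, durations, and function names (`solve_linear`, `ridge_fit`, `predict`) are all hypothetical. With no penalty the normal equations are singular for this data, while a small penalty yields a unique, stable solution.

```python
# Illustrative sketch only. Ridge regression is an ASSUMED stand-in: the
# abstract does not say which regression techniques the paper evaluates.
# All data and names below are hypothetical.

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge_fit(features, targets, lam):
    """Return weights w minimizing ||Xw - y||^2 + lam * ||w||^2."""
    d = len(features[0])
    # Normal equations: (X^T X + lam * I) w = X^T y
    xtx = [[sum(row[i] * row[j] for row in features) for j in range(d)]
           for i in range(d)]
    for i in range(d):
        xtx[i][i] += lam
    xty = [sum(row[i] * y for row, y in zip(features, targets))
           for i in range(d)]
    return solve_linear(xtx, xty)


def predict(weights, row):
    """Predicted duration for one feature row."""
    return sum(w * x for w, x in zip(weights, row))


# Hypothetical toy data: the second feature duplicates the first, so the
# columns are perfectly collinear and X^T X alone is singular.
features = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
durations = [2.0, 4.0, 6.0]  # arbitrary duration units

weights = ridge_fit(features, durations, lam=0.1)
estimate = predict(weights, [4.0, 4.0])
```

In this toy case the penalty splits the weight evenly between the two duplicated features instead of leaving the solution undetermined. The sketch omits a bias term and feature scaling, both of which a real duration model would normally need.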

Keywords:

Hidden Markov model; Computer science; Duration (music); Speech recognition; Speech synthesis; Artificial intelligence; Natural language processing; Pattern recognition (psychology); Acoustics

Topics

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

Duration modeling for HMM-based speech synthesis

State Duration Modeling for HMM-Based Speech Synthesis

Full covariance state duration modeling for HMM-based speech synthesis

Analysis of HMM-based Lombard speech synthesis

Combining extreme learning machine and decision tree for duration prediction in HMM based speech synthesis