JOURNAL ARTICLE

Analysis of duration prediction accuracy in HMM-based speech synthesis

Abstract

Appropriate phoneme durations are essential for high-quality speech synthesis. In hidden Markov model-based text-to-speech (HMM-TTS), durations are typically modeled statistically using state duration probability distributions and duration prediction for unseen contexts. Use of rich context features enables synthesis without high-level linguistic knowledge. In this paper we analyze the accuracy of state duration modeling against phone duration modeling using simple prediction techniques. In addition to the decision tree-based techniques, regression techniques for rich context features with high collinearity are discussed and evaluated.

Keywords:
Hidden Markov model; Computer science; Duration (music); Speech recognition; Speech synthesis; Artificial intelligence; Natural language processing; Pattern recognition (psychology); Acoustics

Metrics

Cited By: 2
FWCI (Field Weighted Citation Impact): 0.00
Refs: 14
Citation Normalized Percentile: 0.47

Topics

Speech Recognition and Synthesis
Physical Sciences → Computer Science → Artificial Intelligence