Duration modeling for Hindi text-to-speech synthesis system

Sridhar Krishna Nemala; Partha Talukdar; Kalika Bali; A. G. Ramakrishnan

doi:10.21437/interspeech.2004-297




Year: 2004 Pages: 789-792



Abstract

This paper reports preliminary results of data-driven modeling of segmental (phoneme) duration for Hindi. A Classification and Regression Tree (CART) based data-driven model for predicting segmental duration is presented. A number of features are considered, and their usefulness and relative contribution to segmental duration prediction are assessed. The duration model is evaluated objectively using the root mean squared prediction error (RMSE) and the correlation between actual and predicted durations.
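The two objective measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names `rmse` and `pearson_correlation` and the sample durations are hypothetical, and the correlation is assumed to be the Pearson coefficient, which the abstract does not specify.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted durations."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


def pearson_correlation(actual, predicted):
    """Pearson correlation between actual and predicted durations.

    Assumption: the abstract says only "correlation"; Pearson's r is used
    here as the most common choice.
    """
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_a == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_a * var_p)


# Made-up phoneme durations in milliseconds, for illustration only.
actual_ms = [60.0, 85.0, 110.0, 72.0]
predicted_ms = [65.0, 80.0, 100.0, 75.0]

print("RMSE (ms):", rmse(actual_ms, predicted_ms))
print("Correlation:", pearson_correlation(actual_ms, predicted_ms))
```

A lower RMSE and a correlation closer to 1 both indicate predictions that track the measured durations more closely; the two are reported together because a model can correlate well while still being offset by a constant error.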

Keywords:

Duration (music); Computer science; Mean squared error; Artificial intelligence; Hindi; Regression; Speech recognition; Data modeling; Statistics; Natural language processing; Pattern recognition (psychology); Mathematics

Metrics

FWCI (Field Weighted Citation Impact): 1.16

Citation Normalized Percentile: 0.81

Topics

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing


Related Documents

Duration modeling for Arabic text to speech synthesis

Review on Text-to-Speech Synthesis System for Hindi Language

Modeling segmental duration in German text-to-speech synthesis

Modeling vowel duration for Japanese text-to-speech synthesis
