A hybrid method oriented to concatenative text-to-speech synthesis

Ignasi Iriondo; Francesc Alías; Javier Sanchis; Javier Melenchón

doi:10.21437/eurospeech.2003-593

JOURNAL ARTICLE

Year: 2003 Pages: 2953-2956

Abstract

In this paper we present a speech synthesis method for diphone-based text-to-speech systems. Its main goal is to achieve prosodic modifications that result in more natural-sounding synthetic speech. This improvement is especially useful for emotional speech synthesis, which requires high-quality prosodic modification. We present a hybrid method based on TD-PSOLA and the harmonic plus noise model, which incorporates a novel method to jointly modify pitch and time-scale. Preliminary results show an improvement in the synthetic speech quality when high pitch modification is required.
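To make the idea of joint pitch and time-scale modification concrete, the following is a minimal sketch of how a generic TD-PSOLA-style synthesizer might schedule its synthesis pitch marks. It is not the hybrid method of the paper, whose details the abstract does not give. The function name `plan_synthesis_marks` and its parameters `pitch_factor` and `time_factor` are hypothetical, and the sketch assumes the analysis pitch marks are already available as sorted sample positions.

```python
from bisect import bisect_left


def plan_synthesis_marks(analysis_marks, pitch_factor, time_factor):
    """Schedule synthesis pitch marks for a generic PSOLA-style resynthesis.

    Illustrative assumption only, not the paper's algorithm.
    analysis_marks: sorted pitch-mark positions (in samples) of the original unit.
    pitch_factor:   > 1 raises F0, < 1 lowers it.
    time_factor:    > 1 lengthens the unit, < 1 shortens it.
    Returns (synthesis_marks, source_indices); source_indices[i] is the index of
    the analysis mark whose windowed frame would be overlap-added at
    synthesis_marks[i].
    """
    if len(analysis_marks) < 2:
        raise ValueError("need at least two analysis pitch marks")
    if pitch_factor <= 0 or time_factor <= 0:
        raise ValueError("factors must be positive")

    start = analysis_marks[0]
    total = (analysis_marks[-1] - start) * time_factor  # stretched duration
    synthesis_marks, source_indices = [], []
    t = 0.0  # offset on the synthesis time axis, relative to start

    while t <= total:
        # Map the synthesis instant back onto the analysis time axis.
        t_analysis = start + t / time_factor

        # Pick the nearest analysis mark (ties go to the earlier one).
        j = bisect_left(analysis_marks, t_analysis)
        if j == len(analysis_marks):
            j -= 1
        elif j > 0 and (t_analysis - analysis_marks[j - 1]
                        <= analysis_marks[j] - t_analysis):
            j -= 1

        # Local analysis period, scaled by the pitch factor, sets the next step.
        k = min(j, len(analysis_marks) - 2)
        period = analysis_marks[k + 1] - analysis_marks[k]

        synthesis_marks.append(start + t)
        source_indices.append(j)
        t += period / pitch_factor

    return synthesis_marks, source_indices


if __name__ == "__main__":
    # Double the pitch while halving the duration of a 100-sample-period unit.
    marks, src = plan_synthesis_marks([0, 100, 200, 300, 400], 2.0, 0.5)
    print(marks, src)
```

In this sketch the two factors interact only through the mapping from synthesis time back to analysis time: frames are repeated when the product of the two factors exceeds one and skipped when it falls below one. Large pitch factors are the case the abstract singles out as the one where the hybrid method improves quality.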

Keywords:

Computer science; Speech synthesis; Speech recognition; Natural language processing; Artificial intelligence

Metrics

FWCI (Field Weighted Citation Impact): 0.38

Citation Normalized Percentile: 0.76

Topics

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Natural Language Processing Techniques

Physical Sciences → Computer Science → Artificial Intelligence

Speech and dialogue systems

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

HMM and Concatenative Synthesis based Text-to-Speech Synthesis

An Efficient Unit-selection Method for Concatenative Text-to-speech Synthesis Systems

A Hybrid Text-to-Speech System That Combines Concatenative and Statistical Synthesis Units

INDONESIAN TEXT-TO-SPEECH SYSTEM USING DIPHONE CONCATENATIVE SYNTHESIS

Affective word ratings for concatenative text-to-speech synthesis