JOURNAL ARTICLE

Statistical parametric speech synthesis using deep neural networks

Abstract

Conventional approaches to statistical parametric speech synthesis typically use decision tree-clustered context-dependent hidden Markov models (HMMs) to represent probability densities of speech parameters given texts. Speech parameters are generated from the probability densities to maximize their output probabilities, then a speech waveform is reconstructed from the generated parameters. This approach is reasonably effective but has limitations; for example, decision trees are inefficient at modeling complex context dependencies. This paper examines an alternative scheme based on a deep neural network (DNN). The relationship between input texts and their acoustic realizations is modeled by a DNN, which can address some limitations of the conventional approach. Experimental results show that the DNN-based systems outperformed the HMM-based systems with similar numbers of parameters.
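The abstract's core idea, a DNN that maps text-derived linguistic feature vectors to acoustic parameter vectors, can be sketched as a plain feed-forward network. The layer sizes, the tanh activation, and the 60-dimensional acoustic frame below are illustrative assumptions for the sketch, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_dnn(layer_sizes):
    """Random weights and zero biases for each consecutive layer pair."""
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        w = rng.normal(scale=1.0 / np.sqrt(n_in), size=(n_in, n_out))
        b = np.zeros(n_out)
        params.append((w, b))
    return params

def forward(params, x):
    """tanh hidden layers, linear output layer (regression to acoustics)."""
    for w, b in params[:-1]:
        x = np.tanh(x @ w + b)
    w, b = params[-1]
    return x @ w + b  # acoustic parameters for one frame (e.g. spectrum, F0)

# Hypothetical shapes: 300-dim linguistic features in, two hidden layers,
# 60-dim acoustic parameter frame out.
params = init_dnn([300, 256, 256, 60])
frame = forward(params, rng.normal(size=(1, 300)))
```

In a full system, such a network would be trained by backpropagation on aligned linguistic/acoustic feature pairs, and a vocoder would reconstruct the waveform from the predicted acoustic parameters; both steps are beyond this sketch.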

Keywords:
Hidden Markov model, Computer science, Parametric statistics, Artificial neural network, Speech recognition, Context dependency, Decision tree, Waveform, Deep neural networks, Speech synthesis, Parametric model, Artificial intelligence, Pattern recognition, Mathematics, Statistics

Metrics

Cited by: 825
FWCI (Field-Weighted Citation Impact): 84.87
References: 49
Citation Normalized Percentile: 1.00 (in top 1%; in top 10%)

Topics

Speech Recognition and Synthesis (Physical Sciences → Computer Science → Artificial Intelligence)
Natural Language Processing Techniques (Physical Sciences → Computer Science → Artificial Intelligence)
Music and Audio Processing (Physical Sciences → Computer Science → Signal Processing)