Modern text-to-speech synthesis systems should deliver speech that is not only intelligible but whose style matches the domain in which the synthesized speech is used. In this paper, three approaches to expressive speech synthesis based on deep neural networks are presented: style codes, model re-training, and an architecture using shared hidden layers. Their usability is tested on a speech corpus containing a limited amount of expressive speech data. A new architecture for transplanting speech styles is also presented and compared with a reference approach from the literature.
Xuehao Zhou, Mingyang Zhang, Yi Zhou, Zhizheng Wu, Haizhou Li
Kanellos, Ioannis; Suciu, Ioana; Moudenc, Thierry
Alexandre Trilla, Francesc Álías
Yuanyuan Zhu, Jiaxu He, Rui Jing, Yaodong Song, Jie Lian, Xiao-Lei Zhang, Jie Li