The research presented in the paper addresses the problem of multilingual text-to-speech, in particular the synthesis of speech when the desired combination of properties (speaker, language, speaking style) is missing from the training corpus. The proposed model achieves cross-lingual speech synthesis through neural network embeddings, applied not only to speaker and speaking-style IDs but also to context-dependent phonemes and a range of prosodic events, including accents and phrase breaks. This allows the model to efficiently capture relationships between phones and prosodic events across languages and, consequently, to synthesize speech in the voice of a speaker who has never spoken the target language or used the target style. The model was trained on speech corpora of American English and Serbo-Croatian. A range of experiments, including subjective evaluation of the synthesized speech, was carried out to establish both the quality of synthesis in different scenarios and under different conditions, as well as the similarity of speaker voices between the cross-lingual and original-language scenarios.