Cross-lingual personalized speech generation seeks to synthesize a target speaker's voice from only a few training samples spoken in a language different from the target language. A popular technique is to condition a speech synthesizer on a speaker embedding that characterizes the target speaker. Unfortunately, such a speaker embedding is usually affected by the language being spoken, which degrades speaker similarity in cross-lingual personalized speech generation. In this paper, we propose a novel speaker encoding mechanism that learns a language-agnostic speaker embedding to characterize speaker individuality. Specifically, we adopt an encoder-decoder architecture that disentangles language information from the speaker embedding via multi-task learning. We conduct experiments on both voice conversion and text-to-speech synthesis between English and Mandarin, both of which involve cross-lingual speech generation. All objective and subjective evaluations consistently confirm that the proposed speaker embedding is language-agnostic and thus improves cross-lingual personalized speech generation in terms of speaker similarity.
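To make the multi-task disentanglement idea concrete, the following is a minimal NumPy sketch of one common realization, not the paper's actual architecture: a toy speaker encoder produces an utterance-level embedding, a speaker-classification head is trained to succeed on it, and a language-classification head is trained adversarially (e.g. via gradient reversal) so the embedding carries no language cues. All dimensions, weights, and the loss weighting are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, not taken from the paper.
FEAT_DIM, EMB_DIM, N_SPK, N_LANG = 40, 16, 8, 2

# Toy encoder (one linear layer + tanh) and two task heads.
W_enc = rng.standard_normal((FEAT_DIM, EMB_DIM)) * 0.1
W_spk = rng.standard_normal((EMB_DIM, N_SPK)) * 0.1
W_lang = rng.standard_normal((EMB_DIM, N_LANG)) * 0.1

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(probs, labels):
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-9))

def speaker_embedding(frames):
    # frames: (T, FEAT_DIM) -> utterance-level embedding (EMB_DIM,)
    # Mean-pooling over frames yields a fixed-size speaker representation.
    return np.tanh(frames @ W_enc).mean(axis=0)

# A toy batch: 4 utterances, 50 frames each, from 4 speakers in 2 languages.
batch = rng.standard_normal((4, 50, FEAT_DIM))
spk_labels = np.array([0, 1, 2, 3])
lang_labels = np.array([0, 0, 1, 1])

emb = np.stack([speaker_embedding(u) for u in batch])  # (4, EMB_DIM)

# Multi-task heads share the same embedding.
spk_loss = cross_entropy(softmax(emb @ W_spk), spk_labels)
lang_loss = cross_entropy(softmax(emb @ W_lang), lang_labels)

# The speaker task is minimized; the language task is trained adversarially
# (in practice via a gradient-reversal layer) so that language information
# is pushed out of the embedding. The sign convention is illustrative only.
ADV_WEIGHT = 0.5
total_loss = spk_loss - ADV_WEIGHT * lang_loss
print(emb.shape, float(spk_loss) > 0.0)
```

In a real training loop the gradient-reversal trick lets a single backward pass update the language head to predict language while updating the encoder to defeat it, which is what drives the embedding toward language-agnosticity.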