JOURNAL ARTICLE

Cross-lingual Transfer of Phonological Features for Low-resource Speech Synthesis

Abstract

Previous work on cross-lingual transfer learning in text-to-speech has shown the effectiveness of fine-tuning phonemic representations on small amounts of target language data. In other contexts, phonological features (PFs) have been suggested as a more suitable input representation than phonemes for sharing acoustic information between languages, for example in multilingual model training or for code-switching synthesis where an utterance may contain words from multiple languages. Starting from a model trained on 14 hours of English, we find that cross-lingual fine-tuning with 15 minutes of German data can produce speech with subjective naturalness ratings comparable to a model trained from scratch on 4 hours of German, using either phonemes or PFs. We also find a modest but statistically significant improvement in naturalness ratings using PFs over phonemes when training from scratch on 4 hours of German.
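
The contrast between phoneme and PF inputs can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the six-feature inventory, the four-phoneme table, and the helper names `pf_vector` and `one_hot` are hypothetical and are not the feature set or code used in the article. The sketch shows only the general idea that a German phoneme absent from English, such as /y/, has no slot in an English phoneme inventory but can still be described by features that English phonemes already use.

```python
# Hypothetical, heavily simplified feature inventory (illustration only).
FEATURES = ("vowel", "high", "front", "round", "voiced", "nasal")

# Toy phoneme -> active-feature table. The values are textbook-style
# approximations, not the feature set used in the article.
PHONEME_FEATURES = {
    "i": {"vowel", "high", "front", "voiced"},           # English and German
    "u": {"vowel", "high", "round", "voiced"},           # English and German
    "y": {"vowel", "high", "front", "round", "voiced"},  # German, not English
    "m": {"voiced", "nasal"},                            # English and German
}


def pf_vector(phoneme):
    """Return a binary phonological-feature vector for a phoneme."""
    active = PHONEME_FEATURES[phoneme]
    return [1 if feature in active else 0 for feature in FEATURES]


def one_hot(phoneme, inventory):
    """Return a one-hot phoneme vector over a fixed inventory."""
    return [1 if p == phoneme else 0 for p in inventory]


# Assumed source-language (English) inventory for this toy example.
english_inventory = ["i", "u", "m"]

# /y/ has no slot in the English inventory, so its one-hot vector is empty.
print(one_hot("y", english_inventory))

# With PFs, /y/ is a combination of features already seen on /i/ and /u/.
print(pf_vector("i"))
print(pf_vector("u"))
print(pf_vector("y"))
```

In this toy table every feature active for /y/ is also active for /i/ or /u/, which is the kind of sharing between languages that motivates PFs as an input representation.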

Keywords:
Naturalness, Computer science, German, Utterance, Speech synthesis, Natural language processing, Speech recognition, Transfer of learning, Artificial intelligence, Representation (politics), Speech corpus, Linguistics

Metrics

Cited By: 9
FWCI (Field Weighted Citation Impact): 1.13
Refs: 23
Citation Normalized Percentile: 0.82

Topics

Speech Recognition and Synthesis
Physical Sciences →  Computer Science →  Artificial Intelligence
Topic Modeling
Physical Sciences →  Computer Science →  Artificial Intelligence
Music and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing