JOURNAL ARTICLE

Cross-Lingual Speaker Adaptation for HMM-based Speech Synthesis

Abstract

This paper explores a cross-lingual speaker adaptation technique for HMM-based speech synthesis, where a source voice model for English is transformed into a target speaker model using Mandarin Chinese speech data from the target speaker. A phone mapping-based method is adopted to map Chinese Initials/Finals into English phonemes, and two types of mapping rules, one-to-one and one-to-sequence, are compared. To avoid having to map prosodic features between languages, the adaptation procedure uses regression classes and transforms that are constructed for triphone models and then used to adapt the phonetic-and-prosodic-context-dependent models. The experimental results show that a one-to-sequence phone mapping is better than a one-to-one mapping, and that the similarity between the adapted English speech and the target Chinese speaker is reasonable.
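The difference between the two mapping rule types can be illustrated with a minimal Python sketch. The mapping entries, the phone symbols, and the function names `map_one_to_one` and `map_one_to_sequence` are assumptions made for illustration only; they are not the mapping tables or code used in the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical, illustrative entries only (NOT the paper's mapping tables).
# Keys are Mandarin Initials/Finals, values are English phone symbols.
ONE_TO_ONE: Dict[str, str] = {
    "b": "b",
    "m": "m",
    "ai": "ay",
    "iao": "aw",  # one English phone can only approximate a compound Final
}

ONE_TO_SEQUENCE: Dict[str, Tuple[str, ...]] = {
    "b": ("b",),
    "m": ("m",),
    "ai": ("ay",),
    "iao": ("y", "aw"),  # a compound Final may expand to several English phones
}


def map_one_to_one(units: List[str]) -> List[str]:
    """Replace each Mandarin unit with exactly one English phone."""
    return [ONE_TO_ONE[u] for u in units]


def map_one_to_sequence(units: List[str]) -> List[str]:
    """Replace each Mandarin unit with a sequence of one or more English phones."""
    phones: List[str] = []
    for u in units:
        phones.extend(ONE_TO_SEQUENCE[u])
    return phones


if __name__ == "__main__":
    syllable = ["m", "iao"]  # hypothetical Initial + Final pair
    print(map_one_to_one(syllable))
    print(map_one_to_sequence(syllable))
```

In this sketch the one-to-sequence rule keeps the glide of the compound Final as a separate English phone, whereas the one-to-one rule has to collapse it into a single symbol. This is one plausible reading of why a sequence mapping could fit the adaptation data more closely.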

Keywords:
Speech recognition; Computer science; Hidden Markov model; Mandarin Chinese; Phone; Speech synthesis; Similarity (geometry); Sequence (biology); Speaker diarisation; Adaptation (eye); Artificial intelligence; Natural language processing; Speaker recognition; Linguistics; Image (mathematics)

Metrics

Cited By: 7
FWCI (Field Weighted Citation Impact): 1.20
Refs: 17
Citation Normalized Percentile: 0.89

Topics

Speech Recognition and Synthesis
Physical Sciences → Computer Science → Artificial Intelligence
Natural Language Processing Techniques
Physical Sciences → Computer Science → Artificial Intelligence
Music and Audio Processing
Physical Sciences → Computer Science → Signal Processing