JOURNAL ARTICLE

Cross-Lingual Speaker Adaptation for HMM-Based Speech Synthesis

Abstract

This paper explores a cross-lingual speaker adaptation technique for HMM-based speech synthesis, in which a source voice model for English is transformed into a target speaker model using Mandarin Chinese speech data from the target speaker. A phone-mapping-based method is adopted to map Chinese Initials/Finals into English phonemes, and two types of mapping rules, one-to-one and one-to-sequence mappings, are compared. To avoid having to map prosodic features between languages, the adaptation procedure uses regression classes and transforms that are constructed for triphone models and then used to adapt the phonetic-and-prosodic-context-dependent models. The experimental results show that one-to-sequence phone mapping outperforms one-to-one mapping, and that the similarity between the adapted English speech and the target Chinese speaker is reasonable.
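The two mechanisms described in the abstract, expanding Mandarin Initials/Finals into English phonemes and reusing triphone-level transforms for full-context models, can be illustrated with a minimal Python sketch. Everything concrete in it is hypothetical and not taken from the paper: the mapping tables are toy examples rather than the authors' rules, the `l-c+r/...` label format is a simplified stand-in for real full-context labels, the vowel/consonant split replaces a data-driven regression class tree, and the transform is assumed to be an MLLR-style affine update of the Gaussian means.

```python
"""Toy sketch of cross-lingual phone mapping and triphone-class adaptation.

All tables, label formats, and class rules below are hypothetical.
"""
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]

# Hypothetical Mandarin Initial/Final -> English (ARPAbet-style) tables.
# One-to-one: every unit becomes exactly one English phoneme.
ONE_TO_ONE: Dict[str, List[str]] = {
    "n": ["n"], "i": ["iy"], "h": ["hh"], "ao": ["aw"],
    "ang": ["aa"], "ian": ["eh"],
}
# One-to-sequence: compound Finals may expand into several phonemes.
ONE_TO_SEQUENCE: Dict[str, List[str]] = {
    "n": ["n"], "i": ["iy"], "h": ["hh"], "ao": ["aw"],
    "ang": ["aa", "ng"], "ian": ["y", "eh", "n"],
}

# Hypothetical broad-class rule standing in for a regression class tree.
VOWELS = {"aa", "aw", "eh", "iy"}


def map_phones(units: List[str], table: Dict[str, List[str]]) -> List[str]:
    """Expand a sequence of Initials/Finals into English phonemes."""
    phones: List[str] = []
    for unit in units:
        if unit not in table:
            raise KeyError(f"no mapping rule for {unit!r}")
        phones.extend(table[unit])
    return phones


def triphone_of(full_label: str) -> str:
    """Strip prosodic context from a simplified 'l-c+r/...' label."""
    return full_label.split("/", 1)[0]


def regression_class(triphone: str) -> str:
    """Assign a triphone to a class from its centre phone (toy rule)."""
    centre = triphone.split("-", 1)[1].split("+", 1)[0]
    return "vowel" if centre in VOWELS else "consonant"


def apply_affine(a: Matrix, b: Vector, mean: Vector) -> Vector:
    """MLLR-style mean update: mu_hat = A @ mu + b."""
    return [sum(a_ij * m_j for a_ij, m_j in zip(row, mean)) + b_i
            for row, b_i in zip(a, b)]


def adapt_full_context(
    means: Dict[str, Vector],
    transforms: Dict[str, Tuple[Matrix, Vector]],
) -> Dict[str, Vector]:
    """Adapt full-context means with the transform of their triphone's class.

    Prosodic context is ignored when choosing the transform, so it never
    has to be mapped across languages.
    """
    adapted: Dict[str, Vector] = {}
    for label, mean in means.items():
        a, b = transforms[regression_class(triphone_of(label))]
        adapted[label] = apply_affine(a, b, mean)
    return adapted


if __name__ == "__main__":
    print(map_phones(["h", "ang"], ONE_TO_ONE))       # ['hh', 'aa']
    print(map_phones(["h", "ang"], ONE_TO_SEQUENCE))  # ['hh', 'aa', 'ng']

    means = {
        "sil-n+iy/stress=0": [1.0, 2.0],
        "sil-n+iy/stress=1": [1.0, 2.0],  # same triphone, other prosody
        "n-iy+hh/stress=1": [0.5, -1.0],
    }
    transforms = {
        "vowel": ([[1.0, 0.0], [0.0, 2.0]], [0.1, 0.0]),
        "consonant": ([[2.0, 0.0], [0.0, 1.0]], [0.0, 0.5]),
    }
    for label, mean in adapt_full_context(means, transforms).items():
        print(label, mean)
```

The two `sil-n+iy` entries differ only in their prosodic context, so they receive the same transform and end up with identical adapted means; this is the property that lets the adaptation step avoid any cross-lingual mapping of prosody. The one-to-sequence table, by contrast, keeps the nasal coda of "ang" that the one-to-one table drops.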

Keywords:
Computer science; Speech recognition; Hidden Markov model; Mandarin Chinese; Speech synthesis; Phone; Similarity; Speaker diarisation; Sequence; Context; Adaptation; Artificial intelligence; Natural language processing; Speaker recognition; Linguistics

Metrics

Cited by: 42
FWCI (Field Weighted Citation Impact): 6.39
References: 18
Citation Normalized Percentile: 0.98


Topics

Speech Recognition and Synthesis
Natural Language Processing Techniques
Music and Audio Processing