Grapheme-to-phoneme (G2P) conversion is an important problem for many speech and language processing applications. G2P models are particularly useful for low-resource languages that lack well-developed pronunciation lexicons. Prominent G2P paradigms are based on initial alignments between grapheme and phoneme sequences. In this work, we devise new alignment strategies that work effectively with recurrent neural network (RNN)-based models when only a small number of pronunciations are available to train the models. In a small-data setting, we build G2P models for Pashto, Tagalog, and Lithuanian that significantly outperform a joint sequence model and a baseline RNN-based model, giving up to 14% and 9% relative reductions in phone and word error rates, respectively, when trained on a dataset of 250 words.
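The abstract does not spell out the alignment strategies themselves, but as background, the sketch below illustrates the standard idea behind alignment-based G2P: pair each grapheme with zero, one, or two phonemes by dynamic programming, so that G2P reduces to per-character sequence labeling that an RNN tagger can learn from. This is a minimal illustrative toy, not the paper's method; the cost weights, the "_" empty label, and the "|" joiner for two-phoneme labels are all assumptions made here for the example.

```python
def align_graphemes_to_phonemes(graphemes, phonemes):
    """Align graphemes to phonemes with edit-distance-style DP so that
    each grapheme receives zero, one, or two phonemes as its label."""
    G, P = len(graphemes), len(phonemes)
    INF = float("inf")
    # dp[i][j] = min cost of aligning graphemes[:i] with phonemes[:j]
    dp = [[INF] * (P + 1) for _ in range(G + 1)]
    back = [[None] * (P + 1) for _ in range(G + 1)]
    dp[0][0] = 0
    for i in range(G + 1):
        for j in range(P + 1):
            if dp[i][j] == INF:
                continue
            # one-to-one: grapheme i emits phoneme j (cost 0, preferred)
            if i < G and j < P and dp[i][j] < dp[i + 1][j + 1]:
                dp[i + 1][j + 1] = dp[i][j]
                back[i + 1][j + 1] = ("one", i, j)
            # silent grapheme: emits no phoneme (cost 1)
            if i < G and dp[i][j] + 1 < dp[i + 1][j]:
                dp[i + 1][j] = dp[i][j] + 1
                back[i + 1][j] = ("eps", i, j)
            # one-to-two: grapheme i emits phonemes j and j+1 (cost 1),
            # e.g. "x" -> "k s"
            if i < G and j + 1 < P and dp[i][j] + 1 < dp[i + 1][j + 2]:
                dp[i + 1][j + 2] = dp[i][j] + 1
                back[i + 1][j + 2] = ("two", i, j)

    assert dp[G][P] != INF, "no alignment under these operations"
    # Walk the back pointers to assign one label per grapheme. Ties among
    # minimum-cost alignments are broken arbitrarily here; practical
    # aligners learn grapheme-phoneme association weights (e.g. via EM).
    labels = ["_"] * G  # "_" marks a silent grapheme
    i, j = G, P
    while (i, j) != (0, 0):
        op, pi, pj = back[i][j]
        if op == "one":
            labels[pi] = phonemes[pj]
        elif op == "two":
            labels[pi] = phonemes[pj] + "|" + phonemes[pj + 1]
        i, j = pi, pj
    return list(zip(graphemes, labels))

# "box" -> /b aa k s/: the letter "x" maps to the two-phoneme label "k|s".
print(align_graphemes_to_phonemes(list("box"), ["b", "aa", "k", "s"]))
# [('b', 'b'), ('o', 'aa'), ('x', 'k|s')]
```

Once every word in the lexicon is labeled this way, an RNN can be trained to predict one (possibly empty or compound) phoneme label per input character, which is one common way such alignments feed neural G2P models.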