This paper presents a method to improve mispronunciation detection performance for low-resource acoustic models. One hour of speech data is randomly selected from CU-CHLOE to simulate a low-resource non-native English scenario. A Tandem feature derived from articulatory-based Multi-Layer Perceptrons (MLPs) is employed to replace the traditional spectral feature (e.g., PLP). Further, motivated by the similar pronunciation characteristics of Mandarin and English spoken by Chinese speakers, Mandarin speech data is used to assist in training multilingual articulatory MLPs. The Tandem feature is also combined with PLP to further improve performance. With the proposed method, the phone recognition correctness (CORR) is improved by 3.84% and the diagnosis accuracy (DA) is improved by 2.25%.
Hua Yuan, Xu Ji, Junhong Zhao, Jia Liu
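The combination of a Tandem feature with PLP mentioned in the abstract can be illustrated with a minimal sketch. It follows the conventional Tandem recipe (log of MLP posteriors, PCA decorrelation, frame-wise concatenation with the spectral feature), which is an assumption here and not necessarily the authors' exact pipeline. The function name `tandem_features`, the dimensions (40 articulatory classes, 39-dimensional PLP, 25 retained components), and the random stand-in data are all hypothetical. For brevity the PCA is estimated on the same frames it transforms; in practice it would be estimated on training data.

```python
import numpy as np


def tandem_features(posteriors, plp, n_components, eps=1e-8):
    """Build Tandem+PLP features (illustrative sketch, not the paper's code).

    posteriors: (T, K) per-frame MLP class posteriors.
    plp:        (T, D) per-frame spectral features.
    Returns a (T, D + n_components) array.
    """
    # Log compresses the skewed posterior distribution toward a Gaussian-like shape.
    log_post = np.log(posteriors + eps)
    centered = log_post - log_post.mean(axis=0, keepdims=True)
    # PCA via SVD: rows of vt are principal directions, sorted by variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:n_components].T
    # Frame-wise concatenation of the spectral and Tandem streams.
    return np.hstack([plp, projected])


# Random stand-ins for real MLP outputs and PLP frames (hypothetical sizes).
rng = np.random.default_rng(0)
T, K, D = 200, 40, 39
logits = rng.normal(size=(T, K))
posteriors = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
plp = rng.normal(size=(T, D))

feats = tandem_features(posteriors, plp, n_components=25)
print(feats.shape)  # (200, 64)
```

The concatenated vectors would then be used in place of plain PLP when training the acoustic model.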