JOURNAL ARTICLE

Multi-Task Based Mispronunciation Detection of Children Speech Using Multi-Lingual Information

Abstract

In developing a Computer-Aided Pronunciation Training (CAPT) system for Chinese children learning English as a Second Language (ESL), we faced a shortage of task-specific data. To address this issue, we propose to exploit first-language (L1) and second-language (L2) knowledge from both adult and child speech data through multi-task transfer learning, motivated by the Speech Learning Model (SLM). The experimental setup uses time-delay neural network (TDNN) acoustic modelling with the following training data: 70 hours of English speech by American Children (AC), 100 hours by American Adults (AA), 5 hours of Chinese speech by Chinese Children (CC), and 89 hours by Chinese Adults (CA). The test data consists of 2 hours of ESL speech by Chinese children. Experimental results show that including the AA data yields a 13% relative reduction in Detection Error Rate (DER) compared with using AC data only. Further including the CC and CA data through L1 transfer learning yields a total relative DER improvement of 21%. These results suggest that the proposed method is effective in mitigating the insufficient-data problem.
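
The abstract reports its gains as relative DER reductions. As a minimal sketch of that arithmetic, assuming the usual definition (baseline minus new, divided by baseline), the helper below computes a relative reduction. The function name `relative_reduction` and the two DER values are hypothetical and chosen only for illustration; they are not figures from the article, which gives only the relative percentages.

```python
def relative_reduction(baseline_der: float, new_der: float) -> float:
    """Return the relative reduction of new_der with respect to baseline_der."""
    if baseline_der <= 0:
        raise ValueError("baseline_der must be positive")
    return (baseline_der - new_der) / baseline_der


# Hypothetical DER values, used only to illustrate the arithmetic;
# they are NOT results reported in the article.
baseline = 0.40  # assumed DER of a baseline system
improved = 0.30  # assumed DER after adding more training data

print(f"Relative DER reduction: {relative_reduction(baseline, improved):.1%}")
```

Under this definition, a relative reduction is always smaller in absolute terms than the percentage suggests: a 13% relative reduction on a baseline DER of 0.40, for instance, would correspond to an absolute drop of about 0.05.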

Keywords:
Pronunciation; Computer science; Task (project management); Speech recognition; Transfer of learning; Training set; Set (abstract data type); Task analysis; Speech processing; Artificial intelligence; Natural language processing; Linguistics

Metrics

Cited By: 2
FWCI (Field Weighted Citation Impact): 0.15
Refs: 29
Citation Normalized Percentile: 0.61

Topics

Speech Recognition and Synthesis
Physical Sciences → Computer Science → Artificial Intelligence
Speech and Audio Processing
Physical Sciences → Computer Science → Signal Processing
Phonetics and Phonology Research
Social Sciences → Psychology → Experimental and Cognitive Psychology