Systematic review of feature-based approaches to mispronunciation detection

Lakshani Nissanka; Banuka Athuraliya; Sahan Priyanayana

doi:10.29140/jaltcall.v21n3.102732


JOURNAL ARTICLE


Year: 2025; Journal: The JALT CALL Journal; Vol: 21 (3); Pages: 102732-102732


Abstract

Accurate pronunciation is essential for successful communication in a second language (L2) as it significantly influences communicative effectiveness and perceived fluency. Mispronunciations frequently arise due to the influence of the learner’s first language (L1), posing barriers to effective spoken communication. Therefore, pronunciation error detection (PED) has emerged as a critical research area within the domains of Computer-Assisted Language Learning (CALL) and Computer-Assisted Pronunciation Training (CAPT). Although numerous PED systems have been developed over recent decades, existing survey papers have mainly emphasized comparisons of modeling methodologies or learning paradigms, often neglecting the critical role of feature representation. To address this research gap, this survey introduces a novel, feature-based taxonomy for categorizing PED methodologies into four primary groups: Acoustic-based, Acoustic-Phonetic, Linguistic-based, and Hybrid approaches. Each category is systematically reviewed, summarizing over two decades of research work with respect to feature extraction techniques, modeling approaches, evaluation metrics, and the nature and quality of instructional feedback provided to learners. A detailed comparative analysis highlights significant trade-offs among these categories in terms of detection accuracy, interpretability, resource demands, and applicability in real-time or low-resource contexts. Furthermore, this survey discusses recent and emerging trends in PED research, including self-supervised learning frameworks, multimodal feature fusion, and integrating phonological knowledge with modern deep learning architectures. By synthesizing existing knowledge and identifying gaps in current methodologies, this paper aims to provide clear insights and directions for future advancements in PED systems.

Keywords:

Pronunciation; Feature (linguistics); Resource (disambiguation); Taxonomy (biology); Quality (philosophy); Feature extraction; Spoken language


Topics

Speech Recognition and Synthesis (Physical Sciences → Computer Science → Artificial Intelligence)

Phonetics and Phonology Research (Social Sciences → Psychology → Experimental and Cognitive Psychology)

Voice and Speech Disorders (Health Sciences → Medicine → Physiology)

