Michael F. Lally, Heejin Kim, Lori Moon
Dysarthric speech, a motor speech disorder associated with neuro-motor conditions, poses a challenge for automatic speech recognition (ASR) due to its acoustic characteristics and the comparatively limited volume of available data. This study investigates ASR toolkits’ recognition of speech from speakers with cerebral palsy (CP) at different intelligibility levels by comparing them against human listeners’ performance. We ask: (1) how intelligible is speech from CP-affected speakers to ASR toolkits trained on non-dysarthric speech? (2) how does this performance compare to that of naïve human listeners? and (3) given that familiarized human listeners understand dysarthric speech better, to what degree is dysarthric speech more intelligible to similarly familiarized ASR toolkits? Using the UA-Speech Database (Kim et al., 2008), we test the ASR systems with two training methods, following Kim and Nanney’s (2014) experiments with human listeners: strongly supervised training, with both audio and orthographic feedback, and unsupervised training, with audio signals only. ASR accuracy is measured by word error rate on word transcription tests. Findings reveal the extent to which supervision affects ASR models in comparison to human listeners, and implications for improving adaptation techniques for dysarthric speech, for both ASR systems and human listeners, are presented.
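The word error rate used in the abstract is conventionally computed as the word-level edit distance between reference and hypothesis transcripts, normalized by the reference length. The sketch below is a minimal illustrative implementation, not the authors' evaluation code; production scoring tools additionally handle text normalization and alignment reporting.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub,               # substitution or match
                           dp[i - 1][j] + 1,  # deletion
                           dp[i][j - 1] + 1)  # insertion
    return dp[len(ref)][len(hyp)] / len(ref)
```

For example, a hypothesis with one substituted word in a three-word reference yields a WER of 1/3.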