Michael F. Lally, Heejin Kim, Lori Moon
Dysarthric speech, a motor speech disorder associated with neuro-motor conditions, poses a challenge for automatic speech recognition (ASR) due to its acoustic characteristics and the comparatively limited volume of available data. This study investigates ASR toolkits’ recognition of speech from speakers with cerebral palsy (CP) at different intelligibility levels by comparing them against human listeners’ performance. We ask: (1) how intelligible is speech from CP-affected speakers to ASR toolkits trained on non-dysarthric speech? (2) how does this performance compare to that of naïve human listeners? and (3) given that familiarized human listeners understand dysarthric speech better, to what degree is dysarthric speech more intelligible to similarly familiarized ASR toolkits? Using the UA-Speech Database (Kim et al., 2008), we test the ASR systems with two training methods, following Kim and Nanney’s (2014) experiments with human listeners: strongly supervised training, with both audio and orthographic feedback, and unsupervised training, with audio signals only. ASR accuracy is measured by word error rate on word transcription tests. Findings reveal the extent to which supervision affects ASR models in comparison to human listeners, and implications for improving adaptation techniques for dysarthric speech, for both ASR systems and human listeners, are presented.
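The word error rate used in the abstract is conventionally computed as the word-level edit distance between reference and hypothesis transcripts, normalized by the reference length. The sketch below is a minimal illustrative implementation, not the authors' evaluation code; production scoring tools additionally handle text normalization and alignment reporting.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub,               # substitution or match
                           dp[i - 1][j] + 1,  # deletion
                           dp[i][j - 1] + 1)  # insertion
    return dp[len(ref)][len(hyp)] / len(ref)
```

For example, a hypothesis with one substituted word in a three-word reference yields a WER of 1/3.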