JOURNAL ARTICLE

SNR-Selection-Based-Data Augmentation for Dysarthric Speech Recognition

Abstract

With recent advances in automatic speech recognition (ASR) systems, everyday life has become more convenient for the general population. However, for the speech-disordered community, the usefulness of such ASR systems is very limited because they are not trained or modelled on speech data from medically impaired speakers. The difficulty in training such systems lies in the poor availability of data. To address this issue, this paper analyzes data augmentation for dysarthric speech recognition. Noise is freely available in abundance, and in speech recognition it has been used to develop robust ASR systems. This paper focuses on using noise as a source for data augmentation, to increase the number of dysarthric speech samples and improve the performance of speech recognition systems. The core idea behind this work is that when one sound is combined with another, its impact is noticeable only if both sounds occupy the same frequency range. The proposed method therefore characterizes each noise sample and adds it appropriately to the dysarthric speech data to create new dysarthric speech examples. Initially, noise samples were selected that do not affect the dysarthric speech frequency range. The dysarthric speech examples, noise-augmented at a particular signal-to-noise ratio (SNR), were then used to train hybrid DNN-HMM-based recognition systems for isolated dysarthric speech examples. After noise-selection-based data augmentation, the word error rate (WER) was reduced by 7% for all categories of dysarthric speakers compared with the WER of the ASR system trained without data augmentation. Since this approach used low-frequency noises as the source for data augmentation, the number of augmented examples was not restricted to a limit; the higher the number of low-frequency noises within a selected SNR range, the better the augmented examples. Furthermore, this approach used the selected dysarthric speech examples for augmentation, so the augmented examples do not lose the dysarthric speakers' identities.
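
The two steps the abstract describes, selecting noises whose energy lies mostly outside the speech band and adding them at a chosen SNR, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 100 Hz cutoff, and the 10 dB target are hypothetical values chosen only for illustration, and the naive DFT is practical only for short signals.

```python
import cmath
import math


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus scaled noise so the mixture has the requested SNR in dB."""
    if len(noise) < len(speech):
        # Loop the noise so that it covers the whole utterance.
        repeats = len(speech) // len(noise) + 1
        noise = list(noise) * repeats
    noise = noise[:len(speech)]
    p_speech, p_noise = power(speech), power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must both have non-zero power")
    # SNR_dB = 10 * log10(p_speech / (gain**2 * p_noise)), solved for gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def low_band_energy_fraction(signal, sample_rate, cutoff_hz):
    """Fraction of spectral energy below cutoff_hz (naive DFT, DC bin skipped)."""
    n = len(signal)
    total = low = 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        energy = abs(coeff) ** 2
        total += energy
        if k * sample_rate / n < cutoff_hz:
            low += energy
    return low / total if total else 0.0


# Toy stand-ins for real recordings (assumed values, not from the paper):
# a 1 kHz tone plays the role of speech, a 60 Hz hum the role of noise.
SAMPLE_RATE = 8000
speech = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(400)]
hum = [math.sin(2 * math.pi * 60 * t / SAMPLE_RATE) for t in range(400)]

# Keep the noise only if most of its energy sits below the hypothetical cutoff.
if low_band_energy_fraction(hum, SAMPLE_RATE, cutoff_hz=100) > 0.9:
    augmented = mix_at_snr(speech, hum, snr_db=10)
```

Under these assumptions, each selected noise yields one additional training example per utterance and SNR value, which is consistent with the abstract's remark that the number of augmented examples grows with the number of suitable noises.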

Keywords:
Computer science; Speech recognition; Selection (genetic algorithm); Natural language processing; Artificial intelligence

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 18
Citation Normalized Percentile: 0.18

Topics

Speech Recognition and Synthesis
Physical Sciences →  Computer Science →  Artificial Intelligence
Voice and Speech Disorders
Health Sciences →  Medicine →  Physiology
Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing