SNR-Selection-Based-Data Augmentation for Dysarthric Speech Recognition

Sarkhell Sirwan Nawroly; Decebal POPESCU; Mariya Celin THEKEKARA ANTONY; Actlin Jeeva MUTHU PHILOMINAL

doi:10.24846/v32i4y202312


JOURNAL ARTICLE


Year: 2023; Journal: Studies in Informatics and Control; Vol: 32 (4); Pages: 129-140


Abstract

Recent advances in Automatic Speech Recognition (ASR) systems have made everyday life more convenient for the general population. However, for a population such as the speech-disordered community, the usefulness of such ASR systems is very limited because they are not trained or modelled with speech data from people with such impairments. The difficulty in training such ASR systems lies in the poor availability of data. To handle this issue, this paper analyzes data augmentation for dysarthric speech recognition. Noise is a source that is freely available in abundance, and in speech recognition it has been used to develop robust ASR systems. This paper focuses on using noise as a source for data augmentation, in order to increase the number of dysarthric speech samples and improve the performance of speech recognition systems. The core idea behind this research work is that when a sound is combined with another sound, its impact is noticeable only if both sounds occupy the same frequency range. Therefore, the proposed method for increasing the number of dysarthric speech examples is to understand the characteristics of each noise sample and add the noise appropriately to the dysarthric speech data to create new samples. Initially, noise samples that do not affect the dysarthric speech frequency range were selected. The dysarthric speech examples, noise-augmented at a particular signal-to-noise ratio (SNR), were then used for training hybrid DNN-HMM-based recognition systems for isolated dysarthric speech examples. After noise-selection-based data augmentation, the word error rate (WER) was reduced by 7% for all categories of dysarthric speakers in comparison with the WER of the ASR system trained without data augmentation. Since this approach used low-frequency noises as a source for data augmentation, the number of augmented examples was not restricted to a limit; the higher the number of low-frequency noises within a selected SNR range, the better the augmented examples. Furthermore, this approach used the selected dysarthric speech examples for augmentation, so the augmented examples do not lose the dysarthric speakers' identities.
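The abstract does not give the mixing procedure, so the following is only a minimal sketch of the general idea of adding a noise sample to a speech signal at a chosen SNR. The function names (`power`, `mix_at_snr`, `measured_snr_db`), the 440 Hz tone standing in for speech, the 60 Hz hum standing in for a low-frequency noise, and the SNR values 5, 10 and 15 dB are all illustrative assumptions, not values or code from the paper.

```python
import math


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio is `snr_db`, then add it."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech, p_noise = power(speech), power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must both have non-zero power")
    # SNR_dB = 10 * log10(P_speech / (gain^2 * P_noise)), solved for gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, mixed):
    """SNR of a mixture, computed from the clean speech and the added residual."""
    residual = [m - s for m, s in zip(mixed, speech)]
    return 10 * math.log10(power(speech) / power(residual))


# Toy signals (assumptions): one second at 16 kHz.
rate = 16000
speech = [math.sin(2 * math.pi * 440 * t / rate) for t in range(rate)]
noise = [math.sin(2 * math.pi * 60 * t / rate) for t in range(rate)]

# One augmented copy of the same utterance per hypothetical SNR value.
augmented = {snr: mix_at_snr(speech, noise, snr) for snr in (5, 10, 15)}
```

Each entry in `augmented` is a new training example derived from the same utterance, which is how several noise samples and SNR values could multiply a small dataset. In practice the toy signals would be replaced by recorded dysarthric speech and by noise recordings chosen according to the selection criterion described in the abstract.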

Keywords:

Computer science; Speech recognition; Selection (genetic algorithm); Natural language processing; Artificial intelligence

Metrics

FWCI (Field Weighted Citation Impact): 0.00

Citation Normalized Percentile: 0.18

Topics

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Voice and Speech Disorders

Health Sciences → Medicine → Physiology

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing


Related Documents

Data Augmentation Using Healthy Speech for Dysarthric Speech Recognition

Data Augmentation for Dysarthric Speech Recognition Based on Text-to-Speech Synthesis

Data Augmentation Techniques for Transfer Learning-Based Continuous Dysarthric Speech Recognition

Training Data Augmentation for Dysarthric Automatic Speech Recognition by Text-to-Dysarthric-Speech Synthesis

Pathology-Aware Speech Encoding and Data Augmentation for Dysarthric Speech Recognition