Towards End-to-End Speech Recognition System for Pashto Language Using Transformer Model

Munazza Sher; Nasir Ahmad; Madiha Sher

doi:10.33411/ijist/202461115131

JOURNAL ARTICLE

Year: 2024
Journal: International Journal of Innovations in Science and Technology
Pages: 115-131

Abstract

Conventional speech recognition systems built on Hidden Markov Models (HMMs) and Gaussian Mixture Models (GMMs) are difficult to set up and inefficient. This paper adopts the Transformer model for Pashto continuous speech recognition, offering an End-to-End (E2E) system that maps acoustic signals directly to the label sequence, which simplifies implementation. The Transformer is chosen for its state-of-the-art capabilities, including parallelization and self-attention, and for its proficiency under the data constraints that apply to Pashto. The objective is to develop an accurate Pashto speech recognition system. Using 200 hours of conversational data, the study achieves a Word Error Rate (WER) of up to 51% and a Character Error Rate (CER) of up to 29%. Fine-tuning the model's parameters and increasing the dataset size led to significant improvements. The results demonstrate the Transformer's effectiveness in limited-data scenarios and establish it as a robust choice for Pashto speech recognition. The study fills a gap in ASR research for the Pashto language and contributes to the advancement of speech recognition technology for under-resourced languages. It also highlights the potential for further improvement with more training data, and underscores the importance of fine-tuning and dataset augmentation in improving model performance and reducing error rates.
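The abstract reports results as WER and CER without defining them. Both are conventionally computed as the edit (Levenshtein) distance between the reference transcript and the recognizer's hypothesis, divided by the reference length, counted in words for WER and in characters for CER. The following is a minimal Python sketch of that standard definition, not the authors' evaluation code. The function names, the English placeholder strings, and the choice to count single spaces as characters in CER are assumptions made for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of tokens)."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: ref item missing from hyp
                curr[j - 1] + 1,     # insertion: extra item in hyp
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edit distance over reference length.

    Assumption: whitespace is collapsed to single spaces and those spaces
    count as characters. Some toolkits strip spaces instead.
    """
    ref_chars = " ".join(reference.split())
    hyp_chars = " ".join(hypothesis.split())
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


# Placeholder transcripts (not taken from the paper's data).
reference = "the cat sat"
hypothesis = "the cat sit"
print(f"WER = {wer(reference, hypothesis):.3f}")
print(f"CER = {cer(reference, hypothesis):.3f}")
```

A single wrong letter costs one whole word in WER but only one character in CER, so CER is usually the lower of the two figures, which is consistent with the 51% and 29% reported above.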

Keywords:

Hidden Markov model; Transformer; Word error rate; Language model; Acoustic model; Training set; Adaptability; Mixture model

Related Documents

Towards Language-Universal End-to-End Speech Recognition

Improving Transformer Based End-to-End Code-Switching Speech Recognition Using Language Identification

Towards End-to-End Speech Recognition

A study of transformer-based end-to-end speech recognition system for Kazakh language
