An End-to-End Transformer-Based Automatic Speech Recognition for Qur’an Reciters

Mohammed Hadwan; Hamzah A. Alsayadi; Salah Al-Hagree

doi:10.32604/cmc.2023.033457

JOURNAL ARTICLE

Year: 2022; Journal: Computers, Materials & Continua; Vol. 74 (2); Pages: 3471-3487

Abstract

The attention-based encoder-decoder technique known as the transformer is used to enhance the performance of end-to-end automatic speech recognition (ASR). This research focuses on applying end-to-end transformer-based ASR models to the Arabic language, which has received little attention from the research community. The Holy Qur’an, the holy book of Muslims, is written in diacritized Arabic text. In this paper, an end-to-end transformer model for building robust Qur’an verse recognition is proposed. The acoustic model was built as a transformer-based deep learning model using the PyTorch framework. A multi-head attention mechanism is used in both the encoder and the decoder of the acoustic model. A Mel filter bank is used for feature extraction. To build a language model (LM), a Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) were used to train an n-gram word-based LM. As part of this research, a new dataset of Qur’an verses and their associated transcripts was collected and processed for training and evaluating the proposed model; it consists of 10 h of .wav recitations performed by 60 reciters. The experimental results showed that the proposed end-to-end transformer-based model achieved a significantly low character error rate (CER) of 1.98% and a word error rate (WER) of 6.16%. We have achieved state-of-the-art end-to-end transformer-based recognition for Qur’an reciters.
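The CER and WER figures above are conventionally defined as the Levenshtein edit distance (substitutions + deletions + insertions) between the reference transcript and the recognizer's hypothesis, divided by the length of the reference in characters or words respectively. The paper's scoring tool is not stated here, so the following is only a minimal sketch of that standard definition, not the authors' evaluation code; the function names `edit_distance`, `wer`, and `cer` are hypothetical.

```python
from typing import Hashable, Sequence

# Hypothetical helpers illustrating the standard CER/WER definitions;
# not the authors' scoring script.


def edit_distance(ref: Sequence[Hashable], hyp: Sequence[Hashable]) -> int:
    """Levenshtein distance: minimum number of substitutions, deletions and
    insertions (unit cost each) needed to turn `ref` into `hyp`."""
    # prev[j] = distance between the reference prefix processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h from the hypothesis
                prev[j - 1] + (r != h),  # substitute (no cost if equal)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate over Unicode code points (spaces included)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    print(wer("the cat sat on the mat", "the cat sit on mat"))  # 2 edits / 6 words
    print(cer("kitten", "sitting"))  # 3 edits / 6 characters
```

Because both rates divide by the reference length, insertions can push them above 100%. Note also that this sketch counts Unicode code points, so an Arabic letter with a diacritic (for example a letter followed by a kasra, U+0650) counts as two characters, and a missing diacritic costs one character error; whether the reported CER treats diacritics this way is an assumption, not something the abstract states.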

Keywords:

Transformer, End-to-end principle, Computer science, Language model, Encoder, Speech recognition, Recurrent neural network, Word error rate, Artificial intelligence, Deep learning, Artificial neural network, Natural language processing, Voltage, Engineering

Metrics

FWCI (Field Weighted Citation Impact): 4.31

Citation Normalized Percentile: 0.92

Topics

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Natural Language Processing Techniques

Physical Sciences → Computer Science → Artificial Intelligence

Historical and Linguistic Studies

Social Sciences → Social Sciences → Sociology and Political Science

Related Documents

A Transformer-Based End-to-End Automatic Speech Recognition Algorithm

Hardware Accelerator for Transformer based End-to-End Automatic Speech Recognition System

Transformer-Based Long-Context End-to-End Speech Recognition

Evolved Speech-Transformer: Applying Neural Architecture Search to End-to-End Automatic Speech Recognition

A CTC Alignment-Based Non-Autoregressive Transformer for End-to-End Automatic Speech Recognition