JOURNAL ARTICLE

An End-to-End Transformer-Based Automatic Speech Recognition for Qur’an Reciters

Mohammed HadwanHamzah A. AlsayadiSalah Al-Hagree

Year: 2022 Journal:   Computers, materials & continua/Computers, materials & continua (Print) Vol: 74 (2)Pages: 3471-3487

Abstract

The attention-based encoder-decoder technique, known as the trans-former, is used to enhance the performance of end-to-end automatic speech recognition (ASR). This research focuses on applying ASR end-to-end transformer-based models for the Arabic language, as the researchers’ community pays little attention to it. The Muslims Holy Qur’an book is written using Arabic diacritized text. In this paper, an end-to-end transformer model to building a robust Qur’an vs. recognition is proposed. The acoustic model was built using the transformer-based model as deep learning by the PyTorch framework. A multi-head attention mechanism is utilized to represent the encoder and decoder in the acoustic model. A Mel filter bank is used for feature extraction. To build a language model (LM), the Recurrent Neural Network (RNN) and Long short-term memory (LSTM) were used to train an n-gram word-based LM. As a part of this research, a new dataset of Qur’an verses and their associated transcripts were collected and processed for training and evaluating the proposed model, consisting of 10 h of .wav recitations performed by 60 reciters. The experimental results showed that the proposed end-to-end transformer-based model achieved a significant low character error rate (CER) of 1.98% and a word error rate (WER) of 6.16%. We have achieved state-of-the-art end-to-end transformer-based recognition for Qur’an reciters.

Keywords:
Transformer End-to-end principle Computer science Language model Encoder Speech recognition Recurrent neural network Word error rate Artificial intelligence Deep learning Artificial neural network Natural language processing Voltage Engineering

Metrics

22
Cited By
4.31
FWCI (Field Weighted Citation Impact)
53
Refs
0.92
Citation Normalized Percentile
Is in top 1%
Is in top 10%

Citation History

Topics

Speech Recognition and Synthesis
Physical Sciences →  Computer Science →  Artificial Intelligence
Natural Language Processing Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Historical and Linguistic Studies
Social Sciences →  Social Sciences →  Sociology and Political Science
© 2026 ScienceGate Book Chapters — All rights reserved.