Kurdish end-to-end speech synthesis using deep neural networks

Sabat Salih Muhamad; Hadi Veisi; Aso Mahmudi; Abdulhady Abas Abdullah; Farhad Rahimi

doi:10.1016/j.nlp.2024.100096

JOURNAL ARTICLE

Year: 2024; Journal: Natural Language Processing Journal; Vol: 8; Pages: 100096-100096; Publisher: Elsevier BV

Abstract

This article introduces an end-to-end text-to-speech (TTS) system for Central Kurdish (CK, also known as Sorani), a low-resource language, and addresses the challenges posed by limited data availability. We compiled a dataset suitable for end-to-end TTS comprising 21 hours of CK speech from a female voice paired with the corresponding texts. To identify the best-performing system, we ran three training experiments with Tacotron2, an end-to-end deep neural network for speech synthesis: one model trained using a pre-trained English system, and two models trained from scratch, on the full dataset and on an intonationally balanced dataset, respectively. We evaluated the models using Mean Opinion Score (MOS), a subjective evaluation metric. The model trained from scratch on the full CK dataset surpasses both the model trained on the intonationally balanced dataset and the model trained using the pre-trained English model in naturalness and intelligibility, achieving a MOS of 4.78 out of 5.
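As a rough illustration of the metric named in the abstract, a MOS is the arithmetic mean of listener ratings on a 1 to 5 scale, often reported with a confidence interval. The sketch below is a minimal example under stated assumptions: the function name `mean_opinion_score`, the normal-approximation interval, and the ratings themselves are hypothetical and are not taken from the article.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (MOS, half-width of an approximate 95% confidence interval).

    Assumes each rating is a listener score on the usual 1-5 scale.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be on the 1-5 scale")
    mos = statistics.mean(ratings)
    # Normal approximation: 1.96 times the standard error of the mean.
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical listener ratings (illustrative only, not data from the article).
example_ratings = [5, 5, 4, 5, 5, 4, 5, 5]
mos, ci = mean_opinion_score(example_ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

With only a handful of raters the normal approximation is crude; a t-distribution interval would be the more careful choice for small panels.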

Keywords:

End-to-end principle; Naturalness; Computer science; Mean opinion score; Artificial neural network; Speech synthesis; Scratch; Deep neural networks; Speech recognition; Intelligibility (philosophy); Metric (unit); Artificial intelligence; Language model; Pronunciation; Natural language processing; Linguistics; Engineering

Metrics

FWCI (Field Weighted Citation Impact): 1.92

Citation Normalized Percentile: 0.83

Topics

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Natural Language Processing Techniques

Physical Sciences → Computer Science → Artificial Intelligence

Topic Modeling

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

End-to-End Speech Emotion Recognition Using Deep Neural Networks

Deep Neural Networks for End-to-End Optimized Speech Coding

Towards an end-to-end speech recognizer for Portuguese using deep neural networks

End-to-End Kurdish Speech Synthesis Based on Transfer Learning
