JOURNAL ARTICLE

Tibetan Speech Synthesis Based on Pre-Trained Mixture Alignment FastSpeech2

Qing Zhou, Xiaona Xu, Yue Zhao

Year: 2024 Journal: Applied Sciences Vol: 14 (15) Pages: 6834-6834 Publisher: Multidisciplinary Digital Publishing Institute

Abstract

Most current research in Tibetan speech synthesis relies on autoregressive deep learning models. However, these models suffer from slow inference, skipped words, and repetitions. To overcome these issues, we propose an enhanced non-autoregressive acoustic model combined with a vocoder for Tibetan speech synthesis. Specifically, we introduce the mixture alignment FastSpeech2 method to correct errors caused by hard alignment in the original FastSpeech2 method. The new method employs soft alignment at the level of Latin letters and hard alignment at the level of Tibetan characters, which improves alignment accuracy between text and speech and enhances the naturalness and intelligibility of the synthesized speech. We also integrate pitch and energy information into the model, further improving overall synthesis quality. In addition, Tibetan has smaller text-to-audio datasets than widely studied languages. To address these limited resources, we employ a transfer learning approach: the model is first pre-trained on data from resource-rich languages, and the pre-trained mixture alignment FastSpeech2 model is then fine-tuned for Tibetan speech synthesis. Experimental results demonstrate that the mixture alignment FastSpeech2 model produces higher-quality speech than the original FastSpeech2 model, and that pre-training on an English dataset yields further improvements in clarity and naturalness.
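The interplay of letter-level soft alignment and character-level hard alignment described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names `char_durations` and `length_regulate`, the toy weights, and the rule of assigning each frame to the character with the largest summed letter weight are all assumptions made for illustration. A real model would learn the soft alignment and would typically also enforce monotonic character boundaries, which this sketch does not.

```python
from typing import List, Sequence


def char_durations(
    soft: Sequence[Sequence[float]],
    letter_to_char: Sequence[int],
    n_chars: int,
) -> List[int]:
    """Collapse a letter-level soft alignment into character-level durations.

    soft[i][t] is the (hypothetical) weight letter i places on frame t.
    letter_to_char[i] is the index of the character that letter i belongs to.
    """
    n_frames = len(soft[0])
    durations = [0] * n_chars
    for t in range(n_frames):
        # Sum the soft weights of all letters inside each character.
        mass = [0.0] * n_chars
        for i, row in enumerate(soft):
            mass[letter_to_char[i]] += row[t]
        # Hard decision at the character level: the frame is assigned
        # to the character with the largest summed weight.
        best = max(range(n_chars), key=mass.__getitem__)
        durations[best] += 1
    return durations


def length_regulate(states: Sequence[str], durations: Sequence[int]) -> List[str]:
    """Repeat each character-level state for its number of frames."""
    expanded: List[str] = []
    for state, d in zip(states, durations):
        expanded.extend([state] * d)
    return expanded


# Toy example: three Latin letters forming two characters, four frames.
soft_alignment = [
    [0.7, 0.3, 0.1, 0.0],  # letter 0, belongs to character 0
    [0.2, 0.5, 0.3, 0.1],  # letter 1, belongs to character 0
    [0.1, 0.2, 0.6, 0.9],  # letter 2, belongs to character 1
]
durations = char_durations(soft_alignment, [0, 0, 1], n_chars=2)
frames = length_regulate(["char_0", "char_1"], durations)
```

In this toy case the first two frames go to the first character and the last two to the second, so each character-level state is repeated twice before being passed on to the decoder.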

Keywords:
Naturalness, Computer science, Speech synthesis, Intelligibility (philosophy), Speech recognition, Artificial intelligence, Inference, Autoregressive model, Natural language processing, Mathematics

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 15
Citation Normalized Percentile: 0.11

Topics

Speech Recognition and Synthesis
Physical Sciences →  Computer Science →  Artificial Intelligence
Music and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Natural Language Processing Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Mongolian Emotional Speech Synthesis Based on CGAN and Improved FastSpeech2

Qing-Dao-Er-Ji Ren, Yang Yang, Lele Wang

Journal: ACM Transactions on Asian and Low-Resource Language Information Processing Year: 2025 Vol: 24 (9) Pages: 1-16
JOURNAL ARTICLE

Research on Speech Synthesis Based on Mixture Alignment Mechanism

Yan Deng, Ning Wu, Chengjun Qiu, Yan Chen, Xueshan Gao

Journal: Sensors Year: 2023 Vol: 23 (16) Pages: 7283-7283
JOURNAL ARTICLE

A Fast and Lightweight Speech Synthesis Model based on FastSpeech2

Huu-Kim Nguyen, Kihyuk Jeong, Hong-Goo Kang

Journal: 2021 36th International Technical Conference on Circuits/Systems, Computers and Communications (ITC-CSCC) Year: 2021 Pages: 1-4