JOURNAL ARTICLE

A Fast and Lightweight Speech Synthesis Model based on FastSpeech2

Huu-Kim Nguyen, Kihyuk Jeong, Hong-Goo Kang

Year: 2021 Journal: 2021 36th International Technical Conference on Circuits/Systems, Computers and Communications (ITC-CSCC) Pages: 1-4

Abstract

In this paper, we present a fast and lightweight speech synthesis model that is suitable for on-device applications. By leveraging long-short range attention, depth-wise separable convolution, and linear attention, we significantly reduce the model size and complexity of the baseline FastSpeech2-based Transformer framework. Whereas the baseline model requires O(N^2) computations for its attention and convolution operations because of nested-loop computations, our proposed model requires only O(N) computations because each nested loop is replaced by two cascaded single loops. Experimental results show that our proposed model generates speech with a real-time factor of 0.26 and requires only 10.4 million parameters. Despite the reduction in model size and complexity, the generated speech quality of our model is close to that of the baseline.
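The abstract does not give the exact formulation, so the following is only a minimal sketch of the two general ideas it names, under stated assumptions. The first part shows kernel-based linear attention, where the N x N score matrix (the "nested loop") is avoided by first summarizing keys and values in one pass and then applying the queries in a second pass. The feature map `elu(x) + 1` is an assumed, commonly used choice and is not taken from the paper. The second part compares parameter counts for a standard 1-D convolution and a depth-wise separable one; the kernel size and channel width are illustrative values, not the paper's configuration.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1, a positive feature map (assumed choice, not from the paper).
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def quadratic_attention(q, k, v):
    # Reference form: builds the full (N, N) similarity matrix, O(N^2).
    fq, fk = feature_map(q), feature_map(k)
    scores = fq @ fk.T
    return (scores @ v) / scores.sum(axis=1, keepdims=True)

def linear_attention(q, k, v):
    # Same result via associativity, as two cascaded passes over the sequence.
    fq, fk = feature_map(q), feature_map(k)
    kv = fk.T @ v                 # pass 1: (d, d_v) summary of keys and values
    z = fk.sum(axis=0)            # pass 1: (d,) normalizer summary
    return (fq @ kv) / (fq @ z)[:, None]   # pass 2: apply each query, O(N)

def conv_params(kernel, c_in, c_out):
    # Weights of a standard 1-D convolution (bias ignored).
    return kernel * c_in * c_out

def separable_conv_params(kernel, c_in, c_out):
    # Depth-wise filter per input channel, then a 1x1 point-wise convolution.
    return kernel * c_in + c_in * c_out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q, k, v = (rng.standard_normal((16, 8)) for _ in range(3))
    print(np.allclose(quadratic_attention(q, k, v), linear_attention(q, k, v)))
    # Illustrative sizes only: kernel 9, 256 input and output channels.
    print(conv_params(9, 256, 256), separable_conv_params(9, 256, 256))
```

Both attention functions compute the same output; only the order of the matrix products differs, which is what removes the quadratic dependence on the sequence length N in this sketch.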

Keywords:
Computer science; Convolution (computer science); Baseline (sea); Computation; Reduction (mathematics); Separable space; Nested loop join; Transformer; Speech synthesis; Range (aeronautics); Algorithm; Computational complexity theory; Language model; Speech recognition; Parallel computing; Artificial intelligence; Mathematics; Artificial neural network; Engineering

Metrics

Cited By: 1
FWCI (Field Weighted Citation Impact): 0.12
Refs: 23
Citation Normalized Percentile: 0.34

Topics

Speech Recognition and Synthesis
Physical Sciences →  Computer Science →  Artificial Intelligence
Natural Language Processing Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing

Related Documents

JOURNAL ARTICLE

AdaptiveFormer: A Few-shot Speaker Adaptive Speech Synthesis Model based on FastSpeech2

Dengfeng Ke, Ruixin Hu, Qi Luo, Liangjie Huang, Wenhan Yao, Wentao Shu, Jinsong Zhang, Yanlu Xie

Journal: 2022 13th International Symposium on Chinese Spoken Language Processing (ISCSLP) Year: 2022 Pages: 225-229
JOURNAL ARTICLE

Mongolian Emotional Speech Synthesis Based on CGAN and Improved FastSpeech2

Qing-Dao-Er-Ji Ren, Yang Yang, Lele Wang

Journal: ACM Transactions on Asian and Low-Resource Language Information Processing Year: 2025 Vol: 24 (9) Pages: 1-16
JOURNAL ARTICLE

Tibetan Speech Synthesis Based on Pre-Traind Mixture Alignment FastSpeech2

Qing Zhou, Xiaona Xu, Yue Zhao

Journal: Applied Sciences Year: 2024 Vol: 14 (15) Pages: 6834