Huu-Kim Nguyen, Kihyuk Jeong, Hong-Goo Kang
In this paper, we present a fast and lightweight speech synthesis model suitable for on-device applications. By leveraging long-short range attention, depth-wise separable convolution, and linear attention, we significantly reduce the model size and complexity of the baseline FastSpeech2-based Transformer framework. Unlike the baseline model, whose attention and convolution operations require O(N²) computations because of nested loops, our proposed model requires only O(N) computations by restructuring the nested loop into two cascaded single loops. Experimental results show that our proposed model generates speech with a real-time factor of 0.26 while requiring only 10.4 million parameters. Despite the reduction in model size and complexity, the quality of the generated speech remains close to that of the baseline.
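A minimal sketch of two of the ingredients named in the abstract, written in PyTorch; it is not the authors' implementation. It contrasts standard O(N²) softmax attention with a kernelized linear-attention variant whose two cascaded matrix products are O(N) in sequence length, and shows a depth-wise separable 1-D convolution as a lower-parameter replacement for a dense convolution. The elu(x)+1 feature map, module names, and tensor shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def softmax_attention(q, k, v):
    # Standard attention: the (N x N) score matrix makes this O(N^2) in time and memory.
    scores = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
    return scores @ v


def linear_attention(q, k, v, eps=1e-6):
    # Kernelized attention (assumed feature map: elu(x) + 1): reordering the products
    # as phi(q) @ (phi(k)^T v) replaces the N x N matrix with a d x d summary,
    # so cost grows linearly with sequence length N.
    q, k = F.elu(q) + 1, F.elu(k) + 1
    kv = k.transpose(-2, -1) @ v                                    # (batch, d, d) summary
    z = 1.0 / (q @ k.sum(dim=-2, keepdim=True).transpose(-2, -1) + eps)  # normalizer
    return (q @ kv) * z


class DepthwiseSeparableConv1d(nn.Module):
    """Depthwise + pointwise convolution: fewer parameters than a dense Conv1d."""

    def __init__(self, channels, kernel_size):
        super().__init__()
        self.depthwise = nn.Conv1d(channels, channels, kernel_size,
                                   padding=kernel_size // 2, groups=channels)
        self.pointwise = nn.Conv1d(channels, channels, 1)

    def forward(self, x):  # x: (batch, channels, time)
        return self.pointwise(self.depthwise(x))


if __name__ == "__main__":
    q = k = v = torch.randn(2, 128, 64)              # (batch, time, dim)
    print(softmax_attention(q, k, v).shape)          # torch.Size([2, 128, 64])
    print(linear_attention(q, k, v).shape)           # torch.Size([2, 128, 64])
    conv = DepthwiseSeparableConv1d(64, kernel_size=9)
    print(conv(torch.randn(2, 64, 128)).shape)       # torch.Size([2, 64, 128])
```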
Ba Zu, Rangzhuoma Cai, Zhijie Cai, Zhaxi Pengmao
Dengfeng Ke, Ruixin Hu, Qi Luo, Liangjie Huang, Wenhan Yao, Wentao Shu, Jinsong Zhang, Yanlu Xie
Qing-Dao-Er-Ji Ren, Yang Yang, Lele Wang