JOURNAL ARTICLE

Singing Voice Synthesis Based on Generative Adversarial Networks

Abstract

This paper proposes a generative adversarial training method for deep neural network (DNN)-based singing voice synthesis. The DNN-based approach has been used in statistical parametric singing voice synthesis and has improved the naturalness of the synthesized singing voice [1]. Recently, generative adversarial networks (GANs) [2] have attracted significant attention in various machine learning research areas, including speech synthesis [3]. GANs have achieved great success in modeling the distributions of complex data, and they have the potential to alleviate the over-smoothing problem in the generated speech parameters in speech synthesis. In this paper, we propose a DNN-based singing voice synthesis system that incorporates a GAN. Experimental results show that the proposed method outperforms the conventional method in the naturalness of the synthesized singing voice.
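
The abstract does not state the training objective, so the following is only a minimal sketch of one common way adversarial training is combined with a conventional acoustic-model loss: the generator (the DNN acoustic model) minimizes a mean squared error term plus a weighted adversarial term, while a discriminator is trained to separate natural from generated parameters. The function names, the `adv_weight` parameter, and the use of binary cross-entropy are assumptions made for illustration and are not taken from the paper.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mse(pred, target):
    # Conventional loss: mean squared error between predicted and
    # natural acoustic parameters.
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def discriminator_loss(real_logits, fake_logits):
    # Binary cross-entropy on discriminator logits (assumed form):
    # natural parameters are labeled 1, generated parameters 0.
    real = sum(softplus(-z) for z in real_logits) / len(real_logits)
    fake = sum(softplus(z) for z in fake_logits) / len(fake_logits)
    return real + fake


def generator_loss(pred, target, fake_logits, adv_weight=1.0):
    # Hypothetical combined objective: MSE plus a weighted adversarial
    # term that rewards generated parameters the discriminator scores
    # as natural.
    adv = sum(softplus(-z) for z in fake_logits) / len(fake_logits)
    return mse(pred, target) + adv_weight * adv
```

In this sketch the adversarial term falls as the discriminator's logits for generated parameters rise, so the generator is pushed toward outputs that look natural to the discriminator rather than only toward the per-frame average that MSE alone favors.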

Keywords:
Naturalness; Singing; Computer science; Speech synthesis; Speech recognition; Parametric statistics; Artificial neural network; Generative grammar; Artificial intelligence; Acoustics; Mathematics

Metrics

Cited By: 63
FWCI (Field Weighted Citation Impact): 6.45
Refs: 33
Citation Normalized Percentile: 0.97

Topics

Speech Recognition and Synthesis
Physical Sciences → Computer Science → Artificial Intelligence
Music and Audio Processing
Physical Sciences → Computer Science → Signal Processing
Generative Adversarial Networks and Image Synthesis
Physical Sciences → Computer Science → Computer Vision and Pattern Recognition