StarGAN-VC: Non-parallel Many-to-Many Voice Conversion Using Star Generative Adversarial Networks

Hirokazu Kameoka; Takuhiro Kaneko; Kou Tanaka; Nobukatsu Hojo

doi:10.1109/slt.2018.8639535

Conference paper, 2018 IEEE Spoken Language Technology Workshop (SLT)

Year: 2018, Pages: 266-273

Abstract

This paper proposes a method that enables non-parallel many-to-many voice conversion (VC) by using a variant of a generative adversarial network (GAN) called StarGAN. Our method, which we call StarGAN-VC, is noteworthy in that it (1) requires no parallel utterances, transcriptions, or time-alignment procedures for speech generator training, (2) simultaneously learns many-to-many mappings across different attribute domains using a single generator network, (3) is able to generate converted speech signals quickly enough to allow real-time implementations, and (4) requires only several minutes of training examples to generate reasonably realistic-sounding speech. Subjective evaluation experiments on a non-parallel many-to-many speaker identity conversion task revealed that the proposed method obtained higher sound quality and speaker similarity than a state-of-the-art method based on variational autoencoding GANs.
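
To make point (2) concrete, the following is a toy Python sketch, not the authors' implementation, of how one generator can serve every source-target pair. The target domain (here, a speaker) is encoded as a one-hot code, tiled over time and appended to each acoustic feature frame, and the same function is then called with different codes. The StarGAN-VC paper trains a convolutional generator with adversarial, domain-classification, cycle-consistency and identity-mapping losses. In this sketch the learned network is replaced by a hypothetical mean-shift rule (`toy_generator`, `DOMAIN_MEANS`), and the loss weights in `generator_objective` are illustrative placeholders, not the paper's settings.

```python
from typing import List

Frame = List[float]
Sequence = List[Frame]  # T frames, each a D-dimensional acoustic feature vector

# HYPOTHETICAL per-domain (per-speaker) feature means, used only by the toy
# generator below; a real StarGAN-VC generator learns the mapping instead.
DOMAIN_MEANS: List[List[float]] = [[0.0, 0.0], [1.0, -1.0], [2.0, 0.5]]


def one_hot(index: int, num_domains: int) -> List[float]:
    """Encode a target attribute domain (e.g. a speaker) as a one-hot code."""
    if not 0 <= index < num_domains:
        raise ValueError("domain index out of range")
    return [1.0 if i == index else 0.0 for i in range(num_domains)]


def condition_on_domain(x: Sequence, code: List[float]) -> Sequence:
    """Tile the domain code over time and append it to every frame, so one
    network receives both the input features and the requested target domain."""
    return [frame + code for frame in x]


def toy_generator(x_cond: Sequence, feature_dim: int) -> Sequence:
    """Toy stand-in for G(x, c), not a neural network: replaces the
    per-dimension mean of the input with the mean of the target domain."""
    feats = [frame[:feature_dim] for frame in x_cond]
    target = x_cond[0][feature_dim:].index(1.0)  # decode the one-hot code
    num_frames = len(feats)
    means = [sum(f[d] for f in feats) / num_frames for d in range(feature_dim)]
    return [
        [f[d] - means[d] + DOMAIN_MEANS[target][d] for d in range(feature_dim)]
        for f in feats
    ]


def convert(x: Sequence, target: int) -> Sequence:
    """Convert x to any target domain with the same single generator."""
    code = one_hot(target, len(DOMAIN_MEANS))
    return toy_generator(condition_on_domain(x, code), len(x[0]))


def l1_loss(a: Sequence, b: Sequence) -> float:
    """Mean absolute error, the distance used for cycle and identity terms."""
    diffs = [abs(p - q) for fa, fb in zip(a, b) for p, q in zip(fa, fb)]
    return sum(diffs) / len(diffs)


def generator_objective(adv: float, cls: float, cyc: float, idt: float,
                        lambda_cls: float = 1.0, lambda_cyc: float = 10.0,
                        lambda_id: float = 1.0) -> float:
    """Weighted sum of generator loss terms; the default weights are placeholders."""
    return adv + lambda_cls * cls + lambda_cyc * cyc + lambda_id * idt


if __name__ == "__main__":
    source = 1
    # Toy utterance whose per-dimension mean equals DOMAIN_MEANS[1] = [1.0, -1.0].
    x = [[0.5, -1.5], [1.5, -0.5], [1.0, -1.0]]
    for target in range(len(DOMAIN_MEANS)):
        y = convert(x, target)
        cyc = l1_loss(convert(y, source), x)  # x -> target -> back to source
        print(target, y, cyc)
    print("identity loss:", l1_loss(convert(x, source), x))
```

Because the generator sees only the target code, converting to any target and back to the source speaker reproduces this toy input, so each printed cycle-consistency loss is zero. Round-trip constraints of this kind, together with the identity-mapping term, are what let StarGAN-style training proceed without parallel utterances. A real system would replace the mean shift with a trained network and pass the converted features to a vocoder.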

Metrics

Cited by: 356
FWCI (Field Weighted Citation Impact): 35.94
Citation Normalized Percentile: 1.00 (in top 1%, in top 10%)

Topics

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Related Documents

Fast Learning for Non-Parallel Many-to-Many Voice Conversion with Residual Star Generative Adversarial Networks

Non-Parallel Many-to-Many Voice Conversion with PSR-StarGAN

A Survey on Generative Adversarial Networks based Models for Many-to-many Non-parallel Voice Conversion

High-Quality Many-to-Many Voice Conversion Using Transitive Star Generative Adversarial Networks with Adaptive Instance Normalization

Non-parallel Many-to-many Singing Voice Conversion by Adversarial Learning