Speech Style Modeling Method Using Mutual Information for End-to-End Speech Synthesis

Joun Yeop Lee; Sung Jun Cheon; Byoung Jin Choi; Nam Soo Kim; Doo Hwa Hong

doi:10.7840/kics.2019.44.9.1641


JOURNAL ARTICLE


Year: 2019
Journal: The Journal of Korean Institute of Communications and Information Sciences
Vol: 44 (9)
Pages: 1641-1647
Publisher: The Korean Institute of Communications and Information Sciences (KICS)

DOI: 10.7840/kics.2019.44.9.1641


Abstract

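This paper proposes a method that uses mutual information (MI) to remove text information from the style representation in style-based end-to-end speech synthesis. To realize MI in a deep learning setting, the mutual information neural estimator (MINE) is employed, which makes it possible to extract a style that is disentangled from text information and to use it for speech synthesis. The proposed method was evaluated on the VCTK database, and the experimental results confirm that it outperforms the conventional Tacotron Global Style Token (GST) approach.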
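The abstract does not give the network architecture or the training objective in detail, so the following is only a minimal sketch of the general idea behind MINE, not the authors' implementation. MINE estimates MI by maximizing the Donsker-Varadhan lower bound, E_joint[T] - log E_marginal[exp(T)], over a critic T. Here the critic is a fixed toy function rather than a trained network, and the names `critic`, `dv_lower_bound`, and `estimate_mi_bound` are hypothetical, as is the use of scalar stand-ins for the style and text embeddings.

```python
import math
import random


def dv_lower_bound(joint_scores, marginal_scores):
    """Donsker-Varadhan bound: mean(T on joint) - log(mean(exp(T on marginals)))."""
    mean_joint = sum(joint_scores) / len(joint_scores)
    # Stable log-mean-exp: subtract the maximum score before exponentiating.
    m = max(marginal_scores)
    log_mean_exp = m + math.log(
        sum(math.exp(s - m) for s in marginal_scores) / len(marginal_scores)
    )
    return mean_joint - log_mean_exp


def critic(style, text):
    # Hypothetical fixed critic T(style, text). In MINE this is a neural
    # network trained to maximize the bound; a product is used here only
    # to keep the sketch self-contained.
    return style * text


def estimate_mi_bound(styles, texts, rng):
    """Evaluate the bound on paired samples and on shuffled (unpaired) samples."""
    joint = [critic(s, t) for s, t in zip(styles, texts)]
    shuffled = list(texts)
    rng.shuffle(shuffled)  # breaking the pairing approximates the product of marginals
    marginal = [critic(s, t) for s, t in zip(styles, shuffled)]
    return dv_lower_bound(joint, marginal)


if __name__ == "__main__":
    rng = random.Random(0)
    texts = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    # Assumed toy setup: a "leaky" style that partly copies the text scalar.
    leaky_styles = [0.9 * t + 0.1 * rng.gauss(0.0, 1.0) for t in texts]
    print("bound for leaky style:", estimate_mi_bound(leaky_styles, texts, rng))
```

In a disentanglement setting of the kind the abstract describes, a bound like this would be maximized with respect to the critic and the resulting estimate penalized with respect to the style encoder, so that the style carries less information about the text. How the paper combines this term with the synthesis loss is not stated in the abstract.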

Keywords:

Mutual information; End-to-end principle; Estimator; Speech recognition; Computer science; Style (visual arts); Artificial intelligence; Mathematics; Geography; Statistics

Metrics

FWCI (Field Weighted Citation Impact): 0.00

Citation Normalized Percentile: 0.11

Topics

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

